How does an HTTP request get to the webserver?

Eddie Knight
4 min read · Sep 21, 2022

Let’s explore some common elements in enterprise architecture.

This is a deep dive to answer the question above, as part of a larger question: What happens when you type “google.com” into the browser and press enter?

In a previous article, we prepared ourselves to send a hypertext transfer protocol request to https://google.com.

A GET request fits this situation because we are only asking to view a page, so the request body is unnecessary. The body is usually left empty, which keeps the request small. The main thing the web server needs to know is the path that we’re requesting to view.
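
To make this concrete, here is a minimal sketch of what such a request might look like on the wire, assuming plain HTTP/1.1. The helper name build_get_request is made up for illustration. Real browsers send many more headers, and an HTTPS request is encrypted before it leaves our machine.

```python
def build_get_request(host: str, path: str = "/") -> bytes:
    """Build a bare-bones HTTP/1.1 GET request (illustrative helper, not a real client)."""
    head = "\r\n".join([
        f"GET {path} HTTP/1.1",  # request line: method, path, protocol version
        f"Host: {host}",         # tells the server which site we want
        "Connection: close",
    ])
    # A blank line ends the headers. Nothing follows it: the body is empty.
    return (head + "\r\n\r\n").encode("ascii")


request = build_get_request("google.com", "/")
print(request.decode("ascii"))
```

Notice that the path is the only part of the request that says what we want to see.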

For older or less sophisticated applications, the IP address may take us directly to the server we’re trying to access. But we may also find that our request makes some additional hops after reaching the IP and before it reaches the web server.

For a website as large as google.com, we’ll have multiple layers of infrastructure that our request must travel through after reaching the IP address that we’re targeting.

The following description is true for many large applications… though every infrastructure is built differently and I can only safely discuss common features. Much like the entry lobby to a large building, the exact flow of traffic and layout of the floor plan is unique in every situation, but some common trends exist.

The first infrastructure element we’re likely to encounter is the content delivery network.

The CDN acts as a buffer to help alleviate the work required for common tasks on a web server. Instead of going to the web server every time to get the same information, the CDN will cache the page on its own servers, which are spread out all around the world, and will respond to the request from the location closest to the requester.

This saves processing power on the web servers and dramatically decreases the time it takes for us to load the webpage. In the case of google.com, it’s very likely that the homepage is delivered to us by a CDN, saving critical resources for the web searches and intensive logic.

But if we’re the first visitors to google.com after an update, the page won’t yet be cached with the CDN, and the CDN will need to request the updated page from the web servers.
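
The caching behavior described above can be sketched as a toy model of a single CDN location. Everything here (the EdgeCache class, the origin function, the purge step) is a simplified assumption for illustration. Real CDNs also handle expiry times, cache headers, and many locations at once.

```python
from typing import Callable, Dict, List


class EdgeCache:
    """Toy model of one CDN edge location (hypothetical, for illustration only)."""

    def __init__(self, fetch_from_origin: Callable[[str], str]) -> None:
        self._fetch_from_origin = fetch_from_origin
        self._pages: Dict[str, str] = {}

    def get(self, path: str) -> str:
        if path not in self._pages:
            # Cache miss: ask the web servers once and keep a copy.
            self._pages[path] = self._fetch_from_origin(path)
        return self._pages[path]

    def purge(self, path: str) -> None:
        # After a site update, the stale copy is dropped.
        self._pages.pop(path, None)


origin_calls: List[str] = []


def origin(path: str) -> str:
    """Stand-in for the real web servers; records every time it is asked."""
    origin_calls.append(path)
    return f"<html>page for {path}</html>"


edge = EdgeCache(origin)
edge.get("/")    # first visitor: the CDN has to ask the web servers
edge.get("/")    # second visitor: served from the CDN's own copy
edge.purge("/")  # the site is updated
edge.get("/")    # first visitor after the update: back to the web servers
print(len(origin_calls))
```

Only the first visitor and the first visitor after the update cause any work on the web servers.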

Within the CDN, a piece of software called a reverse proxy will have instructions configured for the IP address we’re trying to reach. Information such as the port we’re requesting and the path we typed in the browser will all be considered by the reverse proxy, in addition to other configurations. These might include A/B testing, or other features set in place by the website’s maintainers.
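
As a rough sketch of that decision, a reverse proxy can be thought of as a table of rules that is checked from top to bottom. The ports, path prefixes, and backend names below are invented for this example and do not describe how any real site is configured.

```python
from typing import List, Optional, Tuple

# Hypothetical routing rules: (port, path prefix, upstream name).
# More specific prefixes come first because the first match wins.
ROUTES: List[Tuple[int, str, str]] = [
    (443, "/search", "search-backend"),
    (443, "/images", "image-backend"),
    (443, "/", "homepage-backend"),
]


def choose_upstream(port: int, path: str) -> Optional[str]:
    """Return the name of the upstream that should handle this request, if any."""
    for rule_port, prefix, upstream in ROUTES:
        if port == rule_port and path.startswith(prefix):
            return upstream
    return None  # no rule matched, so the proxy would refuse the request


print(choose_upstream(443, "/search?q=cats"))
print(choose_upstream(443, "/"))
print(choose_upstream(8080, "/"))
```

An A/B test could be modeled as one more rule that sends a share of visitors to a different upstream.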

Next, a firewall is likely to sit alongside the CDN.

The firewall will have its own logic configured to determine what behaviors and origin points are allowed to continue. In many cases, the firewall will sit directly behind the CDN, rejecting any traffic that doesn’t arrive via the CDN.
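
A minimal sketch of the "only accept traffic from the CDN" rule, assuming the firewall knows which address ranges the CDN sends from. The ranges below are reserved documentation addresses, not real CDN ranges, and is_allowed is a hypothetical helper.

```python
import ipaddress

# Assumed CDN address ranges. These are documentation-only blocks (RFC 5737).
CDN_NETWORKS = [
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def is_allowed(source_ip: str) -> bool:
    """Allow a connection only if it comes from one of the CDN's ranges."""
    address = ipaddress.ip_address(source_ip)
    return any(address in network for network in CDN_NETWORKS)


print(is_allowed("203.0.113.7"))  # inside a CDN range
print(is_allowed("192.0.2.10"))   # somewhere else on the internet
```

Real firewalls usually combine a check like this with rules about ports, protocols, and request behavior.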

After our request has passed through the CDN and the firewall, it will encounter a load balancer. This will ensure our request continues on its journey to the proper region, and to a machine that isn’t already overwhelmed by requests from other users.
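
One common way to make that choice is to pick the machine with the fewest requests in flight. The sketch below assumes this "least busy" strategy, and the region names, server names, and counts are made up. Real load balancers may use other strategies, such as round robin or weighting by capacity.

```python
from typing import Dict

# Hypothetical snapshot: region -> server name -> requests currently in flight.
FLEET: Dict[str, Dict[str, int]] = {
    "us-east": {"web-1": 12, "web-2": 3},
    "eu-west": {"web-3": 7, "web-4": 9},
}


def pick_server(region: str, fleet: Dict[str, Dict[str, int]] = FLEET) -> str:
    """Pick the least busy server within the region closest to the user."""
    servers = fleet[region]
    return min(servers, key=lambda name: servers[name])


print(pick_server("us-east"))
print(pick_server("eu-west"))
```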

In the case of google.com, it’s likely that we’ll also encounter a load balancer on Kubernetes, which is an orchestration technology that allows a large number of servers to be created, managed, and destroyed by automated processes.

The finer points of Kubernetes are best explored in a dedicated session, but for now we can safely say that our webserver lives in a pod within the Kubernetes cluster that our request was passed to. Logic at the cluster level will balance traffic to an available pod that contains the webserver we’ve been assigned to interact with.

The webserver will have an HTTP server process with logic similar to the reverse proxy that handled our request at the top layer. Programmed instructions guide the machine to respond in a particular way based on a variety of parameters, including the port that our request targets.

The webserver will customize the information it sends back to us based on what our browser provided in the request, such as cookies specifying whether or not we’re logged in. Other information may come from outside of the webserver if it has instructions to reach out to external sources, such as an API to populate the image of the day.
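
Here is a small sketch of how a webserver might customize its reply based on the cookies in our request. The cookie name session_user and the function render_greeting are invented for this example. A real site would store an opaque session ID in the cookie and verify it on the server rather than trusting a name sent by the browser.

```python
from http.cookies import SimpleCookie


def render_greeting(cookie_header: str) -> str:
    """Choose what to show based on the Cookie header sent by the browser."""
    cookies = SimpleCookie()
    cookies.load(cookie_header)
    # "session_user" is a made-up cookie name for this sketch.
    if "session_user" in cookies:
        return f"Welcome back, {cookies['session_user'].value}"
    return "Sign in"


print(render_greeting("session_user=eddie; theme=dark"))
print(render_greeting("theme=dark"))
```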

Once the web server finishes running through its logic to handle our request, it will send its reply back to whatever process had directly called it. The reply will eventually be passed all the way back to the CDN, which will send an HTTP response back to our machine.
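
The return trip can be pictured as a stack of function calls that unwinds in reverse order. This is only a mental model, with hypothetical layer names, and the order of layers will differ between infrastructures.

```python
from typing import Callable, List

Handler = Callable[[str], str]


def layer(name: str, next_handler: Handler, trace: List[str]) -> Handler:
    """Wrap the next handler and record when traffic passes in each direction."""
    def handle(path: str) -> str:
        trace.append(f"{name}: request in")
        response = next_handler(path)  # pass the request one layer deeper
        trace.append(f"{name}: response out")
        return response
    return handle


def web_server(path: str) -> str:
    return f"200 OK for {path}"


trace: List[str] = []
chain: Handler = web_server
# Wrap from the innermost layer outward, so the CDN ends up on the outside.
for layer_name in ["load balancer", "firewall", "CDN"]:
    chain = layer(layer_name, chain, trace)

response = chain("/")
print("\n".join(trace))
print(response)
```

The CDN is the first layer to see the request and the last to handle the response, matching the path described above.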

Continue reading to finish our journey as the HTTP response makes its way back to the browser.

