Network Engineering #2

In the last article, we discussed how communication between computers can be explained by the OSI model. In that discussion, we assumed that we already knew the IP addresses of the computers we wanted to communicate with. However, this is not always the case, especially when the computer is far away.

We also introduced routers, but we haven't fully explored what their roles are and how they help us reach computers that are far away. In this article, we’ll cover that and try to understand how we can access any public website or application.

DNS

First, we need to understand how we determine the IP address of the server we want to access. When we want to use a web service like Google, we enter a URL with a domain name (e.g., google.com) instead of an IP address, which is hard for humans to remember. From the URL, we need to find the IP address that maps to it so we can establish communication.

The browser and the operating system first check their caches to see whether they have resolved this domain recently. If so, they likely still have the domain's IP address stored and can connect to the server immediately. If not, they must ask a DNS resolver to look up the IP address. The resolver is usually a server run by the Internet Service Provider (ISP), although public resolvers such as Google's 8.8.8.8 or Cloudflare's 1.1.1.1 can be used instead.
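The cache check described above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular browser or operating system implements it. It assumes each answer comes with a time-to-live (TTL) in seconds, which is how DNS records say how long they may be cached. The `fake_resolver` function is a hypothetical stand-in for the ISP's resolver, and the address comes from a range reserved for documentation.

```python
import time


class DnsCache:
    """Minimal client-side DNS cache sketch (hypothetical, for illustration)."""

    def __init__(self, resolver):
        self.resolver = resolver  # function: domain -> (ip, ttl_seconds)
        self.entries = {}         # domain -> (ip, expiry timestamp)

    def lookup(self, domain, now=None):
        now = time.time() if now is None else now
        cached = self.entries.get(domain)
        if cached and cached[1] > now:
            return cached[0], "cache"      # fresh entry: no network round trip
        ip, ttl = self.resolver(domain)    # miss or expired: ask the resolver
        self.entries[domain] = (ip, now + ttl)
        return ip, "resolver"


def fake_resolver(domain):
    # Hypothetical resolver answers: (IP address, TTL in seconds).
    answers = {"example.com": ("192.0.2.10", 300)}
    return answers[domain]


cache = DnsCache(fake_resolver)
print(cache.lookup("example.com", now=0))    # ('192.0.2.10', 'resolver')
print(cache.lookup("example.com", now=100))  # ('192.0.2.10', 'cache')
print(cache.lookup("example.com", now=400))  # ('192.0.2.10', 'resolver')
```

The explicit `now` parameter only makes the expiry behaviour easy to demonstrate. A real client would rely on the current time.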

Once the DNS resolver receives the request from our computer, it checks whether the IP address is already in its cache and, if so, responds immediately. If it isn't cached, the resolver must query other servers that store domain and IP address information. First, it queries a root server, which knows the locations of the Top-Level Domain (TLD) servers. The TLD is the suffix of a domain, such as .com, .net, or .blog. The root server looks at the TLD in the request and replies with the location of the matching TLD server, if that TLD exists.

Next, the resolver asks the TLD server for the location of the authoritative name server, which holds the actual IP address for the domain. Once the resolver knows where the authoritative name server is, it sends a final request to obtain the IP address. The resolver caches the domain-IP pair for as long as the record's time-to-live (TTL) allows and returns the IP address to our computer, which caches the pair as well. With the destination IP address in hand, we can attach it to our request and send it on its way.
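The root → TLD → authoritative sequence can be simulated with plain dictionaries. This is a heavily simplified sketch. Each "server" here is a dict rather than a machine answering DNS messages on port 53. The server names (`tld-com`, `ns1.example-host`) are hypothetical. The code only handles two-label names such as `example.com`.

```python
# Hypothetical, simplified data: each "server" is just a lookup table.
ROOT_SERVER = {"com": "tld-com", "blog": "tld-blog"}      # TLD -> TLD server
TLD_SERVERS = {
    "tld-com": {"example.com": "ns1.example-host"},       # domain -> authoritative server
    "tld-blog": {},
}
AUTHORITATIVE_SERVERS = {
    "ns1.example-host": {"example.com": "192.0.2.10"},    # domain -> IP address
}


def resolve(domain, cache):
    """Iteratively resolve `domain`, returning (ip, trace of each step)."""
    if domain in cache:
        return cache[domain], ["cache hit"]

    trace = []
    tld = domain.rsplit(".", 1)[-1]                     # "example.com" -> "com"

    tld_server = ROOT_SERVER.get(tld)                   # step 1: ask the root
    trace.append(f"root -> {tld_server}")
    if tld_server is None:
        raise LookupError(f"unknown TLD: {tld}")

    auth_server = TLD_SERVERS[tld_server].get(domain)   # step 2: ask the TLD server
    trace.append(f"{tld_server} -> {auth_server}")
    if auth_server is None:
        raise LookupError(f"unknown domain: {domain}")

    ip = AUTHORITATIVE_SERVERS[auth_server][domain]     # step 3: ask the authoritative server
    trace.append(f"{auth_server} -> {ip}")

    cache[domain] = ip                                  # the resolver caches the pair
    return ip, trace


resolver_cache = {}
print(resolve("example.com", resolver_cache))
# ('192.0.2.10', ['root -> tld-com', 'tld-com -> ns1.example-host', 'ns1.example-host -> 192.0.2.10'])
print(resolve("example.com", resolver_cache))
# ('192.0.2.10', ['cache hit'])
```

Each table only knows about the level directly below it. This is the same division of responsibility that keeps the real system manageable.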

Why All This?

You might wonder why we need to go through this whole process to get an IP address. In theory, we could store every domain-IP pair in a table on our own devices and skip DNS lookups entirely. However, there are hundreds of millions of registered domains, and far more subdomains. The table would be huge, and every device would need a fresh copy whenever any domain's IP address changed. You might think we could keep the table only in the DNS resolvers instead. However, pushing every change to every DNS resolver in the world would still be impractical.

The current DNS system is a clever divide-and-conquer design. It distributes the work across many servers so that it can handle the massive amount of data and queries. The root servers only know where the TLD servers are, and those locations rarely change, which makes them easy to maintain. Similarly, each TLD server only tracks the authoritative name servers for the domains under its TLD. The actual domain-IP pairs are left to the authoritative name servers, which are run by domain owners or their DNS hosting providers. So, when we want to change a domain's IP address, we only need to update the record on its authoritative name server.

Routing

Now that we understand how to obtain the destination IP address, we need to explore how routers determine the best path to reach that destination.

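An IP address can be broken down into two parts. The first part (the prefix) identifies the subnetwork, or network, the address belongs to. The second part identifies the specific host within that subnetwork. The prefix length determines how many bits belong to each part. In 192.168.1.10/24, for example, the first 24 bits (192.168.1) identify the network and the remaining 8 bits (.10) identify the host. You can think of an IP address as having a structure similar to a mailing address, where the broader parts represent the country or state and the finer parts represent the street or house number. Each router keeps a routing table that maps network prefixes to next hops. When a packet arrives, the router picks the entry whose prefix matches the most leading bits of the destination address, known as the longest prefix match, because that is the most specific route it knows for that destination.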
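Python's standard `ipaddress` module makes both ideas easy to try. The sketch below first splits an address into its network part. It then performs a longest prefix match over a small routing table. The table entries and next-hop names (`router-A`, `isp-uplink`, and so on) are hypothetical.

```python
import ipaddress

# The /24 prefix length says the first 24 bits are the network part.
host = ipaddress.ip_interface("192.168.1.10/24")
print(host.network)  # 192.168.1.0/24

# Hypothetical routing table: (destination network, next hop).
ROUTES = [
    (ipaddress.ip_network("0.0.0.0/0"), "isp-uplink"),   # default route, matches everything
    (ipaddress.ip_network("10.0.0.0/8"), "router-A"),
    (ipaddress.ip_network("10.1.0.0/16"), "router-B"),
    (ipaddress.ip_network("10.1.2.0/24"), "router-C"),
]


def longest_prefix_match(destination):
    """Return (network, next hop) of the most specific route containing `destination`."""
    addr = ipaddress.ip_address(destination)
    matches = [(net, hop) for net, hop in ROUTES if addr in net]
    # The default route guarantees at least one match; pick the longest prefix.
    net, hop = max(matches, key=lambda route: route[0].prefixlen)
    return str(net), hop


print(longest_prefix_match("10.1.2.7"))     # ('10.1.2.0/24', 'router-C')
print(longest_prefix_match("10.1.9.9"))     # ('10.1.0.0/16', 'router-B')
print(longest_prefix_match("10.9.9.9"))     # ('10.0.0.0/8', 'router-A')
print(longest_prefix_match("203.0.113.5"))  # ('0.0.0.0/0', 'isp-uplink')
```

A linear scan keeps the idea visible. Real routers use specialized data structures and hardware so they can perform this lookup for millions of packets per second.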

Routers constantly exchange information about their connections with other routers. They share metrics such as hop count, bandwidth, and latency, and use this data to update their routing tables. (A detailed explanation of routing protocols is outside the scope of this article, but if you're interested, I recommend researching RIP, OSPF, and BGP.) When several routes share the same longest prefix match, the router uses these metrics to decide which interface to forward the request through.
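As a rough illustration of how exchanged metrics end up in a routing table, here is a minimal distance-vector update in the spirit of RIP. It uses hop count as the only metric. Router names and prefixes are hypothetical. The sketch ignores details that real protocols handle, such as route timeouts, maximum hop limits, and a neighbor reporting that a route got worse.

```python
# Hypothetical routing table for one router: prefix -> (next hop, hop count).
table = {
    "10.1.0.0/16": ("direct", 0),
    "10.2.0.0/16": ("R2", 1),
}


def apply_advertisement(table, neighbor, advertised):
    """Merge a neighbor's advertised routes, keeping the lowest hop count per prefix.

    `advertised` maps prefix -> hop count as seen by the neighbor; going through
    that neighbor costs one extra hop. Returns True if the table changed.
    """
    changed = False
    for prefix, hops in advertised.items():
        candidate = hops + 1
        current = table.get(prefix)
        if current is None or candidate < current[1]:
            table[prefix] = (neighbor, candidate)
            changed = True
    return changed


# R3 reports 10.2.0.0/16 at 3 hops (worse than via R2) and a new prefix at 1 hop.
apply_advertisement(table, "R3", {"10.2.0.0/16": 3, "10.3.0.0/16": 1})
print(table)
# {'10.1.0.0/16': ('direct', 0), '10.2.0.0/16': ('R2', 1), '10.3.0.0/16': ('R3', 2)}
```

Keeping only the lowest-metric route per prefix is how the metric breaks ties between routes to the same destination network.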

NAT & Port Forwarding

In the last article, we mentioned how routers replace private IP addresses with a public IP address, allowing multiple devices to share a single public IP address. However, if all devices use the same public IP, how does the router know which device should receive the response?

Home routers typically solve this with Port Address Translation (PAT), a specific form of Network Address Translation (NAT). When a device sends a request out, the router assigns it a public port number. It then records in a table which private IP address and port that public port belongs to. When a response comes back, the router looks up the destination port in the table to find the correct private IP address (and port) of the device. This means PAT works with port numbers at layer 4, the transport layer, in addition to rewriting IP addresses at layer 3, the network layer.
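The PAT table can be sketched as two dictionaries, one for each direction. This is a simplified model under stated assumptions. The public IP, the starting port of 40000, and the class name `PatRouter` are hypothetical. Real NAT tables also track the protocol and remote address, and they expire idle entries.

```python
import itertools


class PatRouter:
    """Minimal Port Address Translation sketch (hypothetical addresses and port range)."""

    def __init__(self, public_ip, first_port=40000):
        self.public_ip = public_ip
        self.next_port = itertools.count(first_port)  # hands out public ports in order
        self.outbound = {}  # (private_ip, private_port) -> public_port
        self.inbound = {}   # public_port -> (private_ip, private_port)

    def translate_outgoing(self, private_ip, private_port):
        key = (private_ip, private_port)
        if key not in self.outbound:
            public_port = next(self.next_port)
            self.outbound[key] = public_port
            self.inbound[public_port] = key
        # The packet leaves with the router's public IP and the assigned public port.
        return self.public_ip, self.outbound[key]

    def translate_incoming(self, public_port):
        # Find which private device owns this port; None means there is no entry.
        return self.inbound.get(public_port)


router = PatRouter("203.0.113.7")
print(router.translate_outgoing("192.168.1.10", 51000))  # ('203.0.113.7', 40000)
print(router.translate_outgoing("192.168.1.11", 51000))  # ('203.0.113.7', 40001)
print(router.translate_incoming(40001))                  # ('192.168.1.11', 51000)
print(router.translate_incoming(45000))                  # None
```

Two devices can use the same private port (51000 here) because the router gives each one a different public port.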

Another important function of routers is handling requests that external computers send to devices on the private network. An unsolicited incoming request has no matching entry in the PAT table, so by default the router does not know which device should receive it. If a server is running on a device in the private network, the router can be configured to forward requests arriving on a specific public port to the correct private IP address and port. This is called port forwarding.
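Port forwarding amounts to a static table that the network owner fills in by hand. It is checked when an incoming packet does not belong to a connection that was opened from inside. The sketch below assumes that ordering, which is a simplification. The rules, addresses, and the `forward_incoming` function are hypothetical.

```python
# Hypothetical port-forwarding rules: public port -> (private IP, private port).
PORT_FORWARDS = {
    80: ("192.168.1.20", 8080),  # web server on one private machine
    2222: ("192.168.1.30", 22),  # SSH on another machine
}


def forward_incoming(public_port, nat_table):
    """Decide where an incoming packet on `public_port` goes, or None to drop it."""
    if public_port in nat_table:           # reply to a connection opened from inside
        return nat_table[public_port]
    return PORT_FORWARDS.get(public_port)  # unsolicited request: only static rules apply


nat_table = {40000: ("192.168.1.10", 51000)}  # entry created by an outgoing request
print(forward_incoming(40000, nat_table))  # ('192.168.1.10', 51000)
print(forward_incoming(80, nat_table))     # ('192.168.1.20', 8080)
print(forward_incoming(3389, nat_table))   # None
```

Without a matching rule, the unsolicited request on port 3389 has nowhere to go and is dropped.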

Conclusion

Now that we understand how to obtain a destination IP address, how routers choose where to forward packets using the longest prefix match and routing metrics, and how NAT allows multiple devices to share a public IP address, we are ready to see how it all comes together when we access a server on the internet.

For instance, when you try to access "tkdev.blog," your computer first resolves the domain to an IP address by contacting a DNS resolver. Unless the answer is already cached, the resolver queries the root, TLD, and authoritative name servers to obtain the IP address. The request is then sent to your router, and each router along the way forwards it toward the destination using the longest prefix match and routing metrics.

Before the request leaves your network, your router uses NAT to replace your device's private IP address with its own public IP address. If the server sits behind a router on its own private network, that router uses port forwarding to pass the request to the right machine. The server processes the request, and when the response comes back, your router uses its NAT table to forward it to the correct private IP address in your network. Now that we've covered the basics of how IP addresses work, in the next article we'll zoom out and look at different communication methods that use this mechanism.
