We all use the OSI model to describe the way networks work. I have, in fact, included it in just about every presentation, and every book I have written, someplace in the fundamentals of networking. But if you have every looked at the OSI model and had to scratch your head trying to figure out how it really fits with the networks we operate today, or what the OSI model is telling you in terms of troubleshooting, design, or operation—you are not alone. Lots of people have scratched their heads about the OSI model, trying to understand how it fits with modern networking. There is a reason this is so difficult to figure out.

The OSI Model does not accurately describe networks.

What set me off in this particular direction this week is an article over at Errata Security:

The OSI Model was created by international standards organization for an alternative internet that was too complicated to ever work, and which never worked, and which never came to pass. Sure, when they created the OSI Model, the Internet layered model already existed, so they made sure to include today’s Internet as part of their model. But the focus and intent of the OSI’s efforts was on dumb networking concepts that worked differently from the Internet.

This is partly true, and yet a bit … over the top. 🙂 OTOH, the point is well taken: the OSI model is not an ideal model for understanding networks. Maybe a bit of analysis would be helpful in understanding why.

First, while the OSI model was developed with packet switching networks in mind, the general idea was to come as close as possible to emulating the circuit-switched networks widely deployed at the time. A lot of thought had gone into making those circuit-switched networks work, and applications had been built around the way they worked. Applications and circuit-switched networks formed a sort of symbiotic relationship, just as applications form with packet-switched networks today; it was unimaginable, at the time, that “everything would change.”

So while the designers of the OSI model understood the basic value of the packet-switched network, they also understood the value of the circuit-switched network, and tried to find a way to solve both sets of problems in the same network. Experience has shown it is possible to build a somewhat close-to-circuit switched network on top of packet switched networks, but not quite in the way, nor as close to perfect emulation, as those original designers thought. So the OSI model is a bit complex and perhaps overspecified, making it less-than-useful today.

Second, the OSI model largely ignored the role of middleboxes, focusing instead on the stacks implemented and deployed in hosts. This, again, makes sense, as there was no such thing as a device specialized in the switching of packets at the time. Hosts took packets in and processed them. Some packets were sent along to other hosts, other packets were consumed locally. Think PDP-11 with some rough code, rather than even an early Cisco CGS.

Third, the OSI model focuses on what each layer does from the perspective of an application, rather than focusing on what is being done to the data in order to transmit it. The OSI model is built “top down,” rather than “bottom up,” in other words. While this might be really useful if you are an application developer, it is not so useful if you are a network engineer.

So—what should we say about the OSI model?

It was much more useful at some point in the past, when networking was really just “something a host did,” rather than its own sort of sub-field, with specialized protocols, techniques, and designs. It was a very good attempt at sorting out what a network needed to do to move traffic, from the perspective of an application.

What it is not, however, is really all that useful for network engineers working within an engineering specialty to understand how to design protocols, and how to design networks on which those protocols will run. What should we replace it with? I would begin by pointing you to the RINA model, which I think is a better place to start. I’ve written a bit about the RINA model, and used the RINA model as one of the foundational pieces of Computer Networking Problems and Solutions.

Since writing that, however, I have been thinking further about this problem. Over the next six months or so, I plan to build a course around this question. For the moment, I don’t want to spoil the fun, or put any half-backed thoughts out there in the wild.

When a recursive resolver receives a query from a host, it will first consult any local cache to discover if it has the information required to resolve the query. If it does not, it will begin with the rightmost section of the domain name, the Top Level Domain (TLD), moving left through each section of the Fully Qualified Domain Name (FQDN), in order to find an IP address to return to the host, as shown in the diagram below.

This is pretty simple at its most basic level, of course—virtually every network engineer in the world understands this process (and if you don’t, you should enroll in my How the Internet Really Works webinar the next time it is offered!). The question almost no-one ever asks, however, is: what, precisely, is the recursive server sending to the root, TLD, and authoritative servers?

Begin with the perspective of a coder who is developing the code for that recursive server. You receive a query from a host, you have the code check the local cache, and you find there is no matching information available locally. This means you need to send a query out to some other server to determine the correct IP address to return to the host. You could keep a copy of the query from the host in your local cache and build a new query to send to the root server.

Remember, however, that local server resources may be scarce; recursive servers must be optimized to process very high query rates very quickly. Much of the user’s perception of network performance is actually tied to DNS performance. A second option is you could save local memory and processing power by sending the entire query, as you have received it, on to the root server. This way, you do not need to build a new query packet to send to the root server.

Consider this process, however, in the case of a query for a local, internal resource you would rather not let the world know exists. The recursive server, by sending the entire query to the root server, is also sending information about the internal DNS structure and potential internal server names to the external root server. As the FQDN is resolved (or not), this same information is sent to the TLD and authoritative servers, as well.

There is something else contained here, however, that is not so obvious—the IP address of the requestor is contained in that original query, as well. Not only is your internal namespace leaking, your internal IP addresses are leaking, as well.

This is not only a massive security hole for your organization, it also exposes information from individual users on the global ‘net.

There are several things that can be done to resolve this problem. Organizationally, running a private DNS server, hard coding resolving servers for internal domains, and using internal domains that are not part of the existing TLD infrastructure, can go a long way towards preventing information leaking of this kind through DNS. Operating a DNS server internally might not be ideal, of course, although DNS services are integrated into a lot of other directory services used in operational networks. If you are using a local DNS server, it is important to remember to configure DHCP and/or IPv6 ND to send the correct, internal, DNS server address, rather than an external address. It is also important to either block or redirect DNS queries sent to public servers by hosts using hard-coded DNS server configurations.

A second line of defense is through DNS query minimization. Described in RFC7816, query minimization argues recursive servers should use QNAME queries to only ask about the one relevant part of the FQDN. For instance, if the recursive server receives a query for www.banana.example, the server should request information about .example from the root server, banana.example from the TLD, and send the full requested domain name only to the authoritative server. This way, the full search is not exposed to the intermediate servers, protecting user information.

Some recursive server implementations already support QNAME queries. If you are running a server for internal use, you should ensure the server you are using supports DNS query minimization. If you are directing your personal computer or device to publicly reachable recursive servers, you should investigate whether these servers support DNS query minimization.

Even with DNS query minimization, your recursive server still knows a lot about what you ask for—the topic of discussion on a forthcoming episode of the Hedge, where our guest will be Geoff Huston.

YANG is a data modeling language used to model configuration data, state data, Remote Procedure Calls, and notifications for network management protocols, described in RFC7950. The origins of YANG are rooted in work Phil Shafer did in building an interface system for JUNOS. Phil joins us on this episode of the History of Networking to discuss the history of YANG.


The Internet, and networking protocols more broadly, were grounded in a few simple principles. For instance, there is the end-to-end principle, which argues the network should be a simple fat pipe that does not modify data in transit. Many of these principles have tradeoffs—if you haven’t found the tradeoffs, you haven’t looked hard enough—and not looking for them can result in massive failures at the network and protocol level.

Another principle networking is grounded in is the Robustness Principle, which states: “Be liberal in what you accept, and conservative in what you send.” In protocol design and implementation, this means you should accept the widest range of inputs possible without negative consequences. A recent draft, however, challenges the robustness principle—draft-iab-protocol-maintenance.

According to the authors, the basic premise of the robustness principle lies in the problem of updating older software for new features or fixes at the scale of an Internet sized network. The general idea is a protocol designer can set aside some “reserved bits,” using them in a later version of the protocol, and not worry about older implementations misinterpreting them—new meanings of old reserved bits will be silently ignored. In a world where even a very old operating system, such as Windows XP, is still widely used, and people complain endlessly about forced updates, it seems like the robustness principle is on solid ground in this regard.

The argument against this in the draft is implementing the robustness principle allows a protocol to degenerate over time. Older implementations are not removed from service because it still works, implementations are not updated in a timely manner, and the protocol tends to have an ever-increasing amount of “dead code” in the form of older expressions of data formats. Given an infinite amount of time, an infinity number of versions of any given protocol will be deployed. As a result, the protocol can and will break in an infinite number of ways.

The logic of the draft is something along the lines of: old ways of doing things should be removed from protocols which are actively maintained in order to unify and simplify the protocol. At least for actively maintained protocols, reliance on the robustness principle should be toned down a little.

Given the long list of examples in the draft, the authors make a good case.

There is another side to the argument, however. The robustness principle is not “just” about keeping older versions of software working “at scale.” All implementations, no matter how good their quality, have defects (or rather, unintended features). Many of these defects involve failing to release or initialize memory, failing to bounds check inputs, and other similar oversights. A common way to find these errors is to fuzz test code—throw lots of different inputs at it to see if it spits up an error or crash.

The robustness principle runs deeper than infinite versions—it also helps implementations deal with defects in “the other end” that generate bad data. The robustness principle, then, can help keep a network running even in the face of an implementation defect.

Where does this leave us? Abandoning the robustness principle is clearly not a good thing—while the network might end up being more correct, it might also end up simply not running. Ever. The Internet is an interlocking system of protocols, hardware, and software; the robustness principle is the lubricant that makes it all work at all.

Clearly, then, there must be some sort of compromise position that will work. Perhaps a two pronged attack might work. First, don’t discard errors silently. Instead, build logging into software that catches all errors, regardless of how trivial they might seem. This will generate a lot of data, but we need to be clear on the difference between instrumenting something and actually paying attention to what is instrumented. Instrumenting code so that “unknown input” can be caught and logged periodically is not a bad thing.

Second, perhaps protocols need some way to end of life older versions. Part of the problem with the robustness principle is it allows an infinite number of versions of a single protocol to exist in the same network. Perhaps the IETF and other standards organizations should rethink this, explicitly taking older ways of doing things out of specs on a periodic basis. A draft that says “you shouldn’t do this any longer,” or “this is no longer in accordance with the specification,” would not be a bad thing.

For the more “average” network engineer, this discussion around the robustness principle should lead to some important lessons, particularly as we move ever more deeply into an automated world. Be clear about versioning of APIs and components. Deprecate older processes when they should no longer be used.

Control your technical debt, or it will control you.

Multicast is, at best, difficult to deploy in large scale networks—PIM sparse and BIDIR are both complex, adding large amounts of state to intermediate devices. In the worst case, there is no apparent way to deploy any existing version of PIM, such as large-scale spine and leaf networks (variations on the venerable Clos fabric). BEIR, described in RFC8279, aims to solve the per-device state of traditional multicast.

In this network, assume A has some packet that needs to be delivered to T, V, and X. A could generate three packets, each one addressed to one of the destinations—but replicating the packet at A is wastes network resources on the A->B link, at least. Using PIM, these three destinations could be placed in a multicast group (a multicast address can be created that describes T, V, and X as a single destination). After this, a reverse shortest path tree can be calculated from each of the destinations in the group towards the source, A, and the correct forwarding state (the outgoing interface list) be installed at each of the routers in the network (or at least along the correct paths). This, however, adds a lot of state to the network.
BIER provides another option.

Returning to the example, if A is the sender, B is called the Bit-Forwarding Ingress Router, or BFIR. When B receives this packet from A, it determines T, V, and X are the correct destinations, and examines its local routing table to determine T is connected to M, V, is connected to N, and X is connected to R. Each of these are called a Bit-Forwarding Egress Router (BFER) in the network; each has a particular bit set aside in a bit field. For instance, if the network operator has set up an 8-bit BIER bit field, M might be bit 3, N might be bit 4, and R might be bit 6.

Given this, A will examine its local routing table to find the shortest path to M, N, and R. It will find the shortest path to M is through D, the shortest path to N is through C, and the shortest path to R is through C. A will then create two copies of the packet (it will replicate the packet). It will encapsulate one copy in a BIER header, setting the third bit in the header, and send the packet on to D. It will encapsulate the second copy into a BIER header, as well, setting the fourth and sixth bits in the header, and send the packet on to C.

D will receive the packet, determine the destination is M (because the third bit is set in the BIER header), look up the shortest path to M in its local routing table, and forward the packet. When M receives the packet, it pops the BIER header, finds T is the final destination address, and forwards the packet.
When C receives the packet, it finds there are two destinations, R and N, based on the bits set in the BIER header. The shortest path to N is through H, so it clears bit 6 (representing R as a destination), and forwards the packet to H. When H receives the packet, it finds N is the destination (because bit 4 is set in the BIER header), looks up N in its local routing table, and forwards the packet along the shortest path towards N. N will remove the BIER header and forward the packet to V. C will also forward the packet to G after clearing bit 4 (representing N as a destination BFER). G will examine the BIER header, find that R is a destination, examine its local routing table for the shortest path to R, and forward the packet. R, on receiving the packet, will strip the BIER header and forward the packet to X.

While the BFIR must know how to translate a single packet into multiple destinations, either through a group address, manual configuration, or some other signaling mechanism, the state at the remaining routers in the network is a simple table of BIER bitfield mappings to BFER destination addresses. It does not matter how large the tree gets, the amount of state is held to the number of BFERs at the network edge. This is completely different from any version of PIM, where the state at every router along the path depends on the number of senders and receivers.

The tradeoff is the hardware of the BIER Forwarding Routers (BFRs) must be able to support looking up the destination based on the bit set in the BIER bitfield in the header. Since the size of the BIER bitfield is variable, this is… challenging. One option is to use an MPLS label stack as a way to express the BIER header, which leverages existing MPLS implementations. Still, though, MPLS label depth is often limited, the ability to forward to multiple destinations is challenging, and the additional signaling required is more than trivial.

BIER is an interesting technology; it seems ideal for use in highly meshed data center fabrics, and even in longer haul and wireless networks where bandwidth is at a premium. Whether or not the hardware challenges can be overcome is a question yet to be answered.