It’s time for a short lecture on complexity.
Networks are complex. This should not be surprising, as building a system that can solve hard problems, while also adapting quickly to changes in the real world, requires complexity—the harder the problem, the more adaptable the system needs to be, the more resulting design will tend to be. Networks are bound to be complex, because we expect them to be able to support any application we throw at them, adapt to fast-changing business conditions, and adapt to real-world failures of various kinds.
There are several reactions I’ve seen to this reality over the years, each of which has their own trade-offs.
We all use the OSI model to describe the way networks work. I have, in fact, included it in just about every presentation, and every book I have written, someplace in the fundamentals of networking. But if you have every looked at the OSI model and had to scratch your head trying to figure out how it really fits with the networks we operate today, or what the OSI model is telling you in terms of troubleshooting, design, or operation—you are not alone. Lots of people have scratched their heads about the OSI model, trying to understand how it fits with modern networking. There is a reason this is so difficult to figure out.
The OSI Model does not accurately describe networks.
What set me off in this particular direction this week is an article over at Errata Security:
The OSI Model was created by international standards organization for an alternative internet that was too complicated to ever work, and which never worked, and which never came to pass. Sure, when they created the OSI Model, the Internet layered model already existed, so they made sure to include today’s Internet as part of their model. But the focus and intent of the OSI’s efforts was on dumb networking concepts that worked differently from the Internet.
When a recursive resolver receives a query from a host, it will first consult any local cache to discover if it has the information required to resolve the query. If it does not, it will begin with the rightmost section of the domain name, the Top Level Domain (TLD), moving left through each section of the Fully Qualified Domain Name (FQDN), in order to find an IP address to return to the host, as shown in the diagram below.
This is pretty simple at its most basic level, of course—virtually every network engineer in the world understands this process (and if you don’t, you should enroll in my How the Internet Really Works webinar the next time it is offered!). The question almost no-one ever asks, however, is: what, precisely, is the recursive server sending to the root, TLD, and authoritative servers?
A long time ago, I worked in a secure facility. I won’t disclose the facility; I’m certain it no longer exists, and the people who designed the system I’m about to describe are probably long retired. Soon after being transferred into this organization, someone noted I needed to be trained on how to change the cipher door locks. We gathered up a ladder, placed the ladder just outside the door to the secure facility, popped open one of the tiles on the drop ceiling, and opened a small metal box with a standard, low security key. Inside this box was a jumper board that set the combination for the secure door.
First lesson of security: there is (almost) always a back door.
I was reminded of this while reading a paper recently published about a backdoor attack on certificate authorities. There are, according to the paper, around 130 commercial Certificate Authorities (CAs). Each of these CAs issue widely trusted certificates used for everything from TLS to secure web browsing sessions to RPKI certificates used to validate route origination information. When you encounter these certificates, you assume at least two things: the private key in the public/private key pair has not been compromised, and the person who claims to own the key is really the person you are talking to. The first of these two can come under attack through data breaches. The second is the topic of the paper in question.
How do CAs validate the person asking for a certificate actually is who they claim to be? Do they work for the organization they are obtaining a certificate for? Are they the “right person” within that organization to ask for a certificate? Shy of having a personal relationship with the person who initiates the certificate request, how can the CA validate who this person is and if they are authorized to make this request?
Privacy problems are an area of wide concern for individual users of the Internet—but what about network operators? In this issue of The Internet Protocol Journal, Geoff Huston has an article up about privacy in DNS, and the various attempts to make DNS private on the part of the IETF—the result can be summarized with this long, but entertaining, quote:
The Internet is largely dominated, and indeed driven, by surveillance, and pervasive monitoring is a feature of this network, not a bug. Indeed, perhaps the only debate left today is one over the respective merits and risks of surveillance undertaken by private actors and surveillance by state-sponsored actors. … We have come a very long way from this lofty moral stance on personal privacy into a somewhat tawdry and corrupted digital world, where “do no evil!” has become “don’t get caught!”
Before diving into a full-blown look at the many problems with DNS security, it is worth considering what kinds of information can leak through the DNS system. Let’s ignore the recent discovery that DNS queries can be used to exfiltrate data; instead, let’s look at more mundane data leakage from DNS queries.
sarcasm warning—take the following post with a large grain of salt
A thousand years from now, when someone is writing the history of computer networks, one thing they should notice is how we tend to reduce our language so as many terms as possible have precisely the same meaning. They might attribute this to marketing, or the hype cycle, but whatever the cause this is clearly a trend in the networking world. Some examples might be helpful, so … forthwith, the reduced terminology of the networking world.
Software Defined Networking (SDN): Used to mean a standardized set of interfaces that enabled open access to the forwarding hardware. Came to mean some form of control plane centralization over time. Now means automated configuration and management of network devices, centralized control planes, traffic engineering, and just about everything else.
Fabric: Used to mean a regular, non-planar, repeating network topology with scale-out characteristics. Now means any vaguely hierarchical topology that is not a ring.
The Internet, and networking protocols more broadly, were grounded in a few simple principles. For instance, there is the end-to-end principle, which argues the network should be a simple fat pipe that does not modify data in transit. Many of these principles have tradeoffs—if you haven’t found the tradeoffs, you haven’t looked hard enough—and not looking for them can result in massive failures at the network and protocol level.
Another principle networking is grounded in is the Robustness Principle, which states: “Be liberal in what you accept, and conservative in what you send.” In protocol design and implementation, this means you should accept the widest range of inputs possible without negative consequences. A recent draft, however, challenges the robustness principle—draft-iab-protocol-maintenance.
According to the authors, the basic premise of the robustness principle lies in the problem of updating older software for new features or fixes at the scale of an Internet sized network. The general idea is a protocol designer can set aside some “reserved bits,” using them in a later version of the protocol, and not worry about older implementations misinterpreting them—new meanings of old reserved bits will be silently ignored. In a world where even a very old operating system, such as Windows XP, is still widely used, and people complain endlessly about forced updates, it seems like the robustness principle is on solid ground in this regard.
There is a rule in sports and music about practice—the 10,000 hour rule—which says that if you want to be an expert on something, you need ten thousand hours of intentional practice. The corollary to this rule is: if you want to be really good at something, specialize. In colloquial language, you cannot be both a jack of all trades and a master of one.
Translating this to the network engineering world, we might say something like: it takes 10,000 hours to really know the full range of products from vendor x and how to use them. Or perhaps: only after you have spent 10,000 hours of intentional study and practice in building data center networks will you know how to build these things. We might respond to this challenge by focusing our studies and time in one specific area, gaining one series of certifications, learning one vendor’s gear, or learning one specific kind of work (such as design or troubleshooting).
This line of thinking, however, should immediately raise two questions. First, is it true? Anecdotal evidence seems to abound for this kind of thinking; we have all heard of the child prodigy who spent their entire lives focusing on a single sport. We also all know of people who have “paper skills” instead of “real skills;” the reason we often attribute to this is they have not done enough lab work, or they have not put in hours configuring, troubleshooting, or working on the piece of gear in question. Second, is it healthy for the person or the organization the person works for?
Floating point is not something many network engineers think about. In fact, when I first started digging into routing protocol implementations in the mid-1990’s, I discovered one of the tricks you needed to remember when trying to replicate the router’s metric calculation was always round down. When EIGRP was first written, like most of the rest of Cisco’s IOS, was written for processors that did not perform floating point operations. The silicon and processing time costs were just too high.
What brings all this to mind is a recent article on the problems with floating point performance over at The Next Platform by Michael Feldman. According to the article:
Over at the Communications of the ACM, Micah Beck has an article up about the hourglass model. While the math is quite interesting, I want to focus on transferring the observations from the realm of protocol and software systems development to network design. Specifically, start with the concept and terminology, which is very useful. Taking a typical design, such as this—
The first key point made in the paper is this—
The thin waist of the hourglass is a narrow straw through which applications can draw upon the resources that are available in the less restricted lower layers of the stack.