Lessons Learned from the Robustness Principle
The Internet, and networking protocols more broadly, were grounded in a few simple principles. For instance, there is the end-to-end principle, which argues the network should be a simple fat pipe that does not modify data in transit. Many of these principles have tradeoffs—if you haven’t found the tradeoffs, you haven’t looked hard enough—and not looking for them can result in massive failures at the network and protocol level.
Another principle networking is grounded in is the Robustness Principle, which states: “Be liberal in what you accept, and conservative in what you send.” In protocol design and implementation, this means you should accept the widest range of inputs possible without negative consequences. A recent draft, however, challenges the robustness principle—draft-iab-protocol-maintenance.
According to the authors, the basic premise of the robustness principle lies in the problem of updating older software for new features or fixes at the scale of an Internet sized network. The general idea is a protocol designer can set aside some “reserved bits,” using them in a later version of the protocol, and not worry about older implementations misinterpreting them—new meanings of old reserved bits will be silently ignored. In a world where even a very old operating system, such as Windows XP, is still widely used, and people complain endlessly about forced updates, it seems like the robustness principle is on solid ground in this regard.
The End of Specialization?
There is a rule in sports and music about practice—the 10,000 hour rule—which says that if you want to be an expert on something, you need ten thousand hours of intentional practice. The corollary to this rule is: if you want to be really good at something, specialize. In colloquial language, you cannot be both a jack of all trades and a master of one.
Translating this to the network engineering world, we might say something like: it takes 10,000 hours to really know the full range of products from vendor x and how to use them. Or perhaps: only after you have spent 10,000 hours of intentional study and practice in building data center networks will you know how to build these things. We might respond to this challenge by focusing our studies and time in one specific area, gaining one series of certifications, learning one vendor’s gear, or learning one specific kind of work (such as design or troubleshooting).
This line of thinking, however, should immediately raise two questions. First, is it true? Anecdotal evidence seems to abound for this kind of thinking; we have all heard of the child prodigy who spent their entire lives focusing on a single sport. We also all know of people who have “paper skills” instead of “real skills;” the reason we often attribute to this is they have not done enough lab work, or they have not put in hours configuring, troubleshooting, or working on the piece of gear in question. Second, is it healthy for the person or the organization the person works for?
Disaggregation and Business Value

I recently spoke at CHINOG on the business value of disaggregation, and participated in a panel on getting involved in the IETF.
The Hedge 1: Sonia Cuff and Stress in IT

Working in information technology is notoriously stressful — but why? In this episode of the Hedge, Sonia Cuff, Denise Donohue, and Russ White dig into the reasons information technology tends to produce so much stress, and what we can do about it.
The Floating Point Fix
Floating point is not something many network engineers think about. In fact, when I first started digging into routing protocol implementations in the mid-1990’s, I discovered one of the tricks you needed to remember when trying to replicate the router’s metric calculation was always round down. When EIGRP was first written, like most of the rest of Cisco’s IOS, was written for processors that did not perform floating point operations. The silicon and processing time costs were just too high.
What brings all this to mind is a recent article on the problems with floating point performance over at The Next Platform by Michael Feldman. According to the article:
Design Intelligence from the Hourglass Model
Over at the Communications of the ACM, Micah Beck has an article up about the hourglass model. While the math is quite interesting, I want to focus on transferring the observations from the realm of protocol and software systems development to network design. Specifically, start with the concept and terminology, which is very useful. Taking a typical design, such as this—

The first key point made in the paper is this—
The thin waist of the hourglass is a narrow straw through which applications can draw upon the resources that are available in the less restricted lower layers of the stack.
The Hedge

Since leaving the Network Collective as a co-host, I have been thinking about what to do “next” in the podcast space. While I will continue recording the History of Networking series until I either run out of guests or energy, it seems like an opportune time to build something new.
Hence—the Hedge, a new podcast published here at rule11.tech.
Why the Hedge? Because the people I work with in the engineering world remind me a lot of hedgehogs. If you scare them, they curl up into a ball of spikes and hiss. Because we always have problems with people hogging the (h)edge of the network. Because the (h)edge of the network and business is important. Because a hedge is a great place to gather just to have a conversation about what is going on in the world.
You can add your own bad hedge jokes here, because I’m out for the moment.
DORA, DevOps, and Lessons for Network Engineers
DevOps Research and Assessment (DORA) released their 2018 Accelerate report on the state of DevOps at the end of 2018; I’m a little behind in my reading, so I just got around to reading it, and trying to figure out how to apply their findings to the infrastructure (networking) side of the world.
DORA found organizations that outsource entire functions, such as building an entire module or service, tend to perform more poorly than organizations that outsource by integrating individual developers into existing internal teams (page 43). It is surprising companies still think outsourcing entire functions is a good idea, given the many years of experience the IT world has with the failures of this model. Outsourced components, it seems, too often become a bottleneck in the system, especially as contracts constrain your ability to react to real-world changes.
Cloudy with a chance of lost context
One thing we often forget in our drive to “more” is this—context matters. Ivan hit on this with a post about the impact of controller failures in software-defined networks just a few days ago: A controller (or management/provisioning) system is obviously the central point of failure in any network, but we have to go beyond…
It’s not a CLOS, it’s a Clos
Way back in the day, when telephone lines were first being installed, running the physical infrastructure was quite expensive. The first attempt to maximize the infrastructure was the party line. In modern terms, the party line is just an Ethernet segment for the telephone. Anyone can pick up and talk to anyone else who happens to be listening. In order to schedule things, a user could contact an operator, who could then “ring” the appropriate phone to signal another user to “pick up.” CSMA/CA, in essence, with a human scheduler.
This proved to be somewhat unacceptable to everyone other than various intelligence agencies, so the operator’s position was “upgraded.” A line was run to each structure (house or business) and terminated at a switchboard. Each line terminated into a jack, and patch cables were supplied to the operator, who could then connect two telephone lines by inserting a jumper cable between the appropriate jacks.
An important concept: this kind of operator driven system is nonblocking. If Joe calls Susan, then Joe and Susan cannot also talk to someone other than one another for the duration of their call. If Joe’s line is tied up, when someone tries to call him, they will receive a busy signal. The network is not blocking in this case, the edge is—because the node the caller is trying to reach is already using 100% of its available bandwidth for an existing call. Blocking networks did exist in the form of trunk connections, or connections between these switch panels. Trunk connections not only consume ports on the switchboard, they are expensive to build, and they require a lot of power to run. Hence, making a “long distance call” costs money because it consumes a blocking resource. It is only when we get to packet switched digital networks that the cost of a “long distance call” drops to the rough equivalent of a “normal” call, and we see “long distance” charges fade into memory (many of my younger readers have never been charged for “long distance calls,” in fact, and may not even know what I’m talking about).
