It is Always Something (RFC1925, Rule 7)

While those working in the network engineering world are quite familiar with the expression “it is always something!,” defining this (often exasperated) declaration is a little trickier. The wise folks in the IETF, however, have provided a definition in RFC1925. Rule 7, “it is always something,” is quickly followed with a corollary, rule 7a, which says: “Good, Fast, Cheap: Pick any two (you can’t have all three).”

You can either quickly build a network which works well and is therefore expensive, or take your time and build a network that is cheap and still does not work well, or… Well, you get the idea. There are many other instances of these sorts of three-way tradeoffs in the real world, such as the (in)famous CAP theorem, which states a database can provide at most two of three properties: consistency, availability, and partition tolerance. Eventual consistency, and problems from microloops to surprise package deliveries (when you thought you ordered one thing, but another was placed in your cart because of a database inconsistency), have resulted. Another form of this three-way tradeoff is the much less famous, but equally true, state, optimization, surface tradeoff trio in network design.

It is possible, however, to build a system which fails at all three measures—a system which is expensive, takes a long time to build, and does not perform well. The fine folks at the IETF have provided examples of such systems.

For instance, RFC1149 describes a system of transporting IPv4 packets over avian carriers, or pigeons. This is particularly useful in areas where electricity and network cabling are not commonly found. To quote the relevant part of the RFC:

The IP datagram is printed, on a small scroll of paper, in hexadecimal, with each octet separated by whitestuff and blackstuff. The scroll of paper is wrapped around one leg of the avian carrier. A band of duct tape is used to secure the datagram’s edges. The bandwidth is limited to the leg length.

The specification of duct tape is quite odd, however; perhaps its water-resistant properties are required for this mode of transport. This mode of transport has been adapted for use with the more modern (in relative terms only!) IPv6 transport in RFC6214. For situations where quality of service is critical, RFC2549 describes quality of service extensions to the transport mechanism.
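The encoding step in the RFC1149 passage quoted above (octets printed in hexadecimal and separated by whitespace) is simple enough to sketch. The following is only an illustration: the function names and the choice of eight octets per line are my own assumptions, not anything the RFC specifies.

```python
def datagram_to_scroll(datagram: bytes, octets_per_line: int = 8) -> str:
    """Render a datagram as hexadecimal octets separated by whitespace.

    octets_per_line is an assumed layout choice; the RFC only says the
    octets are separated by "whitestuff and blackstuff".
    """
    octets = [f"{b:02x}" for b in datagram]
    lines = [
        " ".join(octets[i:i + octets_per_line])
        for i in range(0, len(octets), octets_per_line)
    ]
    return "\n".join(lines)


def scroll_to_datagram(scroll: str) -> bytes:
    """Recover the original octets from the printed scroll."""
    return bytes(int(octet, 16) for octet in scroll.split())


# First four octets of a typical IPv4 header: version/IHL, TOS, total length.
sample = bytes([0x45, 0x00, 0x00, 0x54])
print(datagram_to_scroll(sample))  # 45 00 00 54
```

Each octet costs three characters on the scroll (two hex digits plus a separator), which is one way to see why the leg length becomes the limiting factor.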

To further prove it is possible to build a network which is slow, expensive, and does not work well, RFC4824 describes the transmission of packets through the Semaphore Flag Signaling System, or SFSS. Commonly used to signal between two ships when no other form of communication is available, SFSS consists of a sender who positions (waves) flags of particular shape and color in specific positions to signal individual characters. Rather than transmitting letters or other characters, RFC4824 describes using these flags to transmit 0’s, 1’s, and the framing elements required to transmit an IPv4 datagram over long distances. The advantage of such a system is, much like the avian carrier, that it will operate where there is no electricity. The disadvantage is that encoding and decoding the packet’s contents may be many orders of magnitude more difficult than using existing SFSS specifications to signal messages.

Finally, RFC7511 provides an option which allows the most use of resources on any network while providing the slowest possible network performance by allowing senders to include a scenic route header on IPv6 packets. This header notifies routers and other networking devices that this packet would like to take the scenic route to its destination, which will cause paths including avian carriers (as described in RFC6214) to be chosen over any other available path.

Slow Learning and Range

Jack of all trades, master of none.

This singular saying—a misquote of Benjamin Franklin (more on this in a moment)—is the defining statement of our time. An alternative form might be the fox knows many small things, but the hedgehog knows one big thing.

The rules for success in the modern marketplace, particularly in the technical world, are simple: start early, focus on a single thing, and practice hard.

But when I look around, I find these rules rarely define actual success. Consider my life. I started out with three different interests, starting jazz piano lessons when I was twelve, continuing music through high school, college, and for many years after. At the same time, I was learning electronics—just about everyone in my family is in electronic engineering (or computers, when those came along) in one way or another.

I worked on airfield electronics for a few years in the US Air Force (one of the reasons I tend to be calm is I’ve faced death up close and personal multiple times, an experience that tends to center your mind), including RADAR, radio, and instrument landing systems. Besides these two, I was highly interested in art and illustration, getting to the point of majoring in art in college for a short time, and making a living doing commercial illustration for a time.

You might notice that none of this really has a lot to do with computer networking. That’s the point.

I once thought I was a bit of an anomaly in this—in fact, I’m a bit of an anomaly throughout my life, including coming rather late to deep philosophy and theology (perhaps a bit too late!).

After reading Range by David Epstein, I discovered I was wrong. I’m not the exception, I’m the rule. My case is so common as to be almost trivial.

Epstein not only destroys the common view (start early, stay focused, and practice hard) with reasoning, he also gives many examples of people who have succeeded because they “wandered around” for many years before settling into a single “thing,” and of some who never “settled” throughout their entire lives. People who experience many different things, experimenting with ideas, careers, and paths, have what Epstein calls range.

He gives several reasons for people with range succeeding. They learn how to fail fast, unlike those who are focused on succeeding at a single thing—he calls this “too much grit.” They also learn to think outside the box—they are not restricted by the “accepted norms” within any field of study. It also turns out that slower learning is much more effective, as shown by multiple experiments.

There are three warnings about becoming a person with range, however (the fox rather than the hedgehog, so to speak). First, it takes a long time. Slow learning is, after all, slow. Second, range works best in a world full of specialists, like the world we live in right now. In a world full of generalists, specialists are likely to succeed more often than generalists. What is different stands out (both in bad and good ways, by the way). Third, people with range do better with wicked problems: problems that are not easily solved with repetition and linear thought.

Of course, computer networks are clearly wicked problems.

That original quote that bothers me so much? Franklin did not say: jack of all trades, master of none. Instead, he said: jack of all trades, master of one. What a difference a single letter makes.

Complexity Bites Back

What percentage of business-impacting application outages are caused by networks? According to a recent survey by the Uptime Institute, 29% of the 300 operators surveyed have experienced network-related outages in the last three years, the highest percentage among the causes of IT failures across the period.

A secondary question on the survey attempted to “dig a little deeper” to understand the reasons for network failure; the chart below shows the result.

We can be almost certain the third-party failures, if the providers were queried, would break down along the same lines. Is there a pattern among the reasons for failure?

Configuration change: while this could be somewhat managed through automation, these kinds of failures are more generally the result of complexity. Firmware and software failures? The more complex a piece of software, the more likely it is to have mission-impacting errors of some kind, so again, complexity related. Corrupted policies and routing tables are also complexity related. The only item among the top preventable causes that does not seem, at first, to relate directly to complexity is network overload and/or congestion problems. Many of these cases, however, might also be complexity related.

The Uptime Institute draws this same lesson, though through a slightly different process, saying: “Networks are complex not only technically, but also operationally.”

For years (decades, even) we have talked about the increasing complexity of networks, but we have done little about it. Yes, we have automated all the things, but automation can only carry us so far in covering complexity up. Automation also adds a large dollop of complexity on top of the existing network. Sometimes (not always, of course!) automating a complex system without making substantial efforts at simplification is just like trying to put a fire out with a can of gas (or, in one instance I actually saw, trying to put out an electrical fire with a can of soda, with the predictable trip to the local hospital).

We are (finally) starting to be “bit hard” by complexity problems in our networks—and I suspect this is the leading edge of the problem, rather than the trailing edge.

Maybe it’s time to realize making every protocol serve every purpose in the network wasn’t a good idea—we now have protocols that are so complex that they can only be correctly configured by machines, and then only when you narrow the use case enough to make the design parameters intelligible.

Maybe it’s time to realize optimizing for every edge use case wasn’t a good idea. Sometimes it’s just better to throw resources at the problem, rather than throwing state at the control plane to squeeze out just one more ounce of optimization.

Maybe it’s time to stop building networks around “whatever the application developer can dream up,” and to start working as a team with the application developers to build a complete system that puts complexity where it most makes sense, and divides complexity from complexity, rather than just assuming “the network can do that.”

Maybe it’s time to stop thinking we can automate our way out of this.

Maybe it’s time to lay our superhero capes down and just start building simpler systems.

The Hedge 74: Brian Keys and the Complexity of User Interfaces

Crossing from the domain of test pilots to the domain of network engineering might seem like a large leap indeed, but user interfaces and their tradeoffs are common across physical and virtual spaces. Join Brian Keys, Eyvonne Sharp, Tom Ammon, and Russ White as we start with user interfaces and move into a wider discussion around attitudes and beliefs in the network engineering world.

You Can Always Add Another Layer of Indirection (RFC1925, Rule 6a)

Many within the network engineering community have heard of the OSI seven-layer model, and some may have heard of the Recursive InterNetwork Architecture (RINA) model. The truth is, however, that while protocol designers may talk about these things and network designers study them, very few networks today are built using any of these models. What is often used instead is what might be called the Infinitely Layered Functional Indirection (ILFI) model of network engineering. In this model, nothing is solved at a particular layer of the network if it can be moved to another layer, whether successfully or not.

For instance, Ethernet is the physical and data link layer of choice over almost all types of physical medium, including optical and copper. No new type of physical transport layer (other than wireless) can succeed unless it can be described as “Ethernet” in some regard or another, much like almost no new networking software can succeed unless it has a Command Line Interface (CLI) similar to the one a particular vendor developed some twenty years ago. It’s not that these things are necessarily better, but they are well-known.

Ethernet, however, goes far beyond providing physical layer connectivity. Because many applications rely on using Ethernet semantics directly, many networks are built with some physical form of Ethernet (or something claiming to be like Ethernet), with IP on top of this. On top of the IP, there is some other transport protocol, such as VXLAN, UDP, GRE, or perhaps even MPLS over UDP. On top of these layers rides … Ethernet. On which IP runs. On which TCP or UDP, or potentially VXLAN runs. It turns out it is easier to add another layer of indirection to solve many of the problems caused by applications that expect Ethernet than it is to solve them with IP—or any other transport protocol. You’ve heard of turtles all the way down—today we have Ethernet all the way down.
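To make the "Ethernet all the way down" stack concrete, here is a rough sketch that adds up the header bytes for one such arrangement: Ethernet, IPv4, UDP, and VXLAN on the outside, with Ethernet, IPv4, and TCP riding inside. The header sizes assume the simplest case (no VLAN tags, no IP or TCP options, frame check sequence not counted), and the names `HEADER_BYTES`, `STACK`, `header_overhead`, and `tunnel_overhead` are illustrative, not taken from any library.

```python
# Header sizes in bytes for the common untagged, option-free case.
HEADER_BYTES = {
    "Ethernet": 14,  # no 802.1Q tag, FCS not counted
    "IPv4": 20,      # no options
    "UDP": 8,
    "VXLAN": 8,
    "TCP": 20,       # no options
}

# One example stack, outermost header first.
STACK = ["Ethernet", "IPv4", "UDP", "VXLAN", "Ethernet", "IPv4", "TCP"]


def header_overhead(stack):
    """Total header bytes carried in front of the application payload."""
    return sum(HEADER_BYTES[layer] for layer in stack)


def tunnel_overhead(stack, tunnel="VXLAN"):
    """Bytes the tunnel adds in front of the inner Ethernet frame."""
    idx = stack.index(tunnel)
    return header_overhead(stack[: idx + 1])


print(tunnel_overhead(STACK))  # 50
print(header_overhead(STACK))  # 104
```

Under these assumptions, the extra layer of indirection costs 50 bytes per packet before the inner frame even begins, which is why the MTU usually has to be raised in the underlay or lowered in the overlay.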

Many other suggestions of this type have been made in network engineering and protocol design across the years, but none of them seem to have been as widely deployed as Ethernet over IP over Ethernet. For instance, RFC3252 notes it has always been difficult to understand the contents of Ethernet, IP, and other packets as they are transmitted from host to host. The eXtensible Markup Language (XML) is, on the other hand, designed to be both machine- and human-readable. A logical solution to the problem of unreadable packets, then, is to add another layer of indirection by formatting all packets, including Ethernet and IP, into XML. Once this is done, there would be no need for expensive or complex protocol analyzers, as anyone could simply capture packets off the wire and read them directly. Adding another layer, in this case, could save many hours of troubleshooting time, and generally reduce the cost of operating a network significantly.
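As a rough illustration of the idea (and of what it costs), the sketch below wraps a few IPv4 header fields in XML using Python's standard library. The element names and the `ipv4_header_to_xml` helper are invented for this example; they do not follow the format RFC3252 actually defines.

```python
import xml.etree.ElementTree as ET


def ipv4_header_to_xml(fields: dict) -> str:
    """Wrap header fields in XML elements (hypothetical element names)."""
    root = ET.Element("ip")
    header = ET.SubElement(root, "header")
    for name, value in fields.items():
        ET.SubElement(header, name).text = str(value)
    return ET.tostring(root, encoding="unicode")


# Addresses come from the documentation ranges (192.0.2.0/24, 198.51.100.0/24).
fields = {
    "version": 4,
    "ttl": 64,
    "protocol": 6,
    "source": "192.0.2.1",
    "destination": "198.51.100.7",
}

xml_text = ipv4_header_to_xml(fields)
print(xml_text)

# A binary IPv4 header without options is 20 bytes; compare the XML size.
print(len(xml_text.encode("utf-8")), "bytes as XML versus 20 bytes on the wire")
```

Anyone can read the result without a protocol analyzer, though even this partial header is several times larger than the binary original.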

Once the idea of adding another layer has been fully grasped, the range of problems which can be solved becomes almost limitless. Many companies struggle to find some way to provide secure remote access to their employees, contractors, and even customers. The systems designed to solve this problem are often complex, difficult to deploy, and almost impossible to troubleshoot. RFC5514, however, provides an alternate solution: simply layer an IPv6 transport stream on top of the social media networks everyone already uses. Everyone, after all, already has at least one social media account, and can already reach that social media account using at least one device. Creating an IPv6 stream across social media would provide universal cloud-based access to anyone who desires.

Such streams could even be encrypted to ensure the operators and users of the underlying social media network cannot see any private information transmitted across the IPv6 channel created in this way.

On Using the Right Word

A while back, I was sitting in a meeting where the presenter described switching from a “traditional, hierarchical data center fabric” to a spine-and-leaf (while drawing CLOS, in all capital letters, on the whiteboard). He pointed out that the spine-and-leaf design is simpler because it only has two tiers rather than three.

There is so much wrong with this I almost winced in physical pain. Traditional hierarchical designs are not fabrics. Spine-and-leaf fabrics are not CLOS, but Clos, fabrics. Clos fabrics have three stages, not two, even if we draw them “folded” so you only see two apparent levels to the fabric. In fact, spine-and-leaf fabrics always have an odd number of stages, and they are stages, not tiers.
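The odd-number claim falls out of simple counting: unfolding a folded drawing passes through every level on the way up and every level except the top again on the way down. Here is a minimal sketch, with `unfold` and `clos_stages` as made-up helper names and the level labels chosen purely for illustration.

```python
def unfold(levels):
    """Return the stages crossed from one edge of the fabric to the other.

    levels lists the apparent levels of the folded drawing, bottom to top.
    """
    return levels + levels[-2::-1]


def clos_stages(apparent_levels: int) -> int:
    """Number of stages in a folded fabric drawn with this many levels."""
    if apparent_levels < 2:
        raise ValueError("a fabric needs at least two apparent levels")
    return 2 * apparent_levels - 1


print(unfold(["leaf", "spine"]))  # ['leaf', 'spine', 'leaf']
print(clos_stages(2))             # 3
print(clos_stages(3))             # 5
```

Since 2n - 1 is odd for any whole number n, the stage count can never be even, no matter how many levels the folded drawing shows.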

More recently, I heard someone talking about an operating system that was built using microservices. I thought, “that would be a neat trick.” To build something with microservices does not just mean a piece of software using modules; this would be modular application (or operating system) design. Microservices architectures break the application up into the most basic components possible and then scale each kind of component out (rather than up) by spinning new copies of each service as needed. I cannot imagine scaling an operating system out by spinning multiple copies of the same service, and then providing some sort of way to spread load across the various copies. Would you have some sort of anycast IPC? An internal DNS server or load balancer?

You can have an OS that natively participates in a larger microservices-based architecture, but what would microservices within the operating system look like, precisely?

Maybe my recent studies in philosophy make me much more attuned to the way we use language in the network engineering world—or maybe I’m just getting old. Whatever it is, our determination to make every word mean everything is driving me nuts.

What is the difference between a router and a switch? There used to be a simple definition—routers rewrite the L2 header and switches don’t. But now that routers switch packets, and switches route packets, the only difference seems to be … buffer depth? Feature set? The line between router and switch is fuzzy to the point of being meaningless, leaving us with no real term to describe a real switch any longer (a device that doesn’t do routing).

What about software defined networks? We’ve been treated to software defined everything now, of course. And intent? I get the point of intent, but we’re already moving down the path of making the meaning so broad that it can even contain configuring the CLI on an old AGS+. And don’t get me started on artificial intelligence, which is often used to describe something closer to machine learning. Of course, machine learning is often used to describe things that are really nothing more than statistical inference.

Maybe it’s time for a general rebellion against the sloppy use of language in network engineering. Or maybe I’m just tilting at yet another windmill. Wake me up when we’ve gotten to the point that we can use any word interchangeably with any other word in the network engineering dictionary. I await the AI that routes packets by reading your mind (through intent) called a swouter… or something.