The EXPERIENCE HAS SHOWN THAT Keyword (RFC1925, Rule 4)

The world of information technology is filled, often to overflowing, with those who “know better.” For instance, I was recently reading an introduction to networking in a very popular orchestration system that began with the declaration that routing was hard, and therefore this system avoided routing. The document then went on to describe a system of moving packets around using multiple levels of Network Address Translation (NAT) and centrally configured policy-based routing (or filter-based forwarding) that was clearly simpler than the distributed protocols used to run large-scale networks. I thought, for a moment, of writing the author and pointing out the system in question had merely reinvented routing in a rather inefficient and probably broken way, but I relented. Why? Because I know RFC1925, rule 4, by heart:

Some things in life can never be fully appreciated nor understood unless experienced firsthand. Some things in networking can never be fully understood by someone who neither builds commercial networking equipment nor runs an operational network.

Ultimately, the people who built this system will likely not listen to me; rather, they are going to have to experience the pain caused by large-scale failures for themselves before they will listen. Many network operators do wish for some way to get their experience across to users and application developers, however; one suggestion which has been made in the past is adding subliminal messages to the TELNET protocol. According to RFC1097, adding this new message type would allow operators to gently encourage users to upgrade the software they are using by displaying a message on-screen which the user’s mind can process, but is not consciously aware of reading. The uses of such a protocol extension, however, would be wide-ranging, such as informing application developers that the network is not cheap, and packets are not carried instantaneously from one host to another.

A further suggestion made in this direction is to find ways to more fully document operational experience in Internet Standards produced by the Internet Engineering Task Force (IETF). Currently, the standards for writing such standards (sometimes mistakenly called meta-standards, although these standards about standards are standards in their own right) only include a few keywords which authors of protocol standards can use to guide developers into creating well-developed implementations. For instance, according to RFC2119, a protocol designer can use the term MUST (note the uppercase, which means it must be shouted when reading the standard out loud) to indicate something an implementation must do. If the implementation does not do what it MUST, subliminal messaging (as described above) will be used to discourage the use of that implementation.

MUST NOT, SHOULD, SHOULD NOT, and MAY are among the remaining keywords defined by the IETF for use in standards. While these options do cover a number of situations, they do not express the full range of options available based on operational experience. RFC6919 proposed extensions to these keywords to allow a fuller range of intent, one better suited to expressing experience.
For instance, RFC6919 adds the keyword MUST (BUT WE KNOW YOU WON’T) to express operational frustration for those times when even subliminal messaging will not convince a user or application developer to create an implementation that will gracefully scale. The POSSIBLE keyword is included to indicate what is possible in the real world, and REALLY SHOULD NOT is included for those times when an application developer or user asks the network operator to launch pigs into flight.
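
As a side note, it is easy enough to see how heavily any given document leans on these keywords; the short Python sketch below counts them in a piece of draft text. It is purely illustrative: the keyword list mixes a subset of the RFC2119 terms with a couple of the RFC6919 extensions, and it makes no attempt at the rigor a real style checker would need.

    import re

    # Requirement keywords: a subset of the RFC2119 terms plus a couple of
    # the RFC6919 extensions discussed above. Ordered most specific first so
    # that, for example, MUST NOT is not also counted as MUST.
    KEYWORDS = [
        "MUST (BUT WE KNOW YOU WON'T)",
        "REALLY SHOULD NOT",
        "MUST NOT",
        "SHOULD NOT",
        "MUST",
        "SHOULD",
        "MAY",
        "POSSIBLE",
    ]

    def requirement_counts(text: str) -> dict:
        """Count requirement keywords in a draft, most specific first."""
        counts = {}
        remaining = text
        for keyword in KEYWORDS:
            pattern = re.escape(keyword)
            counts[keyword] = len(re.findall(pattern, remaining))
            remaining = re.sub(pattern, "", remaining)
        return counts

    if __name__ == "__main__":
        sample = ("Implementations MUST NOT silently drop packets, "
                  "SHOULD log errors, and MAY retry.")
        print(requirement_counts(sample))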

Of course, the keywords described in RFC6919 may, at some point, be extended further to include such keywords as EXPERIENCE HAS SHOWN THAT and THAT WILL NOT SCALE, but for now protocol developers and operators are still somewhat restricted in their ability to fully express the experience of operating large-scale networks.

Even with these additional keywords and the use of subliminal messaging, improper implementations will still slip out into the wild, of course.

And what about network operators who are just beginning to learn their craft, or have long experience but somehow still make mistakes in their deployments? Some have suggested in the past—particularly those who work in technical assistance centers—that all network devices be shipped according to the puzzle-box protocol.

All network devices should be shipped in puzzle boxes such that only those with an appropriate level of knowledge, experience, and intelligence can open the box and hence install the equipment. Some might argue the Command Line Interface (CLI) currently supplied with most networking equipment is the equivalent of a puzzle box, but given the state of most networks, it seems shipping network equipment with a complex and difficult-to-use CLI has not been fully effective.

Innovation Myths

Innovation has gained a sort of mystical aura in our world. Move fast and break stuff. We recognize and lionize innovators in just about every way possible. The result is a general attitude of innovate or die: if you cannot innovate, then you will not progress in your career or life. Maybe it’s time to take a step back and bust some of the innovation myths created by this near idolization of innovation.

You can’t innovate where you are. Reality: innovation is not tied to a particular place and time. “But I work for an enterprise that only uses vendor gear… Maybe if I worked for a vendor, or was deeply involved in open source…” Innovation isn’t just about building new products! You can innovate by designing a simpler network that meets business needs, or by working with your vendor on testing a potential new product. Ninety percent of innovation is just paying attention to problems, along with a sense of what is “too complex,” or where things might be easier.

You don’t work in open source or open standards? That’s not your company’s problem, that’s your problem. Get involved. It’s not just about protocols, anyway. What about certifications, training, and the many other areas of life in information technology? Just because you’re in IT doesn’t mean you have to only invent new technologies.

Innovation must be pursued—it doesn’t “just happen.” We often tell ourselves stories about innovation that imply it “is the kind of thing we can accomplish with a structured, linear process.” The truth is the process of innovation is unpredictable and messy. Why, then, do we tell innovation stories that sound so purposeful and linear?

That’s the nature of good storytelling. It takes a naturally scattered collection of moments and puts them neatly into a beginning, middle, and end. It smoothes out all the rough patches and makes a result seem inevitable from the start, despite whatever moments of uncertainty, panic, even despair we experienced along the way.

Innovation just happens. Either the inspiration just strikes, or it doesn’t, right? You’re just walking along one day and some really innovative idea just jumps out at you. You’re struck by lightning, as it were. This is the opposite of the previous myth, and just as wrong in the other direction.

Innovation requires patience. According to Keith’s Law, any externally obvious improvement in a product is really the result of a large number of smaller changes hidden within the abstraction of the system itself. Innovation is a series of discoveries over months and even years. Innovations are gradual, incremental, and collective—over time.

Innovation often involves combining existing components. If you don’t know what’s already in the field (and usefully adjacent fields), you won’t be able to innovate. Innovation, then, requires a lot of knowledge across a number of subject areas. You have to work to learn to innovate—you can’t fake this.

Innovation often involves a group of people, rather than lone actors. We often emphasize lone actors, but they rarely work alone. To innovate, you have to intentionally embed yourself in a community with a history of innovation, or build such a community yourself.

Innovation must take place in an environment where failure is seen as a good thing (at least you were trying) rather than a bad one.

Innovative ideas don’t need to be sold. Really? Then let’s look at Quibi, which “failed after only 7 months of operation and after having received $2 billion in backing from big industry players.” The idea might have been good, but it didn’t catch on. The idea that you can “build a better mousetrap” and “the world will beat a path to your door” just isn’t true, and it never has been.

The bottom line is… innovation does require a lot of hard work. You have to prepare your mind, learn to look for problems that can be solved in novel ways, be inquisitive enough to ask why and whether there is a better way, stubborn enough to keep trying, and confident enough to sell your innovation to others. But you can innovate where you are; to believe otherwise is a myth.

The Dangers of Flying Pigs (RFC1925, Rule 3)

There are many times in networking history, and in the day-to-day operation of a network, when an engineer has been asked to do what seems to be impossible. Perhaps it is installing a circuit faster than a speeding bullet, or flying over tall buildings to reach a remote site faster than any known form of conveyance short of a transporter beam (which, contrary to what you might see in the movies, has not yet been invented).

One particular impossible assignment in the early days of network engineering was the common request to replicate the works of Shakespeare using the infinite number of monkeys (obviously) connected to the Internet. The creation of appropriate groups of monkeys, the herding of these groups, and the management of their output were once considered a nearly impossible task, similar to finding a token dropped on the floor or lost in the ether.

This problem proved so intractable that the IETF finally created an entire suite of management tools for the infinite monkeys used in these experiments, described in RFC2795. This RFC describes the Infinite Monkey Protocol Suite (IMPS), which runs on top of the Internet Protocol, along with the Infinite Threshold Accounting Gadget (I-TAG) and the KEEPER specification, which provides a series of messages to manage the infinite monkeys. The experiment raised a number of questions about its construction, such as whether the compilation of works should take place on a letter-by-letter or word-by-word basis. Ultimately, the problem was apparently solved through the creation of online infinite monkey simulators.

For those situations when a network engineer is asked to perform something apparently impossible, such as assembling and managing an infinite suite of monkeys gathered for testing, the first requirement is a lot of hot, caffeinated beverage. And there is no better way to make such beverages than through a hypertext-controlled hot beverage device. This device is so important, in fact, that the IETF described the interface and protocols for it fairly early, in RFC2324. While having a hypertext control interface to such devices is important, sometimes the making of caffeinated beverages should be automated; an interface which can be used for automation is described in RFC2325. If the engineer prefers some form of caffeine other than coffee, the procedures in RFC7168 should be followed.
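
For the curious, HTCPCP looks a great deal like HTTP. The sketch below simply builds the sort of BREW request RFC2324 describes and prints it; the pot name is a made-up placeholder, nothing is actually sent anywhere, and the additions are whatever you take in your coffee.

    # A minimal sketch of an HTCPCP/1.0 BREW request along the lines of RFC2324.
    # The host "pot.example.net" is a placeholder; no real coffee pot is contacted.

    def build_brew_request(pot: str, additions: str = "Cream") -> str:
        """Build a BREW request asking the named pot to start brewing."""
        lines = [
            f"BREW coffee://{pot}/pot-0 HTCPCP/1.0",
            f"Host: {pot}",
            "Content-Type: message/coffeepot",  # the media type RFC2324 calls for
            f"Accept-Additions: {additions}",   # milk, syrups, and so on
            "",
            "start",                            # the coffee message body: start or stop
        ]
        return "\r\n".join(lines)

    if __name__ == "__main__":
        print(build_brew_request("pot.example.net"))
        # A teapot asked to brew coffee answers with the status line
        # RFC2324 made famous: HTCPCP/1.0 418 I'm a teapot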

Another common problem posed to network engineers is to make pigs fly. While it has often been reported that pigs cannot, in fact, fly, those who report this are apparently not well acquainted with engineers who have been given large amounts of a hot, caffeinated beverage. In fact, that which is probable, and yet impossible, is often more likely to occur than that which is possible, and yet improbable, once a network engineer has been given enough of this kind of beverage.

There is a danger, however, in attempting to perform the impossible, no matter how good the intentions or plan. As RFC1925 states in rule 3: “With sufficient thrust, pigs fly just fine. However, this is not necessarily a good idea. It is hard to be sure where they are going to land, and it could be dangerous sitting under them as they fly overhead.” Network engineers plus hot caffeinated beverages may just achieve the impossible.

Or you might end up with a pig on your head. It’s hard to tell, so be careful what you ask for.

You Cannot Increase the Speed of Light (RFC1925, Rule 2)

According to RFC1925, the second fundamental truth of networking is: No matter how hard you push and no matter what the priority, you can’t increase the speed of light.

However early this problem was first observed in the world of network engineering (see, for instance, Tanenbaum’s “station wagon example” in Computer Networks), human impatience is forever trying to overcome the limitations of the physical world and push more data down the pipe than Mother Nature intended (or Shannon’s theorem allows).
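
To put rough numbers on rule 2: light in optical fiber travels at roughly two-thirds of its speed in a vacuum, which puts a hard floor under delay no matter how much money is thrown at the network. The distances in the sketch below are simply examples.

    # Back-of-the-envelope propagation delay: the part of latency no amount
    # of hardware can remove. Speed in fiber is approximated as (2/3)c.

    SPEED_OF_LIGHT_KM_S = 299_792                 # in a vacuum
    SPEED_IN_FIBER_KM_S = SPEED_OF_LIGHT_KM_S * 2 / 3

    def one_way_delay_ms(distance_km: float) -> float:
        """Minimum one-way propagation delay over fiber, ignoring everything else."""
        return distance_km / SPEED_IN_FIBER_KM_S * 1000

    if __name__ == "__main__":
        for km in (100, 4_000, 12_000):           # metro, continental, intercontinental
            print(f"{km:>6} km: at least {one_way_delay_ms(km):5.1f} ms one way")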

One attempt at solving this problem is the infinitely fat pipe (helpfully called an “infan(t)”) described in RFC5984. While packets would still need to be clocked onto such a network, incurring serialization delay, the ability to clock an infinite number of packets onto the network at the same moment in time would represent a massive gain in a network’s capacity, potentially reaching speeds faster than the speed of light. The authors of RFC5984 describe several attempts to build such a network, including black fiber, on which the lack of light implies data transmission. This is problematic, however, because a lack of information can be interpreted differently depending on the context. A pregnant pause has a far different meaning than a shocked pause, for instance, or just a plain pause.

The team experimenting with faster-than-light communication also tried locking netcats up in boxes, but this seemed to work and not work at the same time. Finally, the researchers settled on ESP-based forwarding, in which two people with a telepathic link transmit data over long distances. They compute the delay of such communication at around 350ms, regardless of the distance involved. This is clearly a potential faster-than-light communication medium.

Another plausible option for building infinitely fat pipes is to broadcast everything. If you could reach an entire region in some way at once, it might be possible to build a full mesh of hosts, each transmitting to every other host in the region at the same time, ultimately constituting an infinitely fat pipe. Such a system is described in RFC6217, which describes the transmission of broadcast packets across entire regions using air as a medium. This kind of work is a logical extension of the stretched Ethernet segments often used between widely separated data centers and campuses, only using a more easily accessed medium (the air). The authors of this RFC note the many efficiencies gained from using broadcast only transmission modes, such as not needing destination addresses, the TCP three-way handshake process, and acknowledgements (which reportedly consume an inordinate amount of bandwidth).

Foreseeing the time when faster-than-light networking would be possible, R. Hinden wrote a document detailing some of the design considerations for such networks, published as RFC6921. This document is primarily concerned with the ability of the TCP three-way handshake to support an environment where the network’s speed of transmission is so much faster than the speed at which packets are processed or clocked onto the network that an acknowledgement is received before the original packet is transmitted. Hinden suggests that it might be possible to use packet drops in normal networks to emulate this behavior, and to find some way to solve it in case faster-than-light networks, such as the ESP network described in RFC5984, become generally available.
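
For reference, the exchange RFC6921 plays with is the ordinary TCP three-way handshake, which in networks slower than light always completes in the order sketched below; the sequence numbers are invented for illustration.

    # The ordinary TCP three-way handshake, in its normal (slower-than-light)
    # order. Sequence numbers are made up for the example.

    def three_way_handshake(client_isn: int = 1000, server_isn: int = 5000):
        """Yield the three segments of a normal TCP connection setup, in order."""
        # 1. The client proposes a connection and its initial sequence number.
        yield ("client -> server", {"SYN": True, "seq": client_isn})
        # 2. The server acknowledges the client's ISN and proposes its own.
        yield ("server -> client", {"SYN": True, "ACK": True,
                                    "seq": server_isn, "ack": client_isn + 1})
        # 3. The client acknowledges the server's ISN; data may now flow.
        yield ("client -> server", {"ACK": True,
                                    "seq": client_isn + 1, "ack": server_isn + 1})

    if __name__ == "__main__":
        for direction, segment in three_way_handshake():
            print(direction, segment)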

More recent, and more realistic, work in faster-than-light networking has been undertaken by the Quantum Internet Research Group (QIRG) in the IRTF, which has documented a proposed architecture for a quantum Internet.

Random Thoughts

This week is very busy for me, so rather than writing a single long post, I’m throwing together some things that have been sitting in my pile to write about for a long while.

From Dalton Sweeny:

A physicist loses half the value of their physics knowledge in just four years whereas an English professor would take over 25 years to lose half the value of the knowledge they had at the beginning of their career. . . Software engineers with a traditional computer science background learn things that never expire with age: data structures, algorithms, compilers, distributed systems, etc. But most of us don’t work with these concepts directly. Abstractions and frameworks are built on top of these well studied ideas so we don’t have to get into the nitty-gritty details on the job (at least most of the time).

This is precisely the way network engineering is. There is value in the kinds of knowledge that expire, such as individual product lines, but the closer you are to the configuration, the more ephemeral the knowledge is. This is one of the central points of rule 11 is your friend: learn the foundational things that make learning the ephemeral things easier. There are only four problems (really) in moving data from one place to another. There are only around four solutions for each of those problems. Each of those solutions is bounded by a small set of sub-solutions (again, about four for each), or ways of implementing the solution.
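
As a rough sketch of what such a bounded problem space looks like, consider the data-transport side. The particular problems and solutions listed below are one plausible breakdown chosen for illustration, not a canonical taxonomy.

    # An illustrative sketch of the "small, bounded problem space" idea.
    # The breakdown below is one plausible way to slice the data-transport
    # problems; it is not meant as a definitive or exhaustive list.

    TRANSPORT_PROBLEMS = {
        "marshaling data":        ["fixed-length fields", "TLVs", "shared dictionaries"],
        "errors in transmission": ["error detection (CRCs)", "retransmission", "forward error correction"],
        "multiplexing":           ["addresses", "ports", "virtual circuits"],
        "flow control":           ["windows", "rate limiting", "explicit feedback"],
    }

    for problem, solutions in TRANSPORT_PROBLEMS.items():
        print(f"{problem}: {', '.join(solutions)}")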

I’m going to spend some time talking about this in the Livelesson I’m currently recording, so watch this space for an announcement sometime early next year about publication.

From Ivan P:

What I’m pointing out in this rant is the reverse reasoning along the lines “vendor X is doing something, which confirms there’s a real market need for it”. I’ve been in IT too long, and seen how the startup/VC sausage is made, to believe that fairy tale… and even when it’s true, it doesn’t necessarily imply that you need whatever vendor X is selling.

There are two ways to look at this. Either vendors should lead the market in building solutions, or they should follow whatever the customer wants. From my perspective, one of the problems we have right now is that everything is a massive mish-mash of these two things. The operator’s design team thinks of a neat way to do X, and then promises the account team a big check if it’s implemented. It doesn’t matter that X could be solved some other, simpler way; all that matters is the check. In this case, the vendor stops challenging the customer to build things better and starts acting like a commodity provider rather than an innovative partner.

The interaction between the customer and the vendor needs to be more push-pull than it currently is—right now, it seems like either the operator simply dictates terms to the vendor, or the vendor pretty much dictates architecture to the operator. We need to find a middle ground. The vendor does need to have a solid solution architecture, but the architecture needs to be flexible, as well, and the blocks used to build that architecture need to be usable in ways not anticipated by the vendor’s design folks.

On the other hand, we need to stop chasing features. This isn’t just true of vendors, this is true of operators as well. You get feature lists because that’s what you ask for. Often, operators ask for feature lists because that’s the easiest thing to measure, or because they already have a completely screwed up design they are trying to brownfield around. The worst is—“we have this brownfield that we just can’t get rid of, so we want to build yet another overlay on top, which will make everything simpler.” After about the twentieth overlay a system crash becomes a matter of when rather than if.

It Has to Work (RFC1925, Rule 1)

From time immemorial, humor has served to capture truth, and the world of computer networks is no different. A notable example is the April 1 RFC series published by the IETF. RFC1925, The Twelve Networking Truths, will serve as our guide.

According to RFC1925, the first fundamental truth of networking is: it has to work. While this might seem to be overly simplistic, it has proven—over the years—to be much more difficult to implement in real life than it looks like in a slide deck. Those with extensive experience with failures, however, can often make a better guess at what is possible to make work than those without such experience. The good news, however, is the experience of failure can be shared, especially through self-deprecating humor.

Consider RFC748, the TELNET RANDOMLY-LOSE Option, the first April First RFC published by the IETF. This RFC describes a set of additional signals in the TELNET protocol (for those too young to remember, TELNET is what people used to communicate with hosts before SSH and web browsers!) that instruct the server not to provide random losses through such things as “system crashes, lost data, incorrectly functioning programs, etc., as part of their services.” The RFC notes that many systems apparently have undocumented features that provide such losses, frustrating users and system administrators alike. The option proposed would instruct the server to disable the features which cause these random losses.
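
TELNET options of this sort are negotiated with IAC sequences; the sketch below shows what asking a server to enable such an option, or the server refusing it, looks like on the wire. The IAC, DO, and WONT byte values are the standard TELNET ones, while the RANDOMLY-LOSE option code used here is a placeholder for illustration rather than the value assigned in RFC748.

    # A minimal sketch of TELNET option negotiation (RFC854-style IAC sequences).
    # The RANDOMLY-LOSE option code below is a placeholder chosen for this
    # example; see RFC748 for the actual assignment.

    IAC  = 255  # "interpret as command"
    DO   = 253  # ask the peer to enable an option
    WONT = 252  # refuse to enable an option

    OPTION_RANDOMLY_LOSE = 42  # hypothetical code, for illustration only

    def request_no_random_losses() -> bytes:
        """Ask the server to enable RANDOMLY-LOSE, i.e. to stop losing randomly."""
        return bytes([IAC, DO, OPTION_RANDOMLY_LOSE])

    def refuse_option() -> bytes:
        """A server that has never heard of the option simply refuses it."""
        return bytes([IAC, WONT, OPTION_RANDOMLY_LOSE])

    if __name__ == "__main__":
        print(request_no_random_losses().hex(" "))  # ff fd 2a
        print(refuse_option().hex(" "))             # ff fc 2a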

Lesson learned? Although one of the general rules of application design is that the network is not reliable, the counter-rule suggested by RFC748 is that the application is not reliable, either. This is a key point in the race to Mean Time to Innocence (MTTI). RFC1882, published many years after RFC748, is a veritable guidebook for finding problems in a network, including transceiver failures, databases with broken b-trees, unterminated contacts, and a plethora of other places to look. Published just before Christmas, RFC1882 is an ideal guide for those who want to spend time with their families during the most festive times of the year.

Another common problem in large-scale networks is services that want to operate from the safety and security of an anonymous connection. RFC6593 describes the Domain Pseudonym System, specifically designed to support services that do not wish to be discovered. The specification describes two parties to the protocol, the first being the seeker, or “it,” and the second being the service attempting to hide from it. The seeker begins by sending a transmission declaring the beginning of the search sequence, called the “ready or not,” followed by a countdown during which “it” is not allowed to peek at a list of available services. During this countdown, the service may change its name or location, although it will be penalized if discovered doing so. This Domain Pseudonym System is the perfect counterpart to the Domain Name System normally used to discover services on large-scale networks, as shown by the many networks that already deploy such a hide-and-seek approach to managing services.

What if all the above guidance for network operators fails, and you are stuck troubleshooting a problem? RFC2321 has an answer to this problem: RITA, the Reliable Internetwork Troubleshooting Agent. The typical RITA is described as 51.25cm in length and yellow/orange in color. The first test the operator can perform with the RITA is placing it on the documentation for the suspect system, or on top of the suspect system itself. If the RITA eventually flies away, there is a greater than 90% chance there is a defect in the system tested. Whether the defects in the tested system are the root cause of the problem the operator is currently troubleshooting is another matter, however. The RITA has such a high success rate because it is believed that 100% of systems in operation do, in fact, contain defects. The 10% failure rate primarily occurs in cases where the RITA itself dies during the test, or decides to go to sleep rather than flying to some other location.

Each of these methods can help the network operator fulfill the first rule of networking: it has to work.

The Hedge 34: Andrew Alston and the IETF

Complaining about how slow the IETF is, or how single vendors dominate the standards process, has been something of a pastime in the world of network engineering going back to the very beginning. It is one thing to complain; it is another to understand the structure of the problem and make practical suggestions about how to fix it. Join us at the Hedge as Andrew Alston, Tom Ammon, and Russ White reveal some of the issues and brainstorm how to fix them.
