Agglutinating Problems Considered Harmful (RFC1925, Rule 5)

In the networking world, many equate simplicity with the fewest number of moving parts. According to this line of thinking, if there are 100 routers, 10 firewalls, 3 control planes, and 4 management systems in a network, then reducing the number of routers to 95, the number of firewalls to 8, the number of control planes to 1, and the number of management systems to 3 would make the system “much simpler.” Disregarding the reduction in the number of management systems, scientifically proven to always increase in number, it does seem that reducing the number of physical devices, protocols in use, etc., would tend to decrease the complexity of the network.

The wise engineers of the IETF, however, have a word of warning in this area that all network engineers should heed. According to RFC1925, rule 5: “It is always possible to agglutinate multiple separate problems into a single complex interdependent solution. In most cases this is a bad idea.” When “conventional wisdom” contradicts the wisdom of engineers with the kind of experience and background of those who write IETF documents, it is worth taking a deeper look.

A good place to begin is with other RFCs that might provide examples, or otherwise shed light on this situation. Two of particular interest are RFC1776 and RFC3093.

RFC1776 describes a very simplified transport protocol for use in the Internet and private networks. In normal packet formats there are many different components, such as a header and data sections. The header is normally made up of many different fields, such as the source address, the destination address, the quality of service, etc. The data section of the packet may also be divided into many different fields providing information for such functionality as error detection, flow control, and indicators of which application on the destination host this information is destined to (the port number is an example).

The authors of RFC1776 decided that the wisdom of making a single appliance which provides many services, the firewall being the classic example, and the wisdom of using a single protocol for everything, for instance using BGP for data center fabrics and interdomain connectivity, should be applied fully to the formatting of transport packets. In the spirit of agglutination common to all network engineering, RFC1776 recommends replacing the entire contents of a transport packet with a single address. The address must be a bit longer, of course, to carry the actual data, but using a single large field is inherently simpler than using many different fields. To accomplish this task, RFC1776 specifies a packet with 1696 octets (bytes) of address space. The number of octets originally selected is compatible with ATM, an older technology which uses a 53-octet cell, but should also be compatible with all modern transport systems.
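The ATM compatibility claim comes down to simple arithmetic: 1696 octets divides evenly into 53-octet cells. A quick check, offered only as a sketch (the variable names are illustrative, and the calculation follows the framing above by counting whole 53-octet cells rather than cell payloads):

```python
ADDRESS_OCTETS = 1696  # address size given in the text
ATM_CELL_OCTETS = 53   # full ATM cell size given in the text

# How many whole cells does the address fill, and is anything left over?
cells, remainder = divmod(ADDRESS_OCTETS, ATM_CELL_OCTETS)
print(f"{ADDRESS_OCTETS} octets = {cells} cells, {remainder} octets left over")
```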

While the many advantages of this system are not fully described in the specification, it should be obvious that packets containing a single field (the destination address) will be easier for hosts to generate and transmit, and easier for hosts to receive and process. The entire processing of the packet will consist of transferring the address field directly into memory for consumption by any application running on the host that desires to consume it. The specification does note, however, that security is much simpler because there is no “user data” to secure.

RFC3093 is a more recent example of agglutination in order to simplify network design and operation. The authors of this RFC note that applications are already moving to using a single port, 80, for all traffic, as most firewalls already pass traffic transmitted through this port without restrictions. The authors note the operation of the Internet would be much simpler if all applications ran over port 80. In this way, all applications could pass through firewalls without modification, while the firewalls themselves remain perfectly operational, fulfilling their intended purpose. Implementing this specification would also simplify the absolute mess of port and protocol numbers used in transporting data today, agglutinating them all down to a single port. As less is always simpler, this would create a simpler, easier to manage, global Internet.

The lessons to learn, after examining the options, may not be what was originally intended. Reducing the number of parts does not necessarily reduce the complexity of the overall system. If you haven’t found the tradeoffs, you haven’t looked hard enough.

The EXPERIENCE HAS SHOWN THAT Keyword (RFC1925, Rule 4)

The world of information technology is filled, often to overflowing, with those who “know better.” For instance, I was recently reading an introduction to networking in a very popular orchestration system that began with the declaration that routing was hard, and therefore this system avoided routing. The document then went on to describe a system of moving packets around using multiple levels of Network Address Translation (NAT) and centrally configured policy-based routing (or filter-based forwarding) that was clearly simpler than the distributed protocols used to run large-scale networks. I thought, for a moment, of writing the author and pointing out the system in question had merely reinvented routing in a rather inefficient and probably broken way, but I relented. Why? Because I know RFC1925, rule 4, by heart:

Some things in life can never be fully appreciated nor understood unless experienced firsthand. Some things in networking can never be fully understood by someone who neither builds commercial networking equipment nor runs an operational network.

Ultimately, the people who built this system will likely not listen to me; rather, they are going to have to experience the pain caused by large-scale failures for themselves before they will listen. Many network operators do wish for some way to get their experience across to users and application developers, however; one suggestion which has been made in the past is adding subliminal messages to the TELNET protocol. According to RFC1097, adding this new message type would allow operators to gently encourage users to upgrade the software they are using by displaying a message on-screen which the user’s mind can process, but is not consciously aware of reading. The uses of such a protocol extension, however, would be wide-ranging, such as informing application developers that the network is not cheap, and packets are not carried instantaneously from one host to another.

A further suggestion made in this direction is to find ways to more fully document operational experience in Internet Standards produced by the Internet Engineering Task Force (IETF). Currently, the standards for writing such standards (sometimes mistakenly called meta-standards, although these standards about standards are standards in their own right) only include a few keywords which authors of protocol standards can use to guide developers into creating well-developed implementations. For instance, according to RFC2119, a protocol designer can use the term MUST (note the uppercase, which means it must be shouted when reading the standard out loud) to indicate something an implementation must do. If the implementation does not do what it MUST, subliminal messaging (as described above) will be used to discourage the use of that implementation.

MUST NOT, SHOULD, SHOULD NOT, and MAY, along with synonyms such as REQUIRED, SHALL, RECOMMENDED, and OPTIONAL, are the remaining keywords defined by the IETF for use in standards. While these options do cover a number of situations, they do not express the full range of options available based on operational experience. RFC6919 proposed extensions to these keywords to allow a fuller range of intent which could be useful to express experience.

For instance, RFC6919 adds the keyword MUST (BUT WE KNOW YOU WON’T) to express operational frustration for those times when even subliminal messaging will not convince a user or application developer to create an implementation that will gracefully scale. The POSSIBLE keyword is also included to indicate what is possible in the real world, and REALLY SHOULD NOT is included for those times when an application developer or user asks for the network operator to launch pigs into flight.
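Because these keywords are written in uppercase, they are easy to pick out of a specification mechanically. The following is a minimal sketch of such a scan, not part of any RFC or IETF tooling: the keyword list is an illustrative subset of those named above, and `count_keywords` and the sample text are hypothetical.

```python
import re
from collections import Counter

# Illustrative subset of the keywords discussed above (RFC2119 and RFC6919).
KEYWORDS = [
    "MUST",
    "MUST NOT",
    "SHOULD",
    "SHOULD NOT",
    "MAY",
    "MUST (BUT WE KNOW YOU WON'T)",
    "POSSIBLE",
    "REALLY SHOULD NOT",
]

# Try longer phrases first so "MUST NOT" is not counted as a bare "MUST".
_alternatives = "|".join(
    re.escape(k) for k in sorted(KEYWORDS, key=len, reverse=True)
)
# The lookarounds stop matches inside longer words such as "IMPOSSIBLE".
PATTERN = re.compile(rf"(?<![A-Za-z])(?:{_alternatives})(?![A-Za-z])")


def count_keywords(text: str) -> Counter:
    """Count uppercase requirement keywords in a block of specification text."""
    normalized = text.replace("\u2019", "'")  # treat curly apostrophes as straight
    return Counter(match.group(0) for match in PATTERN.finditer(normalized))


# Hypothetical specification text, invented for this example.
sample = (
    "An implementation MUST validate input and MUST NOT crash. "
    "It SHOULD log errors, and it REALLY SHOULD NOT launch pigs into flight. "
    "It MUST (BUT WE KNOW YOU WON'T) scale gracefully."
)
counts = count_keywords(sample)
for keyword, count in counts.most_common():
    print(f"{count} x {keyword}")
```

The match is case-sensitive on purpose: a lowercase "must" in running prose carries no special meaning, so only the shouted form is counted.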

Of course, the keywords described in RFC6919 may, at some point, be extended further to include such keywords as EXPERIENCE HAS SHOWN THAT and THAT WILL NOT SCALE, but for now protocol developers and operators are still somewhat restricted in their ability to fully express the experience of operating large-scale networks.

Even with these additional keywords and the use of subliminal messaging, improper implementations will still slip out into the wild, of course.

And what about network operators who are just beginning to learn their craft, or have long experience but somehow still make mistakes in their deployments? Some have suggested in the past—particularly those who work in technical assistance centers—that all network devices be shipped according to the puzzle-box protocol.

All network devices should be shipped in puzzle boxes such that only those with an appropriate level of knowledge, experience, and intelligence can open the box and hence install the equipment. Some might argue the Command Line Interface (CLI) currently supplied with most networking equipment is the equivalent of a puzzle box, but given the state of most networks, it seems shipping network equipment with a complex and difficult-to-use CLI has not been fully effective.

The Dangers of Flying Pigs (RFC1925, Rule 3)

There are many times in networking history, and in the day-to-day operation of a network, when an engineer has been asked to do what seems to be impossible: perhaps installing a circuit faster than a speeding bullet, or flying over tall buildings to make it to a remote site faster than any known form of conveyance short of a transporter beam (which, contrary to what you might see in the movies, has not yet been invented).

One particular impossible assignment in the early days of network engineering was the common request to replicate the creation of the works of Shakespeare making use of the infinite number of monkeys (obviously) connected to the Internet. The creation of appropriate groups of monkeys, the herding of these groups, and the management of their output were once considered a nearly impossible task, similar to finding a token dropped on the floor or lost in the ether.

This problem proved so intractable that the IETF finally created an entire suite of management tools for managing the infinite monkeys used for these experiments, which is described in RFC2795. This RFC describes the Infinite Monkey Protocol Suite (IMPS), which runs on top of the Internet Protocol, the Infinite Threshold Accounting Gadget (I-TAG), and the KEEPER specification, which provides a series of messages to manage the infinite monkeys. The experiment raised a number of questions about its construction, such as whether the compilation of works should take place on a letter-by-letter or word-by-word basis. Ultimately, the problem was apparently solved through the creation of infinite monkey simulators, such as this one.

For those situations, such as assembling and managing an infinite suite of monkeys gathered for test, when a network engineer is asked to perform something which is apparently impossible, the first thing that is required is a lot of hot, caffeinated beverage. And there is no better way to make such beverages than through a hypertext-controlled hot beverage device. This device is so important, in fact, that the IETF described the interface and protocols for it fairly early, in RFC2324. While having a hypertext control interface to such devices is important, sometimes the making of caffeinated beverages should be automated; an interface which can be used for automation is described in RFC2325. If the engineer prefers some form of caffeine other than coffee, the procedures in RFC7168 should be followed.

Another common problem posed to network engineers is to make pigs fly. While it has often been reported that pigs cannot, in fact, fly, those who report this are apparently not well acquainted with engineers who have been given large amounts of a hot, caffeinated beverage. In fact, that which is probable, and yet impossible, is often more likely to occur than that which is possible, and yet improbable, once a network engineer has been given enough of this kind of beverage.

There is a danger, however, in attempting to perform the impossible, no matter how good the intentions or plan. As RFC1925 states in rule 3: “With sufficient thrust, pigs fly just fine. However, this is not necessarily a good idea. It is hard to be sure where they are going to land, and it could be dangerous sitting under them as they fly overhead.” Network engineers plus hot caffeinated beverages may just achieve the impossible.

Or you might end up with a pig on your head. It’s hard to tell, so be careful what you ask for.

RFC1925 Rule 2

According to RFC1925, the second fundamental truth of networking is: No matter how hard you push and no matter what the priority, you can’t increase the speed of light.

Although this problem was first observed early in the history of network engineering (see, for instance, Tanenbaum’s “station wagon example” in Computer Networks), human impatience is forever trying to overcome the limitations of the physical world and push more data down the pipe than Mother Nature intended (or Shannon’s theory allows).

One attempt at solving this problem is the description of an infinitely fat pipe (helpfully called an “infan(t)”) described in RFC5984. While packets would still need to be clocked onto such a network, incurring serialization delay, the ability to clock an infinite number of packets onto the network at the same moment in time would represent a massive gain in a network’s ability, potentially reaching speeds faster than the speed of light. The authors of RFC5984 describe several attempts to build such a network, including black fiber, on which the lack of light implies data transmission. This is problematic, however, because a lack of information can be interpreted differently depending on the context. A pregnant pause has far different meaning than a shocked pause, for instance, or just a plain pause.

The team experimenting with faster-than-light communication also tried locking netcats up in boxes, but this seemed to work and not work at the same time. Finally, the researchers settled on ESP-based forwarding, in which two people with a telepathic link transmit data over long distances. They compute the delay of such communication at around 350 ms, regardless of the distance involved. This is clearly a potential faster than speed-of-light communication medium.
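The two delays mentioned above can be put into rough numbers. The sketch below computes serialization delay (the time to clock a packet onto the wire) and the distance at which light-speed propagation alone would take as long as the 350 ms ESP figure. The function names, the 1500-byte packet, and the 1 Gb/s link are assumptions chosen for illustration, and the speed of light in a vacuum is used as the best case for a conventional network.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458  # speed of light in a vacuum
ESP_DELAY_S = 0.350                   # the 350 ms figure quoted above


def serialization_delay_s(packet_bytes: int, link_bps: float) -> float:
    """Time to clock a packet onto the wire at a given link speed."""
    return packet_bytes * 8 / link_bps


def propagation_delay_s(distance_m: float) -> float:
    """Best-case time for a signal to cover a distance at light speed."""
    return distance_m / SPEED_OF_LIGHT_M_PER_S


# Hypothetical example: a 1500-byte packet on a 1 Gb/s link.
ser = serialization_delay_s(1500, 1e9)

# Distance at which light-speed propagation alone takes as long as ESP.
break_even_m = SPEED_OF_LIGHT_M_PER_S * ESP_DELAY_S

print(f"serialization delay: {ser * 1e6:.1f} microseconds")
print(f"ESP break-even distance: {break_even_m / 1000:,.0f} km")
```

Under these assumptions the break-even distance comes out at roughly 105,000 km, more than twice the circumference of the Earth, so a fixed 350 ms delay would only beat light on paths considerably longer than any terrestrial circuit.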

Another plausible option for building infinitely fat pipes is to broadcast everything. If you could reach an entire region in some way at once, it might be possible to build a full mesh of hosts, each transmitting to every other host in the region at the same time, ultimately constituting an infinitely fat pipe. Such a system is described in RFC6217, which describes the transmission of broadcast packets across entire regions using air as a medium. This kind of work is a logical extension of the stretched Ethernet segments often used between widely separated data centers and campuses, only using a more easily accessed medium (the air). The authors of this RFC note the many efficiencies gained from using broadcast only transmission modes, such as not needing destination addresses, the TCP three-way handshake process, and acknowledgements (which reportedly consume an inordinate amount of bandwidth).

Foreseeing the time when faster than speed-of-light networking would be possible, R. Hinden wrote a document detailing some of the design considerations for such networks which was published as RFC6921. This document is primarily concerned with the ability of the TCP three-way handshake to support an environment where the network’s speed of transmission is so much faster than the speed at which packets are processed or clocked onto the network that an acknowledgement is received before the original packet is transmitted. R. Hinden suggests that it might be possible to use packet drops in normal networks to emulate this behavior, and find some way to solve it in case faster than speed-of-light networks become generally available—such as the ESP network described in RFC5984.

More recent, and realistic, work in faster than speed-of-light networking has been undertaken by the proposed Quantum Networking Research Group in the IRTF. You can read the proposed architecture for a quantum Internet here.

It Has to Work

From time immemorial, humor has served to capture truth. This is no different in the world of computer networks. A notable example of using humor to capture truth is the April 1 RFC series published by the IETF. RFC1925, The Twelve Networking Truths, will serve as our guide.

According to RFC1925, the first fundamental truth of networking is: it has to work. While this might seem to be overly simplistic, it has proven, over the years, to be much more difficult to implement in real life than it looks in a slide deck. Those with extensive experience with failures can often make a better guess at what is possible to make work than those without such experience. The good news is that the experience of failure can be shared, especially through self-deprecating humor.

Consider RFC748, which is the first April First RFC published by the IETF, the TELNET RANDOMLY-LOSE Option. This RFC describes a set of additional signals in the TELNET protocol (for those too young to remember, TELNET is what people used to communicate with hosts before SSH and web browsers!) that instruct the server not to provide random losses through such things as “system crashes, lost data, incorrectly functioning programs, etc., as part of their services.” The RFC notes that many systems apparently have undocumented features that provide such losses, frustrating users and system administrators. The option proposed would instruct the server to disable features which cause these random losses.

Lesson learned? Although one of the general rules of application design is the network is not reliable, the counter rule suggested by RFC748 is the application is not reliable, either. This is a key point in the race to Mean Time to Innocence (MTTI). RFC1882, published nearly two decades after RFC748, is a veritable guidebook for finding problems in a network, including transceiver failures, databases with broken b-trees, unterminated contacts, and a plethora of other places to look. Published just before Christmas, RFC1882 is an ideal guide for those who want to spend time with their families during the most festive times of the year.

Another common problem in large-scale networks is services that want to operate from the safety and security of an anonymous connection. RFC6593 describes the Domain Pseudonym System, specifically designed to support services that do not wish to be discovered. The specification describes two parties to the protocol, the first being the seeker, or “it,” and the second being the service which is attempting to hide from it. The process used is for the seeker to send a transmission declaring the beginning of the search sequence called the “ready or not,” followed by a countdown during which “it” is not allowed to peek at a list of available services. During this countdown, the service may change its name or location, although it will be penalized if discovered doing so. This Domain Pseudonym System is the perfect counterpart to the Domain Name System normally used to discover services on large-scale networks, as shown by the many networks that already deploy such a hide-and-seek approach to managing services.

What if all the above guidance for network operators fails, and you are stuck troubleshooting a problem? RFC2321 has an answer to this problem: RITA, the Reliable Internetwork Troubleshooting Agent. The typical RITA is described as 51.25 cm in length, and yellow/orange in color. The first test the operator can perform with the RITA is placing it on the documentation for the suspect system, or on top of the suspect system itself. If the RITA eventually flies away, there is a greater than 90% chance there is a defect in the system tested. There is no guarantee, however, that the defects in the tested system are the root cause of the problem the operator is currently troubleshooting. The RITA has such a high success rate because it is believed that 100% of systems in operation do, in fact, contain defects. The 10% failure rate primarily occurs in cases where the RITA itself dies during the test, or decides to go to sleep rather than flying to some other location.

Each of these methods can help the network operator fulfill the first rule of networking: it has to work.

On the ‘Net: RFC1925 Rule 6A

The truth is, however, that while protocol designers may talk about these things, and network designers study them, very few networks today are built using any of these models. What is often used instead is what might be called the Infinitely Layered Functional Indirection (ILFI) model of network engineering. In this model, nothing is solved at a particular layer of the network if it can be moved to another layer, whether successfully or not.