The world of information technology is filled, often to overflowing, with those who “know better.” For instance, I was recently reading an introduction to networking in a very popular orchestration system that began with the declaration that routing was hard, and therefore this system avoided routing. The document then went on to describe a system of moving packets around using multiple levels of Network Address Translation (NAT) and centrally configured policy-based routing (or filter-based forwarding) that was clearly simpler than the distributed protocols used to run large-scale networks. I thought, for a moment, of writing the author and pointing out the system in question had merely reinvented routing in a rather inefficient and probably broken way, but I relented. Why? Because I know RFC1925, rule 4, by heart:

Some things in life can never be fully appreciated nor understood unless experienced firsthand. Some things in networking can never be fully understood by someone who neither builds commercial networking equipment nor runs an operational network.

Ultimately, the people who built this system will likely not listen to me; rather, they are going to have to experience the pain caused by large-scale failures for themselves before they will listen. Many network operators do wish for some way to get their experience across to users and application developers, however; one suggestion which has been made in the past is adding subliminal messages to the TELNET protocol. According to RFC1097, adding this new message type would allow operators to gently encourage users to upgrade the software they are using by displaying a message on-screen which the user’s mind can process, but is not consciously aware of reading. The uses of such a protocol extension, however, would be wide-ranging, such as informing application developers that the network is not cheap, and packets are not carried instantaneously from one host to another.

A further suggestion made in this direction is to find ways to more fully document operational experience in Internet Standards produced by the Internet Engineering Task Force (IETF). Currently, the standards for writing such standards (sometimes mistakenly called meta-standards, although these standards about standards are standards in their own right) only include a few keywords which authors of protocol standards can use to guide developers into creating well-developed implementations. For instance, according to RFC2119, a protocol designer can use the term MUST (note the uppercase, which means it must be shouted when reading the standard out loud) to indicate something an implementation must do. If the implementation does not do what it MUST, subliminal messaging (as described above) will be used to discourage the use of that implementation.

MUST NOT, SHOULD, SHOULD NOT, MAY, and MAY NOT are the remaining keywords defined by the IETF for use in standards. While these options do cover a number of situations, they do not express the full range of options available based on operational experience. RFC6919 proposed extensions to these keywords to allow a fuller range of intent which could be useful to express experience.
For instance, RFC6919 adds the keyword MUST (BUT WE KNOW YOU WON’T) to express operational frustration for those times when even subliminal messaging will not convince a user or application developer to create an implementation that will gracefully scale. The POSSIBLE keyword is also included to indicate what is possible in the real world, and the REALLY SHOULD NOT is included for those times when an application developer or user asks for the network operator to launch pigs into flight.

Of course, the keywords described in RFC6919 may, at some point, be extended further to include such keywords as EXPERIENCE HAS SHOWN THAT and THAT WILL NOT SCALE, but for now protocol developers and operators are still somewhat restricted in their ability to fully express the experience of operating large-scale networks.

Even with these additional keywords and the use of subliminal messaging, improper implementations will still slip out into the wild, of course.

And what about network operators who are just beginning to learn their craft, or have long experience but somehow still make mistakes in their deployments? Some have suggested in the past—particularly those who work in technical assistance centers—that all network devices be shipped according to the puzzle-box protocol.

All network devices should be shipped in puzzle boxes such that only those with an appropriate level of knowledge, experience, and intelligence can open the box and hence install the equipment. Some might argue the Command Line Interface (CLI) currently supplied with most networking equipment is the equivalent of a puzzle box, but given the state of most networks, it seems shipping network equipment with a complex and difficult-to-use CLI has not been fully effective.

Innovation has gained a sort-of mystical aura in our world. Move fast and break stuff. We recognize and lionize innovators in just about every way possible. The result is a general attitude of innovate or die—if you cannot innovate, then you will not progress in your career or life. Maybe it’s time to take a step back and bust some of the innovation myths created by this near idolization of innovation.

You can’t innovate where you are. Reality: innovation is not tied to a particular place and time. “But I work for an enterprise that only uses vendor gear… Maybe if I worked for a vendor, or was deeply involved in open source…” Innovation isn’t just about building new products! You can innovate by designing a simpler network that meets business needs, or by working with your vendor on testing a potential new product. Ninety percent of innovation is just paying attention to problems, along with a sense of what is “too complex,” or where things might be easier.

You don’t work in open source or open standards? That’s not your company’s problem, that’s your problem. Get involved. It’s not just about protocols, anyway. What about certifications, training, and the many other areas of life in information technology? Just because you’re in IT doesn’t mean you have to only invent new technologies.

Innovation must be pursued—it doesn’t “just happen.” We often tell ourselves stories about innovation that imply it “is the kind of thing we can accomplish with a structured, linear process.” The truth is the process of innovation is unpredictable and messy. Why, then, do we tell innovation stories that sound so purposeful and linear?

That’s the nature of good storytelling. It takes a naturally scattered collection of moments and puts them neatly into a beginning, middle, and end. It smoothes out all the rough patches and makes a result seem inevitable from the start, despite whatever moments of uncertainty, panic, even despair we experienced along the way.

Innovation just happens. Either the inspiration just strikes, or it doesn’t, right? You’re just walking along one day and some really innovative idea just jumps out at you. You’re struck by lightning, as it were. This is the opposite of the previous myth, and just as wrong in the other direction.

Innovation requires patience. According to Keith’s Law, any externally obvious improvement in a product is really the result of a large number of smaller changes hidden within the abstraction of the system itself. Innovation is a series of discoveries over months and even years. Innovations are gradual, incremental, and collective—over time.

Innovation often involves combining existing components. If you don’t know what’s already in the field (and usefully adjacent fields), you won’t be able to innovate. Innovation, then, requires a lot of knowledge across a number of subject areas. You have to work to learn to innovate—you can’t fake this.

Innovation often involves a group of people, rather than lone actors. We often emphasize lone actors, but they rarely work alone. To innovate, you have to inteniontally embed yourself in a community with a history of innovation, or build such a community yourself.

Innovation must take place in an environment where failure is seen as a good thing (at least your were trying) rather than a bad one.

Innovative ideas don’t need to be sold. Really? Then let’s look at Qiubi, which “failed after only 7 months of operation and after having received $2 billion in backing from big industry players.” The idea might have been good, but it didn’t catch on. The idea that you can “build a better mousetrap” and “the world will beat a path to your door,” just isn’t true, and it never has been.

The bootom line is…Innovation does require a lot of hard work. You have to prepare your mind, learn to look for problems that can be solved in novel ways, be inquisitive enough to ask why, and if there is a better way, stubborn enough to keep trying, and confident enough to sell your innovation to others. But you can innovate where you are—to believe otherwise is a myth.

How do you become a “senior engineer?” It’s a question I’m asked quite often, actually, and one that deserves a better answer than the one I usually give. Charity recently answered the question in a round-a-bout way in a post discussing the “trap of the premature senior.” She’s responding to an email from someone who is considering leaving a job where they have worked themselves into a senior role. Her advice?


This might seem to be counter-intuitive, but it’s true. I really wanted to emphasize this one line—

There is a world of distance between being expert in this system and being an actual expert in your chosen craft. The second is seniority; the first is merely .. familiarity

Exactly! Knowing the CLI for one vendor’s gear, or even two vendor’s gear, is not nearly the same as understanding how BGP actually works. Quoting the layers in the OSI model is just not the same thing as being able to directly apply the RINA model to a real problem happening right now. You’re not going to gain the understanding of “the whole ball of wax” by staying in one place, or doing one thing, for the rest of your life.

If I have one piece of advice other than my standard two, which are read a lot (no, really, A LOT!!!!) and learn the fundamentals, it has to be do something else.

Charity says this is best done by changing jobs—but this is a lot harder in the networking world than it is in the coding world. There just aren’t as many network engineering jobs as there are coding jobs. So what can you do?

First, do make it a point to try to work for both vendors and operators throughout your career. These are different worlds—seriously. Second, even if you stay in the same place for a long time, try to move around within that company. For instance, I was a Cisco for sixteen years. During that time, I was in tech support, escalation, engineering, and finally sales (yes, sales). Since then, I’ve worked in a team primarily focused on research at an operator, in engineering at a different vendor, then in an operationally oriented team at a provider, then marketing, and now (technically) software product management. I’ve moved around a bit, to say the least, even though I’ve not been at a lot of different companies.

Even if you can’t move around a lot like this for whatever reason, always take advantage of opportunities to NOT be the smartest person in the room. Get involved in the IETF. Get involved in open source projects. Run a small conference. Teach at a local college. I know it’s easy to say “but this stuff doesn’t apply to the network I’m actually working on.” Yes, you’re right. And yet—that’s the point, isn’t it? You don’t expand your knowledge by only learning things that apply directly to the problem you need to solve right now.

Of course, if you’re not really interested in becoming a truly great network engineer, then you can just stay “senior” in a single place. But I’m guessing that if you’re reading this blog, you’re interested in becoming a truly great network engineer.

Pay attention to the difference between understanding things and just being familiar with them. The path to being great is always hard, it always involves learning, and it always involves a little risk.

One of the common myths of the networking world is there were no “real” networks before the early days of packet-based networks. As myths go, this is not even a very good myth; the world had very large-scale voice and data networks long before distributed routing, before packet-based switching, and before any of the packet protocols such as IP. I participated in replacing a large scale voice and data network, including hundreds of inverse multiplexers that tied a personnel system together in the middle of the 1980’s. I also installed hundreds of terminal emulation cards in Zenith Z100 and Z150 systems in the same time frame to allow these computers to connect to mainframes and newer minicomputers on the campus.

All of these systems were run through circuit-switched networks, which simply means the two end points would set up a circuit over which data would travel before the data actually traveled. Packet switched networks were seen as more efficient at the time because the complexity of setting these circuits up, along with the massive waste of bandwidth because the circuits were always over provisioned and underused.

The problem, at that time, with packet-based networks was the sheer overhead of switching packets. While frames of data could be switched in hardware, packets could not. Each packet could be a different length, and each packet carried an actual destination address, rather than some sort of circuit identifier—a tag. Packet switching, however, was quickly becoming the “go to” technology solution for a lot of problems because of its efficient use of network resources, and simplicity of operation.
Asynchronous Transfer Mode, or ATM, was widely seen as a compromise technology that would provide the best circuit and packet switching in a single technology. Data would be input into the network in the form of either packets or circuits. The data would then be broken up into fixed sized cells, which would then be switched based on a fixed label-based header. This would allow hardware to switch the cells in a way that is like circuit switching, while retaining many of the advantages of a circuit switched network. In fact, ATM allowed for both circuit- and packet-switched paths to be both be used in the same network.
With all this goodness under one technical roof, why didn’t ATM take off? The charts from the usual prognosticators showed markets that were forever “up and to the right.”

The main culprit in the demise of ATM turned out to be the size of the cell. In order to support a good combination of voice and data traffic, the cell size was set to 53 octets. A 48-octet packet, then, should take up a single cell with a little left over. Larger packets, in theory, should be able to be broken into multiple cells and carried over the network with some level of efficiency, as well. The promise of the future was ATM to the desktop, which would solve the cell size overhead problem, since applications would generate streams pre-divided into the correctly sized packets to use the fixed cell size efficiently.
The reality, however, was far different. The small cell size, combined with the large overhead of carrying both a network layer header, the ATM header, and the lower layer data link header, caused ATM to be massively inefficient. Some providers at the time had research showing that while they were filling upwards of 80% of any given link’s bandwidth, the goodput, the amount of data being transmitted over the link, was less than 40% of the available bandwidth. There were problems with out of order cells and reassembly to add on top of this, causing entire packets worth of data, spread across multiple cells, to be discarded. The end was clearly near when articles appeared in popular telecommunications journals comparing ATM to shredding the physical mail, attaching small headers to each resulting chad, and reassembling the original letters at the recipient’s end of the postal system. The same benefits were touted—being able to pack mail trucks more tightly, being able to carry a wider array of goods over a single service, etc.

In the end, ATM to the desktop never materialized, and the inefficiencies of ATM on long-haul links doomed the technology to extinction.

Lessons learned? First, do not count on supporting “a little inefficiency” while the ecosystem catches up to a big new idea. Either the system has immediate, measurable benefits, or it does not. If it does not, it is doomed from the first day of deployment. Second, do not try to solve all the problems in the world at once. Build simple, use it, and then build it better over time. While we all hate being beta testers, sometimes real-world beta testing is the only way to know what the world really wants or needs. Third, up-and-to-the-right charts are easy to justify and draw. They are beautiful and impressive on glossy magazine pages, and in flashy presentations. But they should always be considered carefully. The underlying technology, and how it matches real-world needs, are more important than any amount of forecasting and hype.

I remember a time long ago—but then again, everything seems like it was “long ago” to me—when I was flying out to see an operator in a financial district. Someone working with the account asked me what I normally wear… which is some sort of button down and black or grey pants in pretty much any situation. Well, I will put on a sport jacket if I’m teaching in some contexts, but still, the black/grey pants and some sort of button down are pretty much a “uniform” for me. The person working on the account asked me if I could please switch to ragged shorts, a t-shirt, and grow a pony tail because … the folks at the operator would never believe I was an engineer if I dressed to “formal.”

Now I’ve never thought of what I wear as “formal…” it’s just … what I wear. Context, however, is king.

In other situation, I saw a sales engineer go to a store and buy an entirely new outfit because he came to the company’s building wearing a suit and tie … The company in question deals in outdoor gear, and the location was in a small midwestern town, so a suit, well, let’s just say it didn’t “fit.” Again, context is king.

We have—particularly in tech—become accustomed to casual dress codes. Coders, network engineers, and many others just wear whatever whenever. We are expected, in fact, to dress beyond casual—even to be ratty, as in the stories above. Is this really a good thing, though?

As a famous fashion historian has said: “Americans began the 20th century in bustles and bowler hats and ended it in velour sweatsuits and flannel shirts.” I’ve never worn bustles, bowler hats, or velour sweatsuits (do they even make such a thing???), but the flannel shirts I totally relate to. Although… I do only buy flannels that don’t have large-scale checks of any kind. It’s hard to find plain or subtle patterns, but I actively search them out.

The point of this  aside about clothes is this—one of the problems with casual clothing is it doesn’t create the kinds of limits, or lines, in your life that more formal clothing creates. If you wear the same clothes all the time, then you have the same mindset all the time. You’re always working, or you’re always playing—whatever it is.

According to Tara Lookabaugh, this is a major cause of stress in our lives. In the “old days,” salesmen wore a three-piece suit at work, then came home to a smoking jacket or perhaps jeans, or something similar. This was not only a matter of preventing dangerous chemicals from entering the home (think of a factory worker), it also helped create a mental barrier between work and home life.

A barrier that no longer exists.

Does removing this barrier increase stress? I think it does. Can it be replaced with some other kind of barrier? This is one reason I have an office in my house—or at least a clearly set-apart apace dedicated to working on the computer. It’s not that I won’t pick my laptop up and go outside on a beautiful day, or even down to the library at the Seminary just for a change of scenery. When I’m home, however, when I cross the “work-space threshold,” I somehow think about work things.

So I’ve attempted to replace the barrier between work and the rest of my world created by clothing with a physical “work-space barrier.” Does this work all the time? Not really, but it’s at least a start in the right direction. Of course, I also only have one computer, which is probably another place where I could create a barrier between work and “other things.”

But creating such a barrier is important for mental health, regardless of how you do it.

What do you think? Do you think “casual dress” has created more stress than it has resolved? Is Tara’s argument right? What other methods are there out there to create this sort of barrier?

What, really, is “technical debt?” It’s tempting to say “anything legacy,” but then why do we need a new phrase to describe “legacy stuff?” Even the prejudice against legacy stuff isn’t all that rational when you think about it. Something that’s old might also just be well-tested, or well-worn but still serviceable. Let’s try another tack.

Technical debt, in the software world, can be defined as working on a piece of software for long periods of time by only adding features, and never refactoring or reorganizing the code to meet current conditions. The general idea is that as new features are added on top of the old, two things happen. First, the old stuff becomes a sort of opaque box that no-one understands. Second, the stuff being added to the old increasingly relies on public behavior that might be subject to unintended consequences or leaky abstractions.

To resolve this problem in the software world, software is “refactored.” In refactoring, every use of a public API is examined, including what information is being drawn out, or what the expected inputs and outputs are. The old code is then “discarded,” in a sense, and a new underlying function written that meets the requirements discovered in the existing codebase. The refactoring process allows the calling functions and called functions, the clients and the servers, to “move together.” As the overall understanding of the system changes, the system itself can change with that understanding. This brings the implementation closer to the understanding of the current engineering team.

So technical debt is really the mismatch between the understanding of the current engineering team imposed onto the implementation of a prior engineering team with a different understanding of the requirements (even if the older engineering team is the same people at a different time).

How can we apply this to networking? In the networking world, we actively work against refactoring by future proofing.

Future proofing: building a network that will never need to be replaced. Or, perhaps, never needing to go to your manager and say “I need to buy new hardware,” because the hardware you bought last week does not have the functionality you need any longer. This sounds great in theory, but the problem with theory is it often does not hold up when the glove hits the nose (every has a plan until they are punched in the face). The problem with future proofing is you can’t see the punch that’s coming until the future actually throws it.

If something is future proofed, it either cannot be refactored, or there will be massive resistence to the refactoring process.

Maybe its time we tried to change our view of things so we can refactor networks. What would refactoring a network look like? Maybe examining all the configurations within a particular module, figuring what they do, and then trying to figure out what application or requirement led to that particular bit of configuration. Or, one step higher, looking at every protocol or “feature” (whatever a feature might be), figuring out what purpose it might serve, and then intentionally setting about finding some other way, perhaps a simpler way, to provide that same service.

One key point in this process is to begin by refusing to look at the network as a set of appliances, and starting to see it as a system made up of hardware and software. Mentally disaggregating the software and hardware can allow you to see what can be changed and what cannot, and inject some flexibility into the network refactoring process.

When you refactor a bit of code, what you typically end up with a simpler piece of software that more closely matches the engineering team’s understanding of current requirements and conditions. Aren’t simplicity and coherence goals for operational networks, too?

If so, ehen was the last time you refactored your network?

There are many times in networking history, and in the day-to-day operation of a network, when an engineer has been asked to do what seems to be impossible. Maybe installing a circuit faster than a speeding bullet or flying over tall buildings to make it to a remote site faster than any known form of conveyance short of a transporter beam (which, contrary to what you might see in the movies, has not yet been invented).

One particular impossible assignment in the early days of network engineering was the common request to replicate the creation of the works of Shakespeare making use of the infinite number of monkeys (obviously) connected to the Internet. The creation of appropriate groups of monkeys, the herding of these groups, and the management of their output were once considered a nearly impossible task, similar to finding a token dropped on the floor or lost in the ether.

This problem proved so intractable that the IETF finally created an entire suite of management tools for managing the infinite monkeys used for these experiments, which is described in RFC2795. This RFC describes the Infinite Monkey Protocol Suite (IMPS), which runs on top of the Internet Protocol, the Infinite Threshold Accounting Gadget (I-TAG), and the KEEPER specification, which provides a series of messages to manage the infinite monkeys. The problem raised a number of problems about the construction of the experiment, such as whether the compilation of works should take place on a letter-by-letter or word-by-word basis. Ultimately, the problem was apparently solved through the creation of infinite monkey simulators, such as this one.

For those situations, such as assembling and managing an infinite suite of monkeys gathered for test, when a network engineer is asked to perform something which is apparently impossible, the first thing that is required is a lot of hot, caffeinated beverage. And there is no better way to make such beverages than through a hypertext-controlled hot beverage device. This device is so important, in fact, that the IETF described the interface and protocols for it fairly early, in RFC2324. While having a hypertext control interface to such devices is important, sometimes the making of caffeinated beverages should be automated; an interface which can be used for automation is described in RFC2325. If the engineer prefers some form of caffeine other than coffee, the procedures in RFC7168 should be followed.

Another common problem posed to network engineers is to make pigs fly. While it has often been reported that pigs cannot, in fact, fly, those who report this are apparently not well acquainted with engineers who have been given large amounts of a hot, caffeinated beverage. In fact, that which is probable, and yet impossible, is often more likely to occur than that which is possible, and yet improbable, once a network engineer has been given enough of this kind of beverage.

There is a danger, however, with attempting to perform the possible, no matter how good the intentions or plan. As RFC1925 states in rule 3: “With sufficient thrust, pigs fly just fine. However, this is not necessarily a good idea. It is hard to be sure where they are going to land, and it could be dangerous sitting under them as they fly overhead.” Network engineers plus hot caffeinated beverages may just achieve the impossible.

Or you might end up with a pig on your head. It’s hard to tell, so be careful what you ask for.