The Grass is Always Greener

This last week I was talking to someone at a small startup that intends to eliminate all the complex routing from campus networks. In the past, I’ve read blog posts about how Kubernetes was designed to eliminate routing protocols because “routing protocols are so complex.”

Color me skeptical.

There are two reasons for complexity in a design. The first is you’re solving a hard problem. The second is you’ve made bad design choices in the past, and you’re pasting complexity on top to solve some problem (whether perceived or real).

The problem with all this talk about building something that’s “less complex” is people tend to see complexity of the first kind and think, “we can get rid of that complexity if we start over.” Failing to understand the past before building the future is a recipe for repeated failures of the same kind. Building a network without a distributed routing protocol hasn’t been tried before either, right? Well, yes, it has … We either forget how it turned out, or we say “well, that’s not the same thing I’m talking about here” (just like “real socialism hasn’t ever been tried”).

Even worse, they think they can get rid of the second kind of complexity by starting over, by getting the humans out of the decision-making loop, or by focusing on the data. Our modern penchant for relying on “the data,” without ever thinking about the source of the data or how the data has been shaped and interpreted, is truly breathtaking.

They look over the horizon, see an unspoiled field, and think “the grass really is greener on the other side.”

Get rid of all those complex dynamic routing protocols … get rid of all those humans making decisions, so the decisions are “data driven” … and everything will be so much better.

Adding complexity to solve hard real-world problems is just the way things are, and always will be, so the first reason for complexity will always be with us. People make mistakes, fail to see into the future perfectly, or just don’t have a perfect understanding of the system (technical debt), so the second kind of complexity will always be with us. You can’t “fix” people—God save us from those who think they can. The grass isn’t always greener—it just always looks that way.

What’s the practical upshot? Networks are always going to be complex. It’s just the nature of the problem being solved.

We add complexity because we fail to ask the right questions, we don’t understand the system, or we fail to do good design. The solution isn’t to seek out a greener field “out there,” but rather to make the field we currently live in greener by asking the right questions and reducing complexity through good design. Sometimes you might even need to start over with a new network … but when you start thinking about starting over with a newly designed set of protocols because the old ones are “too complex,” you need to ask how those old ones got that way, and how you’re going to stop the new ones from getting to the same place.

The grass is always greener because you’re looking at it through green-colored lenses just as the new grass is in its full flush, before the weeds have had a chance to take over.

Learn how old things worked before you fall for some new “modern wonder” that’s going to solve every problem. The complexity in old things will show you where you can expect complexity to grow up in new things.

Hedge 94: Josh Slater and Quantum Networking

If you’re like me, you’ve heard a lot of hype about quantum—but you’ve never really been able to understand what quantum networking might be useful for. On this episode of the Hedge, Josh Slater, who works in the field of quantum networking, joins Ethan Banks and Russ White to discuss the current state of quantum networking and potential use cases for the technology. Things are farther along than you might think.


It Always Takes Too Long

It always takes longer to find a problem than it should. Moving through the troubleshooting process often feels like swimming in molasses—it’s never fast enough or far enough to get the application back up and running before that crucial deadline. The “swimming in molasses” effect doesn’t end when the problem is found, either—repairing the problem requires juggling a thousand variables, most of them unknown, combined with the wit and sagacity of a soothsayer to work with vendors, code releases, and unintended consequences.

It’s enough to make a network engineer want to find a mountain top and assume an all-knowing pose—even if they don’t know anything at all.

The problem of taking longer, though, applies in every area of computer networking. It takes too long for the packet to get there, it takes too long for the routing protocol to converge, and it takes too long to support a new application or server. It takes so long to create and validate a network design change that the hardware, software, and processes created are obsolete before they are used.

Why does it always take too long? A short story my grandfather—a farmer—often told me might help.

One morning a farmer got up early, determined to throw some hay down to the horses in the stable. While getting dressed, he noticed one of the buttons on his shirt was loose. “No time for that now,” he thought, “I’ll deal with it later.” Getting out to the barn, he climbed up the ladder to the loft and picked up a pitchfork. When he drove the fork into the hay, the handle broke.

He sighed, took the broken pieces down the ladder, and headed over to his shed to replace the handle—but when he got there, he realized he didn’t have a new handle that would fit. Sighing again, he took the broken pieces to his trusty old truck and headed into town—arriving before the hardware store opened. “Well, I’m already here, might as well get some coffee,” he thought, so he headed to the diner. After a bit, he headed to the store to buy a handle—but just as he walked out past the door, the loose button caught on the door handle and popped off.

It took a few minutes to search for the lost button, but he found it and headed over to the cleaners to have it sewn back on “real fast.” Well, he couldn’t wander around town in his undershirt, so he just stepped next door to the barber’s, where a few friendly games of checkers were already in progress. He played a couple of games, then the barber came out to remind him that he needed a haircut (a thing barbers tend to do for some reason), so he decided to have it done. “Might as well not waste the time in town now I’m here,” he thought.

The haircut finished, he went back to get his shirt, and realized it was just about lunch. Back to the diner again. Once he was done, he jumped in his truck and headed back to the farm. And then he realized—the horses were hungry, the hay hadn’t been pitched, and … his pitchfork was broken.

And this is why it always takes longer than it should to get anything done with a network. You take the call and listen to the customer talk about what the application is doing, which takes half an hour. You then think about what might be wrong, perhaps kicking a few routers “just for good measure” before you start troubleshooting in earnest. You look for a piece of information you need to understand the problem, only to find the telemetry system doesn’t collect that data “yet”—so you either open a ticket (a process that takes half an hour), or you “fix it now” (which takes several hours). Once you have that information, you form a theory, so you telnet into a network device to check on a few things… only to discover the device you’re looking at has the wrong version of code… This requires a maintenance window to fix, so you put in a request…

Once you finally figure out what the problem is, you encounter a series of hurdles lined up in front of you. The code needs to be upgraded, but you have to contact the vendor to make certain the new code supports all the “stuff” you need. The configuration has to be changed, but you have to determine whether the change will impact all the other applications on the network. You have to juggle a seemingly infinite number of unintended consequences in a complex maze of software, hardware, people, and processes.

And you wonder, the entire time, why you just didn’t learn to code and become a backend developer, or perhaps a mountain-top guru.

So the next time you think it’s taking too long to fix the problem, or design a new addition to the network, or for the vendor to create that perfect bit of code, remember the farmer, and the button that left the horses hungry.

Hedge 93: Dinesh Dutt and Observability

We talk a lot about telemetry in the networking world, but generally as a set of disconnected things we measure rather than as an entire system. We also tend to think about what we can measure rather than what is useful to measure. Dinesh Dutt argues we should be thinking about observability, and how to see the network as a system. Listen in as Dinesh, Tom, and Russ talk about observability, telemetry, and Dinesh’s open source network observability project.


Leveraging Similarities

We tend to think every technology and every product is roughly unique—so we stay up late at night looking at packet captures, learning how to configure each product individually, and chasing each new one as if it were the brightest new idea (or, in marketing terms, the best thing since sliced bread). Reality check: they aren’t. This applies across life, of course, but especially to technology. From a recent article—

Whenever I start learning a new programming language, I focus on defining variables, writing a statement, and evaluating expressions. Once I have a general understanding of those concepts, I can usually figure out the rest on my own. Most programming languages have some similarities, so once you know one programming language, learning the next one is a matter of figuring out the unique details and recognizing the differences.
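
As a toy illustration (the snippet below is ours, not the quoted author’s, and the names are arbitrary), here are those three building blocks in Python:

    # The three things the quoted author reaches for first in any new
    # language: define a variable, write a statement, evaluate an expression.
    hop_count = 3                      # defining a variable
    for hop in range(hop_count):       # writing a statement (a loop)
        print("traversing hop", hop + 1)
    delay_ms = hop_count * 0.5         # evaluating an expression
    print("rough one-way delay:", delay_ms, "ms")

Swap the syntax and the same three moves carry into Go, Rust, or just about any other imperative language; only the details differ.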

RFC1925 rule 11 states—

Every old idea will be proposed again with a different name and a different presentation, regardless of whether it works.

Rule 11 isn’t just a funny saying—rule 11 is your friend. If you want to learn new things quickly, learn rule 11 first. A basic understanding of the theory of networking will carry across all products, all marketing campaigns, and all protocols.

Hedge 92: The IETF isn’t the Standards Police

In most areas of life, where there are standards, there is some kind of enforcing agency. For instance, there are water standards, and there is a water department that enforces those standards. There are electrical standards, and there is an entire infrastructure of organizations making certain as few people as possible are electrocuted each year. What about Internet standards? Most people are surprised when they realize there is no such thing as a “standards police” in the Internet.

Listen in as George Michaelson, Eyvonne Sharp, Tom Ammon, and Russ White discuss the reality of standards enforcement in the Internet ecosystem.


Whatever it is, you need more (RFC1925 rule 9)

There is never enough. Whatever you name in the world of networking, there is simply not enough. There are not enough ports. There is not enough speed. There is not enough bandwidth. Many times, the problem of “not enough” manifests itself as “too much”—there is too much buffering and there are too many packets being dropped. Not so long ago, the Internet community decided there were not enough IP addresses and expanded the address space from 32 bits in IPv4 to 128 bits in IPv6. The IPv6 address space is almost unimaginably huge—2 to the 128th power is about 340 trillion, trillion, trillion addresses. That is enough to provide addresses to stacks of 10 billion computers blanketing the entire Earth. Even a single subnet of this space is enough to provide addresses for a full data center where hundreds of virtual machines are being created every minute; each /64 (the default subnet allocation in IPv6) contains the equivalent of 4 billion complete IPv4 address spaces.
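
Those numbers are easy to sanity-check. Here is a quick back-of-the-envelope sketch in Python (pure arithmetic, nothing network-specific assumed):

    # Sanity-checking the IPv6 math above.
    ipv6_space = 2 ** 128     # total IPv6 addresses
    ipv4_space = 2 ** 32      # total IPv4 addresses
    one_64 = 2 ** 64          # addresses in a single /64 subnet

    print(f"{ipv6_space:.2e}")    # ~3.40e+38, i.e., about 340 trillion trillion trillion
    print(one_64 // ipv4_space)   # 4294967296: ~4 billion IPv4-sized spaces per /64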

But… what if the current IPv6 address space simply is not enough? Engineers working in the IETF have created two different solutions over the years for just this eventuality. In 1994, RFC1606 provided a “letter from the future” describing the eventual deployment of IPv9, which was (in this eventual future) coming to the end of its useful lifetime because the Internet was running out of numbers. RFC1606 notes that IPv9’s 49 levels of hierarchy had proven popular, but not all the levels had found a use. The highest level in use seemed to be level 39, which was being used to address individual subatomic particles. Part of the pressure on the address space considered in RFC1606 was the default allocation of about 1 billion addresses to each household; as the number of homes built increased globally, the IPv9 address space came under increasing strain. The allocation of groups of addresses to recyclable items did not help, either, regardless of the ability to multicast to “all cardboard items” in a recycling bin.

An alternate proposal, written many years later, is RFC8135, which considers complex addressing in IPv6. RFC8135 begins by describing the different ways a set of numbers, such as the 128-bit space in the IPv6 address, can be represented, including integer, prime, and composite numbers. Each of these is considered in some detail, but eventually rejected for various reasons. For instance, integer (or fixed point) addresses are rejected because the location of a host is not fixed, so fixed point addresses are a poor representation of the host. Prime addresses are likewise rejected because they take too long to compute, and composite addresses are rejected because they are too difficult to differentiate from prime addresses.

RFC8135 proposes a completely different way of looking at the 128 bits available in the IPv6 address space. Rather than treating IPv6’s address space as a simple integer, the specification advocates treating it as a floating point number, which allows for a much larger space, particularly as aggregation can be indicated through scientific notation. The main problem the authors note with this proposal is that users may believe that when they assign a floating address to their device, the device itself thereby becomes waterproof and floats. The authors advise users to count on a waterproofing app, available in most app stores, for this function, rather than counting on the floating address. They also note duct tape can be used to permanently attach a floating address to a fixed device, if needed.

The danger, of course, is that in the quest for “more,” network designers, network operators, and protocol designers could end up embracing the ridiculous. It all brings to mind the point Andrew Tanenbaum makes in his standard text on networking, Computer Networks. Tanenbaum calculates the bandwidth of a station wagon full of magnetic tape (specifically VHS format) backups. After considering the amount of time it would take to drive the station wagon across the continental United States, he concludes the vehicle has more bandwidth than any link available at that time. A similar calculation could be made with a mid-sized shipping box, available from any overnight package carrier, filled with SSD drives (or similar). The conclusion, according to Dr. Tanenbaum, is that networks are a sop to human impatience.
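
For the curious, the arithmetic still works out today. A minimal sketch in Python, where the drive count, capacity, and transit time are illustrative assumptions rather than measured values:

    # A modern take on Tanenbaum's station wagon: a shipping box of SSDs
    # sent overnight. Every number here is an illustrative assumption.
    drives = 50                  # SSDs that might fit in a mid-sized box
    tb_per_drive = 4             # capacity of each drive, in terabytes
    transit_hours = 24           # overnight delivery, coast to coast

    total_bits = drives * tb_per_drive * 1e12 * 8    # payload in bits
    seconds_in_transit = transit_hours * 3600
    gbps = total_bits / seconds_in_transit / 1e9

    print(f"effective bandwidth: {gbps:.0f} Gb/s")   # roughly 19 Gb/s

Shrink the transit time or grow the box and the throughput climbs; the latency, of course, stays terrible, which is exactly Tanenbaum’s point about impatience.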

As there is no bound to human impatience, no matter how much you have, as RFC1925 says, you will always need more.