Archive for 2021
It Always Takes Too Long
It always takes longer to find a problem than it should. Moving through the troubleshooting process often feels like swimming in molasses—it’s never fast enough or far enough to get the application back up and running before that crucial deadline. The “swimming in molasses effect” doesn’t end when the problem is found out, either—repairing the problem requires juggling a thousand variables, most of which are unknown, combined with the wit and sagacity of a soothsayer to work with vendors, code releases, and unintended consequences.
It’s enough to make a network engineer want to find a mountain top and assume an all-knowing pose—even if they don’t know anything at all.
The problem of taking longer, though, applies in every area of computer networking. It takes too long for the packet to get there, it takes to long for the routing protocol to converge, it takes too long to support a new application or server. It takes so long to create and validate a network design change that the hardware, software and processes created are obsolete before they are used.
Why does it always take too long? A short story often told to me by my Grandfather—a farmer—might help.
One morning a farmer got early in the morning, determined to throw some hay down to the horses in the stable. While getting dressed, he noticed one of the buttons on his shirt was loose. “No time for that now,” he thought, “I’ll deal with it later.” Getting out to the barn, he climbed up the ladder to the loft, and picked up a pitchfork. When he drove the fork into the hay, the handle broke.
He sighed, took the broken pieces down the ladder, and headed over to his shed to replace the handle—but when he got there, he realized he didn’t have a new handle that would fit. Sighing again, he took the broken pieces to his old trusty truck and headed into town—arriving before the hardware store opened. “Well, I’m already here, might as well get some coffee,” he thought, so he headed to the diner. After a bit, he headed to the store to buy a handle—but just as he walked out past the door, the loose button caught on the handle, popping off.
It took a few minutes to search for the lost button, but he found it and headed over to the cleaners to have it sewn back on “real fast.” Well, he couldn’t wander around town in his undershirt, so he just stepped next door to the barber’s, where there were a few friendly games of checkers already in progress. He played a couple of games, then the barber came out to remind him that he needed a haircut (a thing barbers tend to do all the time for some reason), so he decided to have it done. “Might was well not waste the time in town now I’m here,” he thought.
The haircut finished, he went back to get his shirt, and realized it was just about lunch. Back to the diner again. Once he was done, he jumped in his truck and headed back to the farm. And then he realized—the horses were hungry, the hay hadn’t been pitched, and … his pitchfork was broken.
And this is why it always takes longer than it should to get anything done with a network. You take the call and listen to the customer talk about what the application is doing, which takes a half an hour. You then think about what might be wrong, perhaps kicking a few routers “just for good measure” before you start troubleshooting in earnest. You look for a piece of information you need to understand the problem, only to find the telemetry system doesn’t collect that data “yet”—so you either open a ticket (a process that takes a half an hour), or you “fix it now” (which takes several hours). Once you have that information, you form a theory, so you telnet into a network device to check on a few things… only to discover the device you’re looking at has the wrong version of code… This requires a maintenance window to fix, so you put in a request…
Once you even figure out what the problem is, you encounter a series of hurdles lined up in front of you. The code needs to be upgraded, but you have to contact the vendor to make certain the new code supports all the “stuff” you need. The configuration has to be changed, but you have to determine if the change will impact all the other applications on the network. You have to juggle a seemingly infinite number of unintended consequences in a complex maze of software and hardware and people and processes.
And you wonder, the entire time, why you just didn’t learn to code and become a backend developer, or perhaps a mountain-top guru.
So the next time you think it’s taking to long to fix the problem, or design a new addition to the network, or for the vendor to create that perfect bit of code, remember the farmer, and the button that left the horses hungry.
Hedge 93: Dinesh Dutt and Observability

We talk a lot of about telemetry in the networking world, but generally as a set of disconnected things we measure, rather than as an entire system. We also tend to think about what we can measure, rather than what is useful to measure. Dinesh Dutt argues we should be thinking about observability, and how to see the network as a system. Listen in as Dinesh, Tom, And Russ talk about observability, telemetry, and Dinesh’s open source network observability project.
Leveraging Similarities
We tend to think every technology and every product is roughly unique—so we tend to stay up late at night looking at packet captures and learning how to configure each product individually, and chasing new ones as if they are the brightest new idea (or, in marketing terms, the best thing since sliced bread). Reality check: they aren’t. This applies across life, of course, but especially to technology. From a recent article—
RFC1925 rule 11 states—
Rule 11 isn’t just a funny saying—rule 11 is your friend. If want to learn new things quickly, learn rule 11 first. A basic understanding of the theory of networking will carry across all products, all marketing campaigns, and all protocols.
Hedge 92: The IETF isn’t the Standards Police
In most areas of life, where the are standards, there is some kind of enforcing agency. For instance, there are water standards, and there is a water department that enforces these standards. There are electrical standards, and there is an entire infrastructure of organizations that make certain the fewest number of people are electrocuted as possible each year. What about Internet standards? Most people are surprised when they realize there is no such thing as a “standards police” in the Internet.
Listen in as George Michaelson, Evyonne Sharp, Tom Ammon, and Russ White discuss the reality of standards enforcement in the Internet ecosystem.
Whatever it is, you need more (RFC1925 rule 9)
There is never enough. Whatever you name in the world of networking, there is simply not enough. There are not enough ports. There is not enough speed. There is not enough bandwidth. Many times, the problem of “not enough” manifests itself as “too much”—there is too much buffering and there are too many packets being dropped. Not so long ago, the Internet community decided there were not enough IP addresses and decided to expand the address space from 32 bits in IPv4 to 128 bits in IPv6. The IPv6 address space is almost unimaginably huge—2 to the 128th power is about 340 trillion, trillion, trillion addresses. That is enough to provide addresses to stacks of 10 billion computers blanketing the entire Earth. Even a single subnet of this space is enough to provide addresses for a full data center where hundreds of virtual machines are being created every minute; each /64 (the default allocation size for an IPv6 address) contains 4 billion IPv4 address spaces.
But… what if the current IPv6 address space simply is not enough? Engineers working in the IETF have created two different solutions over the years for just this eventuality. In 1994 RFC1606 provided a “letter from the future” describing the eventual deployment of IPv9, which was (in this eventual future) coming to the end of its useful lifetime because the Internet was running out of numbers. In RFC1606, it is noted that IPv9’s 49 levels of hierarchy had proven popular, but not all the levels had found a use. The highest level in use seems to be level 39, which was being used to address individual subatomic particles. Part of the dwindling address space considered in RFC1606 was the default allocation of about 1 billion addresses to each household. As the number of homes built increased globally, the IPv9 address space came under increasing pressure. The allocation of groups of addresses to recyclable items was not helpful, either, regardless of the ability to multicast “all cardboard items” in a recycling bin.
An alternate proposal, written many years later, is RFC8135, which considers complex addressing in IPv6. RFC8135 begins by describing the different ways in which a set of numbers, such as the 128-bit space in the IPv6 address, can be represented, including integers, prime, and composite. Each of these are considered in some detail, but eventually rejected for various reasons. For instance, integer (or fixed point) addresses are rejected because the location of a host is not fixed, so fixed point addresses are a poor representation of the host. Prime addresses are likewise rejected because they take too long to compute, and composite addresses are rejected because they are too difficult to differentiate from prime addresses.
RFC8135 proposes a completely different way of looking at the 128-bits available in the IPv6 address space. Rather than treating IPv6’s address space as a simple integer, this specification advocates for treating it as a floating number. This allows for a much larger space, particularly as aggregation can be indicated through scientific notation. The main problem the authors note with this proposal is users may believe that when they assign a floating address to their device, the device itself thereby becomes waterproof and floating. The authors advice users count on a waterproofing app, available in most app stores, for this function, rather than counting on the floating address. The authors also note duct tape can be used to permanently attach a floating address to a fixed device, if needed.
The danger, of course, is that in the quest for “more,” network designers, network operators, and protocol designers could end up embracing the ridiculous. It all brings to mind the point Andrew Tanenbaum made in a standard work on Networking, Computer Networks. Tanenbaum calculates the bandwidth of a station wagon full of magnetic tape (specifically VHS format) backups. After considering the amount of time it would take to drive the station wagon across the continental United States, he concludes the vehicle has more bandwidth than any link available at that time. A similar calculation could be made with a mid-sized shipping box available from any overnight package carrier, filled with SSD drives (or similar). The conclusion, according to Dr. Tanenbaum, is networks are a sop to human impatience.
As there is no bound to human impatience, no matter how much you have, as RFC1925 says, you will always need more.
Hedge 91: Leslie Daigle and IP Addresses Acting Badly
What if you could connect a lot of devices to the Internet—without any kind of firewall or other protection—and observe attackers trying to find their way “in?” What might you learn from such an exercise? One thing you might learn is a lot of attacks seem to originate from within a relatively small group of IP addresses—IP addresses acing badly. Listen in as Leslie Daigle of Thinking Cat and the Techsequences podcast, Tom Ammon, and Russ White discuss just such an experiment and its results.
Free Speech is More than Words
A couple of weeks ago, I joined Leslie Daigle and Alexa Reid on Techsequences to talk about free speech and the physical platform—does the right to free speech include the right to build and operate physical facilities like printing presses and web hosting? I argue it does. Listen in if you want to hear my argument, and how this relates to situations such as the “takedown” of Parler.
