Archive for 2021
Hedge 97: Low Context DevOps
Language is deeply contextual—one of my favorite sayings from the theological world is if you take the text out of its context, you are just left with the con. What does context have to do with development and operations, though? Can there be low and high context situations in the daily life of building and running systems? Thomas Limoncelli joins Tom Ammon and Russ White to discuss the idea of low context devops, and the larger issue of context in managing projects and teams, on this episode of the Hedge.
It always takes longer than you think

Everyone is aware that it always takes longer to find a problem in a network than it should. Moving through the troubleshooting process often feels like swimming in molasses: you’re pulling hard, and progress is being made, but never fast enough or far enough to get the application back up and running before that crucial deadline. The “swimming in molasses effect” doesn’t end when the problem is found, either. Repairing the problem requires juggling a thousand variables, most of which are unknown, combined with the wit and sagacity of a soothsayer to work with vendors, code releases, and unintended consequences.
It’s enough to make a network engineer want to find a mountain top and assume an all-knowing pose—even if they don’t know anything at all.
The problem of taking longer, though, applies in every area of computer networking. It takes too long for the packet to get there, it takes too long for the routing protocol to converge, it takes too long to support a new application or server. It takes so long to create and validate a network design change that the hardware, software, and processes created are obsolete before they are used.
Why does it always take too long? A short story often told to me by my Grandfather—a farmer—might help.
One morning a farmer got up early, determined to throw some hay down to the horses in the stable. While getting dressed, he noticed one of the buttons on his shirt was loose. “No time for that now,” he thought, “I’ll deal with it later.” Getting out to the barn, he climbed up the ladder to the loft, and picked up a pitchfork. When he drove the fork into the hay, the handle broke.
He sighed, took the broken pieces down the ladder, and headed over to his shed to replace the handle—but when he got there, he realized he didn’t have a new handle that would fit. Sighing again, he took the broken pieces to his old trusty truck and headed into town—arriving before the hardware store opened. “Well, I’m already here, might as well get some coffee,” he thought, so he headed to the diner. After a bit, he headed to the store to buy a handle—but just as he walked out past the door, the loose button caught on the handle, popping off.
It took a few minutes to search for the lost button, but he found it and headed over to the cleaners to have it sewn back on “real fast.” Well, he couldn’t wander around town in his undershirt, so he just stepped next door to the barber’s, where there were a few friendly games of checkers already in progress. He played a couple of games, then the barber came out to remind him that he needed a haircut (a thing barbers tend to do all the time for some reason), so he decided to have it done. “Might as well not waste the time in town now I’m here,” he thought.
The haircut finished, he went back to get his shirt, and realized it was just about lunch. Back to the diner again. Once he was done, he jumped in his truck and headed back to the farm. And then he realized—the horses were hungry, the hay hadn’t been pitched, and … his pitchfork was broken.
And this is why it always takes longer than it should to get anything done with a network. You take the call and listen to the customer talk about what the application is doing, which takes a half an hour. You then think about what might be wrong, perhaps kicking a few routers “just for good measure” before you start troubleshooting in earnest. You look for a piece of information you need to understand the problem, only to find the telemetry system doesn’t collect that data “yet”—so you either open a ticket (a process that takes a half an hour), or you “fix it now” (which takes several hours). Once you have that information, you form a theory, so you telnet into a network device to check on a few things… only to discover the device you’re looking at has the wrong version of code… This requires a maintenance window to fix, so you put in a request…
Even once you figure out what the problem is, you encounter a series of hurdles lined up in front of you. The code needs to be upgraded, but you have to contact the vendor to make certain the new code supports all the “stuff” you need. The configuration has to be changed, but you have to determine if the change will impact all the other applications on the network. You have to juggle a seemingly infinite number of unintended consequences in a complex maze of software and hardware and people and processes.
And you wonder, the entire time, why you didn’t just learn to code and become a backend developer, or perhaps a mountain-top guru.
So the next time you think it’s taking too long to fix the problem, or design a new addition to the network, or for the vendor to create that perfect bit of code, remember the farmer, and the button that left the horses hungry.
Hedge 96: Mark Nottingham and the Future of Standardization

It often seems like the IETF is losing steam in building standards, particularly as large cloud-scale companies are reducing their participation in standards bodies and deploying whatever works for them. Given these changes, what is the future of standards bodies like the IETF? Mark Nottingham joins Tom Ammon and Russ White in a broad-ranging discussion around this topic.
The Centralization of the Internet
My article on Internet centralization was just published over at The Public Discourse.
Hedge 95: Mike Bushong and Agile
We’ve all been told agile is better … but as anyone who’s listened here long enough knows, if you haven’t found the tradeoffs, you haven’t looked hard enough. What is agile better for? Are there times when agile is better, and times when more traditional project management processes are better? Mike Bushong joins Tom Ammon, Eyvonne Sharp, and Russ White on this, the 95th episode of the Hedge, to discuss his experience with implementing agile, where it works, and where it doesn’t.
The Grass is Always Greener

This last week I was talking to someone at a small startup that intends to eliminate all the complex routing from campus networks. In the past, when reading blog posts about Kubernetes, I’ve read about how it was designed to eliminate routing protocols because “routing protocols are so complex.”
Color me skeptical.
There are two reasons for complexity in a design. The first is you’re solving a hard problem. The second is you’ve made bad design choices in the past, and you’re pasting complexity on top to solve some problem (whether perceived or real).
The problem with all this talk about building something that’s “less complex” is people tend to see complexity of the first kind and think, “we can get rid of that complexity if we start over.” Failing to understand the past before building the future is a recipe for repeated failures of the same kind. Building a network without a distributed routing protocol hasn’t been tried before either, right? Well, yes, it has … We either forget how it turned out, or we say “well, that’s not the same thing I’m talking about here” (just like “real socialism hasn’t ever been tried”).
Even worse, they think they can get rid of the second kind of complexity by starting over, or getting the humans out of the decision-making loop, or focusing on the data. Our modern penchant for relying on “the data,” without ever thinking about the source of the data or how the data has been shaped and interpreted, is truly breathtaking.
They look over the horizon, see an unspoiled field, and think “the grass really is greener on the other side.”
Get rid of all those complex dynamic routing protocols … get rid of all those humans making decisions, so the decisions are “data driven” … and everything will be so much better.
Adding complexity to solve hard real-world problems is just the way things are, and they will always be, so the first reason for complexity will always be with us. People make mistakes, don’t see into the future perfectly, or just don’t have a perfect understanding of the system (technical debt), so the second kind of complexity will always be with us. You can’t “fix” people—God save us from those who think they can. The grass isn’t always greener—it just always looks that way.
What’s the practical upshot? Networks are always going to be complex. It’s just the nature of the problem being solved.
We add complexity because we fail to ask the right questions, we don’t understand the system, or we fail to do good design. The solution isn’t to seek out a greener field “out there,” but rather to make the field we currently live in greener by asking the right questions and reducing complexity through good design. Sometimes you might even need to start over with a new network … but when you start thinking about starting over with a newly designed set of protocols because the old ones are “too complex,” you need to ask how those old ones got that way, and how you’re going to stop the new ones from getting to the same place.
The grass is always greener because you’re looking at it through green-colored lenses just as the new grass is in its full flush, and before the weeds have had a chance to take over.
Learn how old things worked before you fall for some new “modern wonder” that’s going to solve every problem. The complexity in old things will show you where you can expect to find complexity growing up in new things.
Hedge 94: Josh Slater and Quantum Networking

If you’re like me, you’ve heard a lot of hype about quantum—but you’ve never really been able to understand what quantum networking might be useful for. On this episode of the Hedge, Josh Slater, who works in the field of quantum networking, Ethan Banks, and Russ White discuss the current state of quantum networking and potential use cases for the technology. Things are farther along than you might think.
