This last week I was talking to someone at a small startup that intends to eliminate all the complex routing from campus networks. In the past, when reading blog posts about Kubernetes, I’ve read about how it was designed to eliminate routing protocols because “routing protocols are so complex.”
Color me skeptical.
It always takes longer to find a problem than it should. Moving through the troubleshooting process often feels like swimming in molasses—it’s never fast enough or far enough to get the application back up and running before that crucial deadline. The “swimming in molasses effect” doesn’t end when the problem is found out, either—repairing the problem requires juggling a thousand variables, most of which are unknown, combined with the wit and sagacity of a soothsayer to work with vendors, code releases, and unintended consequences.
We tend to think every technology and every product is roughly unique—so we tend to stay up late at night looking at packet captures and learning how to configure each product individually, and chasing new ones as if they are the brightest new idea (or, in marketing terms, the best thing since sliced bread). Reality check: they aren’t. This applies across life, of course, but especially to technology.
There is never enough. Whatever you name in the world of networking, there is simply not enough. There are not enough ports. There is not enough speed. There is not enough bandwidth. Many times, the problem of “not enough” manifests itself as “too much”—there is too much buffering and there are too many packets being dropped. Not so long ago, the Internet community decided there were not enough IP addresses and decided to expand the address space from 32 bits in IPv4 to 128 bits in IPv6.
While reading a research paper on address spoofing from 2019, I ran into this on NAT (really PAT) failures—
In the first failure mode, the NAT simply forwards the packets with the spoofed source address (the victim) intact … In the second failure mode, the NAT rewrites the source address to the NAT’s publicly routable address, and forwards the packet to the amplifier. When the server replies, the NAT system does the inverse translation of the source address, expecting to deliver the packet to an internal system. However, because the mapping is between two routable addresses external to the NAT, the packet is routed by the NAT towards the victim.
What is the first thing almost every training course in routing protocols begin with? Building adjacencies. What is considered the “deep stuff” in routing protocols? Knowing packet formats and processes down to the bit level. What is considered the place where the rubber meets the road? How to configure the protocol.
I’m not trying to cast aspersions at widely available training, but I sense we have this all wrong—and this is a sense I’ve had ever since my first book was released in 1999. It’s always hard for me to put my finger on why I consider this way of thinking about network engineering less-than-optimal, or why we approach training this way.
It’s not unusual in the life of a network engineer to go entire weeks, perhaps even months, without “getting anything done.” This might seem odd for those who do not work in and around the odd combination of layer 1, layer 3, layer 7, and layer 9 problems network engineers must span and understand, but it’s normal for those in the field. For instance, a simple request to support a new application might require the implementation of some feature, which in turn requires upgrading several thousand devices, leading to the discovery that some number of these devices simply do not support the new software version, requiring a purchase order and change management plan to be put in place to replace those devices, which results in … The chain of dominoes, once it begins, never seems to end.
Fear sells. Fear of missing out, fear of being an imposter, fear of crime, fear of injury, fear of sickness … we can all think of times when people we know (or worse, a people in the throes of madness of crowds) have made really bad decisions because they were afraid of something. Bruce Schneier has documented this a number of times. For instance: “it’s smart politics to exaggerate terrorist threats” and “fear makes people deferential, docile, and distrustful, and both politicians and marketers have learned to take advantage of this.”
I cannot count the number of times I’ve heard someone ask these two questions—
- What are other people doing?
- What is the best common practice?
While these questions have always bothered me, I could never really put my finger on why. I ran across a journal article recently that helped me understand a bit better. The root of the problem is this—what does best common mean, and how can following the best common produce a set of actions you can be confident will solve your problem?
Last week I began discussing why AS Path Prepend doesn’t always affect traffic the way we think it will. Two other observations from the research paper I’m working off of were:
- Adding two prepends will move more traffic than adding a single prepend
- It’s not possible to move traffic incrementally by prepending; when it works, prepending will end up moving most of the traffic from one inbound path to another
A slightly more complex network will help explain these two observations.