WRITTEN – rule 11 reader

Fast Following Fails

Russ — Sat, 09 Aug 2025 11:24:31 +0000

Fast following fails.

Whenever I hear a leader in a technology business say, “We’re going to fast follow because it’s the most profitable place to be,” I know I’m looking at a failed organization. I didn’t come to this conclusion by thinking about it. I came to this conclusion by observing it repeatedly.

After observing it, however, I wanted to understand why this particular strategy fails so consistently and spectacularly. Why? To understand my theory, we need to start in a somewhat different place than business—we need to start with the nature of goals and humans.

You can place goals into two buckets: first things and second things.

First things are foundational. If you are a technology company, the first thing is building a stable, resilient, and flexible platform (or foundation). The products you sell will only be as stable as your platform. The innovation you achieve will only be as consistent as your platform is.

Second things are goals you can only achieve once you’ve built the first things.

Here’s the hard truth no one wants to hear: Generating revenue is a second thing.

Humans become what they do.

We all want to believe we can become what we desire—but we actually become what we do. In Aristotelian philosophy, this is called the virtue ethic. You become physically virtuous by exercising your body. You become intellectually virtuous by thinking about hard things.

Companies are the same way. A company can only become innovative by innovating. Innovating becomes a habit—or it doesn’t.

What does this have to do with fast following?

The theory of the “fast follower” is: “I’m going to let other people spend money on research and development, I’m going to let them carry the burden of innovating and making all the mistakes, then I’m going to jump in and scoop up their innovation.”

This seems sound at first glance. It’s a compelling story.

It doesn’t work, however, because you are chasing another organization’s success without building their platform. You’ve placed a second thing—revenue generation—in first place, and first things—building a platform and innovating—in second place.

When you put building a platform and innovating on top of that platform in second place—when you “fast follow”—you lose the habit of building a solid platform and the habit of innovating.

Building a platform on which you can actually ship innovative products—no matter who invented them—and cultivating a mindset that seeks out good innovation creates a culture of innovation. When you build the mental habit of waiting until someone else’s innovation succeeds and then building “just enough platform to make it work here, too,” you are building an unstable platform and killing innovation.

“But what about all those fast-following success stories?”

One reason “fast following” success stories abound is that you can make a lot of money for a little while with the fast-following strategy. Another is that when an organization first moves to fast following, they have the leftover platform and innovation culture to carry them for a little while.

But time will out all fast following organizations. When the market shifts, fast followers will have neither the platform to shift with it nor the innovation to change with the market.

By putting second things first, the fast follower loses the first things that make the second thing possible.

“But I’ll make a lot of money until it fails, right? I don’t care about the future, just making a lot of money quickly!”

Sure, if that’s the life you want to lead, go for it. If you want to live a life devoid of community, and you want to lie on your deathbed and say, “I don’t care what damage I caused,” if sheer wealth is all that matters, feel free to fast follow.

If you want to build something, however, go build it.

Fast following gives up building platforms and innovating for immediate success, and winds up failing to innovate or succeed.

Architecture and Process

Russ — Fri, 12 Apr 2024 14:37:52 +0000

Driving through some rural areas east of where I live, I noticed a lot of collections of buildings strung together being used as homes. The process seems to start when someone takes a travel trailer, places it on blocks (a foundation of sorts) and builds a spacious deck just outside the door. Over time, the deck is covered, then screened, then walled, becoming a room.

Once the deck becomes a room, a new deck is built, and the process begins anew. At some point, the occupants decide they need a place to store some sort of equipment, so they build a shed. Later, the shed is connected to the deck, the whole thing becomes an extension of the living space, and a new shed is built.

These … interesting … places to live are homes to the people who live in them. They are often, I assume, even happy homes.

But they are not houses in the proper sense of the word. There is no unifying theme, no thought of how traffic should flow and how people should live. They are a lot like the paths crisscrossing a campus—built where the grass died.

Our networks are like these homes—they are not houses so much as historical records of every new idea and vendor marketing drive. There is no architecture; there are many architectures strung together with a set of tightly wound and closely followed processes.

We need to support some new application or service? Throw a new overlay on top. There was a massive failure last night? Let’s spend hours closely examining our process and find some way to prevent the failure by adding a few new steps.

We never ask if our goals are realistic because we don’t have any goal beyond: “Let’s solve this problem right now.” We never ask if some future goal might be better served by using this solution or that—the future will take care of itself.

Why do we fail to attend to architecture?

Architecture is hard, and we often fail to correctly anticipate the future. This perceives architecture as a detailed plan—but there’s no reason it should be. An architecture can be a rough, and slow-changing, outline of how the network is laid out, a set of services the network supports, and a set of technologies the network will use to support those services. An architecture recognizes and defines limits as well as capabilities.

Processes are comforting. When things fail, we can always take comfort in saying: “I followed the process!”

We live in a culture of now. All problems take two hours, two days, two weeks, or too long. There is no history, there is no future, there is only an ever-present now. If I cannot have it now, it is not worth having at all.

These problems are hard to solve because they are cultural rather than technical—and the network engineering world has a strong bias towards “don’t tell me how it works, tell me how to configure it.” We present this as a problem-solving mentality, even though it causes more problems than it solves.

We need to rebalance the way we think about architectures and processes—perhaps we would get better results by combining lightweight architectures with lightweight processes, instead of relying on heavy processes with no architecture to build maintainable networks and sustainable lives.

AI Assistants

Russ — Mon, 18 Mar 2024 14:13:04 +0000

I have written elsewhere about the danger of AI assistants leading to mediocrity. Humans tend to rely on authority figures rather strongly (see Obedience to Authority by Stanley Milgram as one example), and we often treat “the computer” as an authority figure.

The problem is, of course, Large Language Models—and AI of all kinds—are mostly pattern-matching machines or Chinese Rooms. A pattern-matching machine can be pretty effective at many interesting things, but it will always be, in essence, a summary of “what a lot of people think.” If you choose the right people to summarize, you might get close to the truth. Finding the right people to summarize, however, is beyond the powers of a pattern-matching machine.

Just because many “experts” say the same thing does not mean the thing is true, valid, or useful.

AI assistants can make people more productive, at least in terms of sheer output. Someone using an AI assistant will write more words per minute than someone who is not. Someone using an AI assistant will write more code daily than someone who is not.

But is it just more, or is it better?

Measuring the mediocratic effect of using AI systems, even as an assistant, is difficult. We have the example of drivers using a GPS, never really learning how to get anyplace (and probably losing all larger sense of geography), but these things are hard to measure.

However, a recent research paper on programming and security has shown at least one place where this effect can be measured. Noting that most kinds of social research are problematic (they are hard to replicate, it’s hard to infer valid results accurately, etc.), this one seems well set up and executed, so I’m inclined to put at least some trust in the results.

The researchers asked programmers worldwide to write software to perform six different tasks. They constructed a control group that did not use AI assistants and a test group that did.

The result? In almost every case, participants using the AI assistant wrote much less secure code, including mistakes in building encryption functions, creating a sandbox, allowing SQL injection attacks, local pointers, and integer overflows. Participants made about the same number of mistakes in randomness—a problem not many programmers have taken the time to study—and fewer mistakes in buffer overflows.

It is possible, of course, for companies to create programming-specific AI assistants that might resolve these problems. Domain-specific AI assistants will always be more accurate and useful than general-purpose assistants.

Relying on AI assistants improves productivity but also seems to create mediocre results. In many cases, mediocre results will be “good enough.”

But what about when “good enough” isn’t … good enough?

Humans are creatures of habit. We do what we practice. If you want to become a better coder, you need to practice coding—and remember that practice does not make perfect. Perfect practice makes perfect.

On Writing Complexity

Russ — Mon, 05 Feb 2024 22:57:06 +0000

I’ve been on a bit of a writer’s break after finishing the CCST book, but it’s time to rekindle my “thousand words a day” habit. As always, one part of this is thinking about how I write—is there anything I need to change? Tools, perhaps, or style?

What about the grade level complexity of my writing? I’ve never really paid attention to this, but I’m working on contributing to a site regularly that does. So maybe I should.

I tend to write to the tenth- or eleventh-grade level, even when writing “popular material,” like blog posts. The recommended level is around the eighth-grade level. Is this something I need to change?

It seems the average person considers anything above the eighth-grade reading level “too hard” to read, so they give up. Every reading level calculation I’ve looked at essentially uses word and sentence length as proxies for complexity. Long words and sentences intimidate people.
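As a rough illustration of how word and sentence length end up standing in for complexity, here is a minimal sketch of the Flesch-Kincaid grade level, one widely used reading-level calculation. The syllable counter is a crude vowel-group heuristic assumed purely for illustration; real tools use better rules or dictionaries, so their scores will differ from this one.

```python
import re

def count_syllables(word: str) -> int:
    """Crude heuristic (an assumption, not a real syllabifier):
    count groups of consecutive vowels, with a minimum of one."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))

def fk_grade(text: str) -> float:
    """Flesch-Kincaid grade level:
    0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59"""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words)
            - 15.59)

short_text = "The cat sat. The dog ran."
long_text = ("Organizational complexity unnecessarily intimidates "
             "inexperienced readers considerably.")

# Longer words and longer sentences push the score up,
# whether or not the sentence says anything.
print(fk_grade(short_text) < fk_grade(long_text))
```

Nothing in the formula looks at meaning; only the counts of sentences, words, and syllables matter.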

On the other hand, measuring the reading grade level can seem futile. There are plenty of complex concepts described by one- and two-syllable words. Short sentences can still have lots of meaning.

Further, the reading grade level does not tell you if the sentence makes sense. A famous politician recently said, “… it’s time for us to do what we have been doing, and that time is every day.” The reading grade level of this sentence is in the sixth grade—but saying nothing is still saying nothing, even if you say it at a sixth-grade level.

While reading level complexity might be important, it is more important to say something.

Sometimes, using long words and sentences stops people from paying attention to your words. However, replacing long words and sentences with shorter ones sometimes removes your words’ real meaning (or at least flavor). I am not, at this point, certain how to balance these. I suspect I will have to consider the tradeoff in every situation.

When you write—and if you are doing your job as a network engineer well, you do write—you might want to consider the complexity of your writing. I will use the grade level as “another tool” in my set, which means I’ll be thinking about writing complexity more—but I’m not going to allow it to drive my writing style. If I can reduce the complexity of my writing without losing meaning, I may … sometimes … or I might not.

Looking at the other side of the coin—what about reading grade level from a reader’s point of view? Should we only read easy-to-read things? The answer should be obvious: no.

There is a bit of a feeling that text above a certain reading level is “sheer nonsense.” Again, though, the grade level has nothing to do with the value of the content. Sometimes, saying complex things just requires complex text. Readers (all of us) need to learn to read complex text.

Reading grade level is a good tool in many situations—but it is one tool among many.

Making Networking Cool Again? (2)

Russ — Tue, 16 Jan 2024 18:00:03 +0000

Network engineering is not “going away.” Network engineering is not less important than it was yesterday, last year, or even a decade ago.

But there still seems to be a gap somewhere. There are fewer folks interested than we need. We need more folks who want to work as full-time network engineers, and more folks with network engineering skills diffused within the larger IT community. More of both is the right answer if we’re going to continue building large-scale systems that work. The real lack of enthusiasm for learning network engineering is hurting all of IT, not just network engineering.

How do we bridge this gap? We’re engineers. We solve problems. This seems to be a problem we might be able to solve (unlike human nature). Let’s try to solve it.

As you might have guessed, I have some ideas. These are not the only ideas in the world—feel free to think up more!

If you walk into a robotics class, even an introductory robotics class, you see people … building robots. If you walk into a coding class, even an introductory one, you see people … writing software. If you walk into a network engineering class, you see … someone lecturing about the OSI model, packet formats, or how to configure BGP.

What problems are people learning robotic engineering solving? How to build a robot and get it to do something to solve a real-world problem. What problems are people learning to code solving? How to tackle some real-world problem.

Sure, the problems being solved at an introductory level might be trivial, like: “Read this file and spit out a sum of the numbers in the fourth column.” But they are still starting, right from the beginning, by taking requirements and converting them into solutions.

What problems are network engineers learning how to solve? How to choose hardware, string it all together, and configure BGP.

Do you see the difference?

All engineers solve problems—it’s the nature of engineering. But are we creating a mindset in prospective network engineers, or even adjacent fields, that we solve real-world problems? Or are we giving them the impression that we solve whiteboard problems by talking about bits, bytes, configurations, and cable types?

Have you ever seen the glazing over of eyes while explaining how you put four transport protocols on top of one another (look at all the pretty tunnels)? How about when you create a chart showing how TCP and QUIC can be “kind-of sort-of” forced into the OSI model? Or when you spin out your BGP packet format charts, showing how we’ve (mis)used address families to carry everything anyone can imagine?

I’ve been teaching this stuff for years (okay, decades). Over time, I’ve moved away from teaching configurations and packet formats. I’ve gone from Advanced IP Network Design to Computer Networking Problems and Solutions. These are very different ways of looking at network engineering.

Focusing on real-world problems would help connect business and other IT folks to the network, connect theory to practice, and people to network engineering. Going home at the end of the day saying, “I solved a problem,” can be satisfying. Going home at the end of the day saying, “I configured BGP?”

Another thing adopting the mindset of solving real-world problems might do is help us lose unnecessary complexity. I know complexity is necessary to build resilient systems; we cannot build what we build without creating and encountering complexity.

But we often run ourselves into the ditch on both sides of the road.

We unintentionally build things that are too complex because we try to make them too simple. Quick, which is simpler: building a data center fabric with one routing protocol or two? A single chassis system or several smaller fixed format devices? A proprietary system or something built on open standards?

How many balloons fit in a bag? (thanks, Don)

Failing to start with the tradeoffs, and thinking through what problem we’re actually trying to solve, leads to unnecessary complexity. Such designs might not immediately fail, but they will fail, and “it’s so complex” just isn’t an excuse.

Don’t even try to tell me there aren’t any tradeoffs. If you think there aren’t any tradeoffs, that just means you haven’t looked hard enough. Go find them, think about them, and document them.

We also build complex things because we think it offers job security, or it’s neat, or we like to feel like the kid who says to the world, “look what I built!”

I know it’s exciting to hear stories about that time someone rescued a network from a major failure—after all, that’s solving a real-world problem. Building a network that just works might be “boring,” but it solves many more real-world problems than raising a network from the dead.

We love our fashionable capes, but … capes can get caught in a nearby jet engine. Lose the cape. In the long run, it’ll make network engineering more attractive as a career field and field of knowledge.

The Bottom Line

No, the sky is not falling. We still need networks, and we still need network engineers.

Yes, there is a problem. Too many companies are going “to the cloud” because they cannot find people qualified to build and maintain their very complex networks. There’s too much centralization and too little openness.

So maybe let’s stop saying “we don’t need network engineers.” And maybe let’s really think about how we’re building things. And maybe let’s focus on solving real-world problems, starting from day one in network engineering classrooms.

Network engineering is still cool—let’s go out there believing—and selling—that idea to the world.

Making Networking Cool Again? (1)

Russ — Tue, 09 Jan 2024 18:00:31 +0000

Is network engineering still cool?

It certainly doesn’t seem like it, does it? College admissions seem to be down in the network engineering programs I know of, and networking certifications seem to be down, too. Maybe we’ve just passed the top of the curve, and computer networking skills are just going the way of coopering. Let’s see if we can sort out the nature of this malaise and possible solutions. Fair warning—this is going to take more than one post.

Let’s start here: It could be that computer networking is a solved problem, and we just don’t need network engineers any longer.

I’ve certainly heard people say these kinds of things—for instance, one rather well-known network engineer said, just a few years back, that network engineers would no longer be needed in five years. According to this view, the entire network should be like a car. You get in, turn the key, and it “just works.” There shouldn’t be any excitement or concern about a commodity like transporting packets. Another illustration I’ve heard used is “network bandwidth should just be like computer memory—if you need more, add it.”

Does this really hold, though? Even if we accept the car and computer memory illustrations, and accept that individual routers are like these things, is an entire network system like a car? A closer analogy for a network in the world of cars would be an entire transportation system.

You have different kinds of physical transport (rail, over-the-road trucks, air travel, ships, etc.), each with its own characteristics, and all of which must be connected to move physical objects from one place to another. There must be some kind of “control plane” that coordinates them, along with shared addressing, formatting rules, etc.

While a single car might, in some sense, be a commodity at this point (and I’ll bet there aren’t many car owners who would wholly agree with that characterization), I don’t see how we could call an entire transportation system a commodity—especially if we want to say “the skills needed to build a transportation system just aren’t needed any longer, there’s nothing more to learn, this is so … boring …”

Let’s dispense with this idea that networks just aren’t needed any longer. We must still build networks that carry traffic between servers, cities, countries, and continents. Building these networks is still a hard problem. Even if there is less room to improve these things than ten or twenty years ago, the problems are still hard. Even if many problems are solved at a broad level, not every problem is solved in every network in the universe.

A more reasonable take on this perspective is that networking skills are diffusing into a larger information technology (IT) skill set. Perhaps IT, in its relative “youth,” divided too sharply and finely—we created too many career fields. What is happening right now, then, is just a kind of right sizing in the market.

Network engineering skills, in fact, do seem to be dispersing to one degree or another. But let’s put this in perspective.

The first point is I’m not convinced there are fewer network engineers. Instead, it’s more likely there are just as many network engineers as there ever have been, if not more. Perhaps, though, “real” network engineering has been growing linearly while all the other IT fields have been growing at a rate faster than linear (I don’t want to say exponential, just something more than linear).

In a world that counts lack of growth as a failure, networking growing at a slower pace than, say, programming seems like a failure from the outside. People like to follow winners; growing is winning; network engineering is not growing as fast as other things, so network engineering is failing.

I dislike the modern progressive mindset—but while I’m working on something in this area, this isn’t the time or place to dive into this topic. Let’s agree that we must let go of the idea: “Growing slower is a failure.”

Returning to the idea of transportation—I will just about bet automobile designers built entire departments in the early days of car manufacturing. Today, there might be just as many automobile designers as ever. They’re just buried in large car manufacturing, servicing, etc., companies, so it feels like there are a lot fewer than there were.

Just because most new engineers must learn many different things, and network engineering skills are diffusing into many different areas of IT, does not mean network engineering is dying, regardless of what it might look like from the outside.

Second, there is nothing wrong with network engineering skills diffusing into the larger IT skill set. Has anyone reading this ever really been a “pure” network engineer? If so, I don’t know whether to envy or feel sorry for you.

When building networks in the military, I had to deal with all the politics of customer relationships and understanding mission needs. When taking cases in technical support, I had to deal with time management and customer-facing skills—and I needed to learn or use coding skills to be an effective network engineer. Today, I do network engineering, like I always have, but I work on security, privacy, DNS, coding, and all sorts of other things.

I cannot think of a time in my career when I would have considered myself a “pure network engineer.” I’ve always had to find and build adjacent skills to design, build, and maintain networks. I would say this is truer today than ever, but I do not believe my skills as a network engineer are any less useful than they have ever been.

Where does all of this leave us?

Let’s continue the discussion in part 2 next week.

Simple or Complex?

Russ — Tue, 19 Sep 2023 19:00:22 +0000

A few weeks ago, Daniel posted a piece about using different underlay and overlay protocols in a data center fabric. He says:

There is nothing wrong with running BGP in the overlay but I oppose to the argument of it being simpler.

One of the major problems we often face in network engineering—and engineering more broadly—is confusing that which is simple with that which has lower complexity. Simpler things are not always less complex. Let me give you a few examples, all of which are going to be controversial.

When OSPF was first created, it was designed to be a simpler and more efficient form of IS-IS. Instead of using TLVs to encode data, OSPF used fixed-length fields. To process the contents of a TLV, you need to build a case/switch construction where each possible type has a separate bit of code. You must count off the correct length for the type of data, or (worse) read a length field and count out where you are in the stream.
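To make the TLV processing burden concrete, here is a minimal sketch of a TLV walker with a per-type dispatch. The one-byte type and one-byte length layout, the type codes 1 and 2, and the function names are all assumptions made for illustration; they are not taken from any real protocol implementation.

```python
import struct

def parse_tlvs(data: bytes):
    """Walk a byte stream of TLVs (hypothetical layout: 1-byte type,
    1-byte length, then `length` bytes of value)."""
    offset = 0
    while offset < len(data):
        if offset + 2 > len(data):
            raise ValueError("truncated TLV header")
        tlv_type, length = struct.unpack_from("!BB", data, offset)
        offset += 2
        value = data[offset:offset + length]
        if len(value) != length:
            raise ValueError("truncated TLV value")
        offset += length  # count out where we are in the stream
        yield tlv_type, value

def handle(tlv_type: int, value: bytes) -> str:
    # The case/switch: every type needs its own bit of decoding code.
    if tlv_type == 1:    # hypothetical type: a name as ASCII text
        return "name=" + value.decode("ascii")
    if tlv_type == 2:    # hypothetical type: a 32-bit metric
        (metric,) = struct.unpack("!I", value)
        return f"metric={metric}"
    # Unknown types can be skipped because the length is self-describing.
    return f"unknown type {tlv_type}, skipped {len(value)} bytes"

packet = (bytes([1, 2]) + b"r1"
          + bytes([2, 4]) + struct.pack("!I", 10)
          + bytes([9, 1, 0]))
results = [handle(t, v) for t, v in parse_tlvs(packet)]
print(results)
```

The last branch hints at the other side of the tradeoff: a parser that has never heard of type 9 can still step over it and keep going.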

Fixed-length fields are just much easier to process. You build a structure matching the layout of the fixed-length fields in memory, then point this structure at the packet contents in-memory. From there, you can just use the structure’s contents to directly access the data.
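A minimal sketch of the structure overlay idea, using Python's ctypes. The four fields are loosely modeled on the start of an OSPF header, but the class name, field names, and sample bytes are illustrative assumptions. In C you would cast a pointer at the packet buffer; ctypes copies the bytes instead, which is close enough to show the idea.

```python
import ctypes

class Header(ctypes.BigEndianStructure):
    """Fixed-length layout in network byte order (illustrative fields)."""
    _fields_ = [
        ("version", ctypes.c_uint8),     # byte 0
        ("type", ctypes.c_uint8),        # byte 1
        ("length", ctypes.c_uint16),     # bytes 2-3
        ("router_id", ctypes.c_uint32),  # bytes 4-7
    ]

# Sample packet: 8 header bytes followed by some payload (made-up data).
packet = bytes([2, 1, 0, 44, 192, 0, 2, 1]) + b"payload"

# Lay the structure over the first bytes of the packet.
hdr = Header.from_buffer_copy(packet)

# No case/switch and no length counting: the fields are just there.
print(hdr.version, hdr.type, hdr.length, hex(hdr.router_id))
```

The cost of this convenience is that the layout is frozen: there is nowhere to put a field nobody anticipated.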

Over time, however, as new requirements have been pushed into IGPs, OSPF has become much more complex while IS-IS has remained relatively constant (in terms of complexity). IS-IS went through a bit of a mess when transitioning from narrow to wide metrics, but otherwise the IS-IS we use today is the same protocol we used when I first started working on networks (back in the early 1990s).

OSPF’s simplicity, in the end, did not translate into a less complex protocol.

Another example is the way we transport data in BGP. A lot of people do not know that BGP’s original design allowed for carrying information other than straight reachability in the protocol. BGP speakers can negotiate multiple sessions, with each session carrying a different kind of information. Rather than using this mechanism, however, BGP has consistently been extended using address families—because it is simpler to create a new address family than it is to define a new kind of data parallel with address families.

This has resulted in AFs that are all over the place, magic numbers, and all sorts of complexity. The AF solution is simpler, but ultimately more complex.

Returning to Daniel’s example, running a single protocol for underlay and overlay is simpler, while running two different protocols is less simple. However, I’ve observed—many times—that running different protocols for underlay and overlay is less complex.

Why? Daniel mentions a couple of reasons, such as each protocol having a separate purpose, and that we’re pushing features into BGP to make it serve the role of an IGP (which is, in the end, going to cause some major outages—if it hasn’t already).

Consider this: is it easier to troubleshoot infrastructure reachability separately from VRF reachability? The answer is obvious—yes! What about security? Is it easier to secure a fabric when the underlay never touches any attached workload? Again—yes!

We get this tradeoff wrong all the time. A lot of times this is because we are afraid of what we do not know. Ten years ago I struggled to convince large operators to run BGP in their networks. Today no-one runs anything other than BGP—and they all say “but we don’t have anyone who knows OSPF or IS-IS.” I’ve no idea what happened to old-fashioned network engineering. Do people really only have one “protocol slot” in their brains? Can people really only ever learn one protocol?

Or maybe we’ve become so fixated on learning features that we no longer know protocols?

I don’t know the answer to these questions, but I will say this—over the years I’ve learned that simpler is not always less complex.

Route Servers and Loops

Russ — Tue, 16 Aug 2022 17:00:15 +0000

From the question pile: Route servers (as opposed to route reflectors) don’t change anything about a BGP route when re-advertising it to a peer, whether iBGP or eBGP. Why don’t route servers cause routing loops (or other problems) in a BGP network?

Route servers are often used by Internet Exchange Points (IXPs) to distribute routes between connected BGP speakers. BGP route servers:

Don’t change anything about a received BGP route when advertising the route to their peers (other BGP speakers)
Don’t install routes received through BGP into the local routing table

Shouldn’t using route servers in a network—potentially, at least—cause routing loops or other BGP routing issues? Maybe a practical example will help.

Assume b, e, and s are all route servers in their respective networks. Starting at the far left, a receives some route, 101::/64, and sends it on to b, which then sends the unmodified route to c. When c receives traffic destined to 101::/64, what will happen? Regardless of whether these routers are running iBGP or eBGP, b will not change the next hop, so when c receives the route, a is still the next hop. If there’s no underlying routing protocol, c won’t know how to reach a, so it will ignore the route and drop the traffic. Even if there is an underlying routing protocol, c’s path to 101::/64 passes through b, and b isn’t installing any routing information learned from BGP into its local routing table (because it’s a route server). b is going to drop traffic destined to 101::/64.
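The next-hop part of this reasoning can be sketched in a few lines. This is a toy model, not BGP: the Route class and both function names are hypothetical, and "reachability" is reduced to a set of next hops the receiver can resolve.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Route:
    prefix: str
    next_hop: str

def route_server_readvertise(route: Route) -> Route:
    """A route server passes the route on unchanged, next hop included."""
    return route

def usable(route: Route, resolvable_next_hops: set) -> bool:
    """A receiver can only use a route whose next hop it can resolve."""
    return route.next_hop in resolvable_next_hops

from_a = Route("101::/64", next_hop="a")
at_c = route_server_readvertise(from_a)   # b re-advertises toward c

# c only knows how to reach b, so the route via a is unusable.
print(usable(at_c, {"b"}))
# With a direct connection to the advertising router, it becomes usable.
print(usable(at_c, {"a", "b"}))
```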

We can solve this simple problem by adding a new link between the two clients of the route server, as shown in the center diagram. Here, d sends 101::/64 to e, which then sends the unchanged route to g. Since g has a direct connection to d, we can assume g will send traffic destined to 101::/64 directly to d, where it will be forwarded to the destination. Why wouldn’t d and g peer directly instead of counting on e to carry routes between them? In most cases this kind of indirect peering is done to increase network scale. If there are a thousand routers like d and g, it will be simpler for them all to peer to e than to build a full mesh of connections.
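The scale argument is simple arithmetic: a full mesh of n speakers needs n(n-1)/2 peering sessions, while peering everyone to a single route server needs n. A quick sketch (the function names are made up for illustration, and a single route server is assumed):

```python
def full_mesh_sessions(n: int) -> int:
    """Every speaker peers with every other speaker: n(n-1)/2 sessions."""
    return n * (n - 1) // 2

def route_server_sessions(n: int) -> int:
    """Every speaker peers only with one route server: n sessions."""
    return n

n = 1000
print(full_mesh_sessions(n))     # 499500
print(route_server_sessions(n))  # 1000
```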

Why not use a route reflector rather than a route server in this situation? Route reflectors can only be used to carry routes between iBGP peers. If d, e, and g are all in different autonomous systems, route reflectors cannot be used to solve this problem.

But this brings us back to the original question. Route reflectors use the cluster list to prevent loops within an AS (the cluster list is similar in form and function to the AS path carried between autonomous systems, but it uses router IDs rather than AS numbers to describe the path). What prevents loops when route servers are used?
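For comparison, the cluster list check a route reflector performs can be sketched as follows. This is a simplified model of the behavior, assuming each reflector's cluster ID is just a string; the function name and the IDs "rr1" and "rr2" are hypothetical.

```python
from typing import List, Optional

def reflect(cluster_list: List[str], my_cluster_id: str) -> Optional[List[str]]:
    """Return the cluster list to send onward, or None if the route
    has already passed through this cluster and must be discarded."""
    if my_cluster_id in cluster_list:
        return None                          # loop detected
    return [my_cluster_id] + cluster_list    # prepend our own ID

# A route reflected by rr1, then rr2, then offered back to rr1.
after_rr1 = reflect([], "rr1")
after_rr2 = reflect(after_rr1, "rr2")
back_at_rr1 = reflect(after_rr2, "rr1")

print(after_rr2)     # ['rr2', 'rr1']
print(back_at_rr1)   # None
```

A route server, by definition, adds nothing like this to the route, so it leaves no trail for the next speaker to check.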

If you have multiple route servers connected to one another you can, in fact, form routing loops.

In this network, a is sending 101::/64 to b, which is then sending the route, unmodified, to e. Because of some local policy, e is choosing the path through a, which means e forwards traffic destined to 101::/64 to c. At the same time, e is advertising 101::/64 to b, which is then sending the route (unmodified) to a, and a is choosing the path through c. In this case, a permanent (persistent) routing loop is formed through the control plane, primarily because no single BGP speaker has a complete view of the topology. The two route servers, by hiding the real path to 101::/64, makes is possible to form a routing loop.

To deploy route servers without forming these kinds of loops:

BGP speakers learning routes from route servers should be directly connected—there should not be destinations reachable via some “hidden” intermediate hop
Route servers should send all the routes they learn from clients; they should not use bestpath to choose which routes to send to clients

These restrictions prevent routing loops from forming when deploying route servers—but they also restrict the use of route servers to situations like carrying routes between BGP speakers connected to a single fabric.

Cisco filed a patent some time back describing a method to prevent routing loops when using BGP route servers; it makes interesting reading for folks who want to dive a little deeper.

RFC9199: Lessons in Large-scale Service Deployment

Russ — Mon, 08 Aug 2022 18:51:55 +0000

RFC9199 (are we really in the 9000’s?) is targeted at large-scale DNS deployments, specifically root zone operators, so it might seem the average operator won’t find a lot of value here.

This is, however, far from the truth. Every lesson we’ve learned in deploying large-scale DNS root servers applies to any other large-scale user-facing service. Internally deployed DNS recursive servers are an obvious instance, but the lessons here might well apply to scheduling, banking, or any other multi-user application accessed from a lot of places by a lot of different users. There are some unique points in DNS, such as the relatively slower pace of database synchronization across nodes, but the network-side lessons can still be useful for a lot of applications.

What are those lessons?

First, using anycast dramatically improves performance for these kinds of services. For those who aren’t familiar with the concept, anycast turns an IP address into a service identifier. Any host with a copy (or instance) of a given service advertises the same address, causing the routing table to choose the (topologically) closest instance of the service. If you’re using anycast, traffic destined to your service will automatically be forwarded to the closest server running the application, providing a kind of load sharing among multiple instances through routing. If there are instances in New York, California, France, and Taipei, traffic from users in North Carolina will be routed to New York and traffic from users in Singapore will be routed to Taipei.

You can think of an anycast address as something like a cell tower; users within a certain distance will be “captured” by a particular instance. The more copies of the service you deploy, the smaller the geographic region each copy will support. Hence you can control the number of users using a particular copy of the service by controlling the number and location of service copies.

Second, to understand where and how to deploy service instances, create anycast catchment maps. Again, just like a wifi signal coverage map, or a cellphone tower coverage map, it’s important to understand which users will be directed to which instances. Using a catchment map will help you decide where new instances need to be deployed, which instances need the fastest links and hardware, etc. The RIPE Atlas probes and looking glass servers are good ways to start building such a map. If the application supports a large number of users, you might be able to convince the application developer to include some sort of geographic information in requests to help build these maps.
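
As a rough sketch, a catchment map is just a grouping of measurement points by the instance that answered them. The Python below assumes you already have (probe location, answering instance) pairs from some measurement source; the function name and the sample data are invented for illustration and do not reflect any particular measurement platform’s API.

```python
from collections import defaultdict


def build_catchment_map(measurements):
    """Group probe locations by the anycast instance that answered them.

    `measurements` is an iterable of (probe_location, instance) pairs. How
    the pairs are collected is outside the scope of this sketch.
    """
    catchments = defaultdict(list)
    for probe_location, instance in measurements:
        catchments[instance].append(probe_location)
    return dict(catchments)


# Hypothetical sample data, invented for illustration only.
sample = [
    ("North Carolina", "New York"),
    ("Singapore", "Taipei"),
    ("Virginia", "New York"),
    ("Berlin", "France"),
]

catchment_map = build_catchment_map(sample)
for instance, probes in sorted(catchment_map.items()):
    print(f"{instance}: {len(probes)} probes -> {probes}")
```

Instances that capture far more probes than others are candidates for faster links, more hardware, or a new nearby instance.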

Third, when deploying service instances, pay as much attention to routing and connectivity as you do to the number of instances deployed. As the authors note, sometimes eight instances will provide the same level of service as several thousand instances. The connectivity available into each instance of the service (bandwidth, delay, availability, etc.) still has a huge impact on service speed.

Fourth, reduce the speed at which the database needs to be synchronized where possible. Not every piece of information needs to be synchronized at the same rate. The less data being synchronized, the more consistent the view from multiple users is going to be.

RFC9199 is well worth reading, even for the average network engineer.

Learning to Ride

Russ — Mon, 01 Aug 2022 17:00:06 +0000

Have you ever taught a kid to ride a bike? Kids always begin the process by shifting their focus from the handlebars to the pedals, trying to feel out how to keep the right amount of pressure on each pedal, control the handlebars, and keep moving … so they can stay balanced. During this initial learning phase, the kid will keep their eyes down, looking at the pedals, the handlebars, and . . . the ground.

After some time of riding, though, managing the pedals and handlebars is embedded in “muscle memory,” allowing them to get their head up and focus on where they’re going rather than on the mechanical process of riding. After a lot of experience, bike riders can start doing wheelies, or jumps, or off-road riding that goes far beyond basic balance.

Network engineering (any kind of engineering, really) is the same way.

At first, you need to focus on what you are doing. How is this configured? What specific output am I looking for in this show command? What field do I need to use in this data structure to automate that? Where do I look to find out about these fields, defects, etc.?

The problem is—it is easy to get stuck at this level, focusing on configurations, automation, and the “what” of things.

You’re not going to be able to get your head up and think about the longer term—the trail ahead, the end-point you’re trying to reach—until you commit these things to muscle memory.
The point, with technology, is learning to stop focusing on the pedals, the handlebars, and the ground, and start focusing on the goal, whether it’s nailing this jump or conquering this trail or making it there.

Transitioning is often hard, of course, but it’s just like riding a bike. You won’t make the transition until you trust your muscle memory a bit at a time.

Learning the theory of how and why things work the way they do is a key point in this transition. Configuration is just the intersection of “how this works” with “what am I trying to do…” If you know how (and why) protocols work, and you know what you’re trying to do, configuration and automation will become a matter of asking the right questions.

Learn the theory, and riding the bike will become second nature—rather than something you must focus on constantly.

On Building a Personal Brand

Russ — Mon, 13 Jun 2022 17:00:02 +0000

How do you balance loyalty to yourself and loyalty to the company you work for?

This might seem like an odd question, but it’s an important component of work/life balance many of us just don’t think about any longer because, as Pete Davis says in Dedicated, we live in a world of infinite browsing. We’re afraid of sticking to one thing because it might reduce our future options. If we dedicate ourselves to something bigger than ourselves, then we might lose control of our direction. In particular, we should not dedicate ourselves to any single company, especially for too long. As a recent (excellent!) blog post over at the ACM says:

Loyalty is generally a good trait, but extreme loyalty to the organization or mission may cause you to stay in the same job for too long.

The idea that we should control our own destiny, never getting lost in anything larger than ourselves, is as ubiquitous as water is to a fish. We don’t question it. We don’t argue. It is just true. We assume there are three people who are going to look after “me:” me, myself, and I.

I get it. Honestly, I do. I’ve been there more times than I want to think about. I was the scapegoat in an argument between people far above my pay grade early in my career, causing much angst and pain. I’ve been laid off; I cared about a company that simply didn’t care about me. Most recently, the family I’d dedicated more than twenty years of my life to ended through a divorce.

I can see why you might ask yourself hard questions about dedicating yourself to anything or anyone.

The problem, as Pete Davis points out, is that the human person was not designed for the kind of digital nomad life represented by the phrase “live for yourself.” We can try to substitute an online community. We can try to replace community with a string of novel experiences. But the truth is it will eventually catch up with you. When you’re young it’s hard to see how it will ever catch up with you, but it will.

Returning to the top—the author of the ACM article advises balancing between dedicating yourself to a company and dedicating yourself to your career. This is wise advice, but it leaves me wondering “how?” Let me lay out some thoughts here. They may not be all of the answer, but they will, I hope, point in the right direction.

First, resist seeing these two choices as orthogonal. They might be at odds in some companies—there are publishers who want your content to build their brand, and they specifically work at preventing you from building your brand. There are companies that explicitly want to own “your whole professional life.” They don’t want you blogging, going to conferences to speak, etc. Avoid these companies.

Instead, find companies that understand your personal brand is an asset to the company. Having a lot of people with strong personal brands in a company makes the company stronger, not weaker. People with strong brands will form communities around themselves. This community is a pool of people from which to recruit top-flight talent. This community allows them to collect new ideas that can be directly applied to problems in the organization. People with strong personal brands will have greater influence when they walk into a room to meet with a customer, a supplier, or just about anyone else. A company full of people with strong personal brands is stronger than one where everyone is faceless, consumed by/hiding behind the company logo.

Second, learn to manage your time effectively. I understand it’s possible to spend so much time building your brand that you don’t get your job done. As an individual, you need to be sensitive to this and learn how to manage your time effectively.

Third, seek out the win/win. Don’t think of every situation through the lens of “it’s either my brand or my employer’s.” There may be times when you cannot do something because, while it would help your brand immensely, it would harm your company’s. There may be times where you need to have a delicate discussion with your manager because you’ve been asked to do something that would be great for the company but would harm your brand. There is almost always a win/win, you just have to find it.

Fourth, seek out a community that’s not attached to work and dedicate yourself to it. Find something larger than yourself. A community that’s not tied to work will be your lifeline when things go wrong.

Finally, expect to get hurt. I know I have (an old saying in my community—never trust a man who doesn’t walk with a limp). You can be the nicest, humblest person in the world. Someone is still going to take advantage of you. In fact, the nicer and humbler you are—the more you care, the more likely it is people are going to take advantage of you. I am amazed at how much people seem to enjoy hurting one another when they believe there won’t be any consequences.

But … if you expect your life to be perfect, you were born in the wrong world. Build up the mental reserves to deal with this. Build a community that will help carry you through. There is nothing better than sitting down and sharing your hurt over a cup of coffee with a good friend (except I don’t drink coffee).

I get it—the world has moved into a YOLO/FOMO phase. If you don’t “grab it,” and right now! you risk missing something really important. We pile up alternative possibilities in our minds, wondering what might have happened if we’d chosen otherwise. We have deep angst over our personal brand, overthinking the concept to the point of diminishing returns.

The solution, though, is not to draw into yourself, to become self-centered. The solution is to find the balance, seek the win/win, dedicate yourself to something bigger than yourself, and find the right way to build your personal brand.

Revisiting BGP Convergence

Russ — Mon, 06 Jun 2022 17:00:16 +0000

My video on BGP convergence elicited a lot of . . . feedback, mainly concerning the difference between convergence in a data center fabric and convergence in the DFZ. Let’s begin here—BGP hunt and the impact of the MRAI are very real in the DFZ. Withdrawing a route can take several minutes.

What about the much more controlled environment of a data center fabric?

Several folks pointed out that the MRAI is often set to 0 in DC fabrics (and is 0 by default in many implementations). Further, almost all implementations will use an MRAI of 0 for the first received update, holding the second and subsequent advertisements by the MRAI. Several folks also pointed out that all the paths through a DC fabric are the same length, so the second part of the equation is also very small.

These are good points—how do they impact BGP convergence? Let’s use the network below, a small slice of a five-stage butterfly fabric, to think it through. Assume every router is in a different AS, so all the peering sessions are eBGP.

Start with A losing its connection to 101::/64—

T1: A withdraws its route from B and C
T2: B withdraws its route from D and E, C withdraws its route from F and G
T3: D and E withdraw their routes from H, F and G withdraw their routes from K
T4: H and K withdraw their routes from L

Note that L cannot remove the route from its local table after receiving a single withdraw; it must receive withdraws from both H and K. There’s no way at L to tell whether a withdraw from H means 101::/64 is no longer reachable at all or only no longer reachable through H. For path-vector protocols, as for distance-vector protocols, the path through each neighbor must be considered independently.
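
The per-neighbor bookkeeping at L can be sketched as follows. This is a simplified Python model, not a real BGP implementation: the `Router` class and its method names are assumptions made for illustration, and only reachability (not path attributes) is tracked.

```python
class Router:
    """Tracks, per neighbor, which prefixes that neighbor is advertising."""

    def __init__(self):
        self.adj_rib_in = {}  # neighbor name -> set of prefixes

    def receive_update(self, neighbor, prefix):
        self.adj_rib_in.setdefault(neighbor, set()).add(prefix)

    def receive_withdraw(self, neighbor, prefix):
        self.adj_rib_in.get(neighbor, set()).discard(prefix)

    def next_hops(self, prefix):
        """Neighbors still offering a path to the prefix."""
        return sorted(n for n, p in self.adj_rib_in.items() if prefix in p)


router_l = Router()
router_l.receive_update("H", "101::/64")
router_l.receive_update("K", "101::/64")

router_l.receive_withdraw("H", "101::/64")
after_first = router_l.next_hops("101::/64")   # K still offers a path

router_l.receive_withdraw("K", "101::/64")
after_second = router_l.next_hops("101::/64")  # no paths left

print(after_first, after_second)
```

The route only disappears from L once the last neighbor offering it has withdrawn it.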

What does an MRAI of 0 do? Each of the routers in the network will process the withdraw as soon as they receive it and send a withdraw to their peers as soon as they’re done processing it. The process still takes the same number of steps but each step is much faster.

What is the impact of all the paths’ equal length? So long as every router processes the withdraw at around the same speed, there is no hunt. If H and K send their withdraws simultaneously, L should receive them simultaneously and remove the route to 101::/64 from its table rather than switching from one path to the other. Even if they send their withdraws at different times, L removes entries from its ECMP table until it receives the last withdrawal.

If MRAI slows down convergence, why set it to anything other than 0? Because it’s improbable that every router in the network will process each withdraw simultaneously.

Before 101::/64 is withdrawn, H will be using the paths through D and E for ECMP, but it is only going to be advertising one of these two routes to L, say the path through E. When B sends withdraws to D and E, assume E processes the withdraw just a little faster than D. When H receives E’s withdraw, it will send an implicit withdraw to L, updating the AS path to include D rather than E. A few moments later, D sends a withdraw. H processes this withdraw and sends a withdraw to L.

L has received one implicit withdraw and one withdraw from H because of processing time differentials. In a larger fabric, with a much larger fan-out, the likelihood of differences in timing is much higher and spread across a broader range of possibilities. You can (generally) expect H to send about half as many implicit withdraws as it has paths towards the destination before sending an actual withdraw. If there are eight paths between B and H, H would likely send 3 or 4 implicit withdraws before sending a withdraw.

What if the MRAI were set to 1 second at H? H would receive E’s withdrawal and set the MRAI timer. Assuming D’s withdraw arrives within that 1-second MRAI, H will receive D’s withdraw, squash the implicit withdraw, and send a single withdraw to L instead. Setting the MRAI to something other than 0 reduces the number of updates and reduces processing.
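
The difference between the two cases can be sketched with a small Python model of H. It is deliberately simplified and follows the description above rather than any real implementation: the timer starts when the first pending change arrives, and every change received before it expires is folded into one message. The function name, the event format, and the timestamps are assumptions for illustration.

```python
def updates_sent(withdraw_events, preference, mrai):
    """Return the updates H sends to L as a list of (time, message) pairs.

    withdraw_events: list of (time, neighbor) withdraws received by H.
    preference: H's neighbors in the order H prefers them; H advertises
        only the most preferred path it still has.
    mrai: seconds H waits after the first pending change before sending.
        Simplified model: with mrai == 0 every change is sent immediately.
    """
    remaining = list(preference)
    advertised = remaining[0]
    sent = []
    events = sorted(withdraw_events)
    i = 0
    while i < len(events):
        send_time = events[i][0] + mrai
        # Apply every withdraw that arrives before the timer expires.
        while i < len(events) and events[i][0] <= send_time:
            neighbor = events[i][1]
            if neighbor in remaining:
                remaining.remove(neighbor)
            i += 1
        best = remaining[0] if remaining else None
        if best != advertised:
            if best is None:
                message = "withdraw"
            else:
                message = f"implicit withdraw (now via {best})"
            sent.append((send_time, message))
            advertised = best
    return sent


# E's withdraw reaches H slightly before D's (timestamps are invented).
events = [(0.0, "E"), (0.1, "D")]

no_mrai = updates_sent(events, preference=["E", "D"], mrai=0)
with_mrai = updates_sent(events, preference=["E", "D"], mrai=1)

print(no_mrai)
print(with_mrai)
```

With an MRAI of 0, L sees an implicit withdraw followed by a withdraw; with a 1-second MRAI it sees a single withdraw, one second after the first change reached H.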

Setting the MRAI to 1 second, and forcing it to trigger across all updates, might improve convergence time—or not. Without experimenting with setting the MRAI to different values at different places in a real network, it is hard to know. Replacing the routers, link speeds, changing processor load, and increasing memory can all have an impact on the “best” settings for optimal convergence.

The Bottom Line

There will be no hunt in BGP convergence in a network with multiple equal-length/equal-cost paths. This is what we should expect. Because the maximum path length minus the best (current) path length will always be 0, the network will converge as quickly as each router can process and advertise withdraws, bounded by the MRAI.

Setting the MRAI to 0 improves convergence speed at the cost of additional updates, especially in wide fan-out data center fabrics. It’s hard to know whether setting the MRAI to 0 or 1 will give you better convergence speeds; you have to try it to see.

I still think we should be moving away from BGP as our underlay protocol in all but the largest data center fabrics. IGPs (like IS-IS and RIFT) will converge more quickly, are easier to configure and manage, and using different protocols for the underlay and overlay breaks up failure and security domains in useful ways. I know I’m tilting at a windmill on this point, but still …

BGP Policy (Part 7)

Russ — Mon, 09 May 2022 17:00:52 +0000

At the most basic level, there are only three BGP policies: pushing traffic through a specific exit point; pulling traffic through a specific entry point; preventing a remote AS (more than one AS hop away) from transiting your AS to reach a specific destination. In this series I’m going to discuss different reasons for these kinds of policies, and different ways to implement them in interdomain BGP.

In this post—the last post in this series—I’m going to cover do not transit options from the perspective of AS65001 in the following network—

There are cases where an operator does not want traffic to be forwarded to them through some specific AS, whether directly connected or multiple hops away. For instance, AS65001 and AS65005 might be operated by companies in politically unfriendly nations. In this case, AS65001 may be legally required to reject traffic that has passed through the nation in which AS65005 is located. There are at least three mechanisms in BGP that are used, in different situations, to enforce this kind of policy.

Do Not Advertise Communities (Provider Specific)

Many providers supply communities a customer can use to block the advertisement of their routes to a particular AS. For instance, suppose AS65002 were NTT. According to the NTT customer communities site, if AS65001 advertises 100::/64 with the community 65500:65005, NTT would advertise 100::/64 to all its other peers, but not to AS65005.

Note: NTT is not AS65002; this is only used as an illustration of using a community to block advertisement to a peer’s peer.
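
On the provider side, honoring this kind of community amounts to an export filter. The Python sketch below assumes a community of the form 65500:asn means “do not advertise to AS asn,” following the example in the text; the function name and the peer list are hypothetical, and real providers document their own community values.

```python
def peers_to_advertise(route_communities, peers, block_value=65500):
    """Return the peer AS numbers a provider would advertise the route to.

    A community (block_value, asn) is treated as "do not advertise this
    route to AS asn". The value 65500 follows the example in the text.
    """
    blocked = {low for high, low in route_communities if high == block_value}
    return [asn for asn in peers if asn not in blocked]


# AS65001 attaches 65500:65005 to 100::/64.
communities = {(65500, 65005)}

# Hypothetical list of the provider's other peers.
provider_peers = [65003, 65004, 65005]

allowed = peers_to_advertise(communities, provider_peers)
print(allowed)
```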

The operator at AS65001 might reasonably expect that blocking AS65002 from advertising 100::/64 to AS65005 will block all traffic traveling through AS65005—but the vagaries of the global Internet routing table may well cause traffic to be forwarded through AS65005 anyway in some instances.

If AS65006 has a default route pointing to AS65005, traffic destined to 100::/64 may still be forwarded to AS65005. If AS65005 happens to have a covering aggregate route, or learned of the route via AS65004, it might still carry traffic destined for 100::/64.

It is almost impossible to block all traffic to a given reachable destination from being forwarded through a given autonomous system.

AS Path Injection

An alternate, widely used mechanism is to intentionally inject an AS Path loop when advertising a route to prevent some AS from accepting the route. For instance, AS65001 might advertise 100::/64 with the AS Path [65005,65001] to AS65002. AS65005 would then reject this advertisement because the local AS is already in the AS Path.

While this might appear to “break the rules” of BGP, the reality is the AS Path was never really intended to be a “true record” of the path of an “update” (in fact, there is no such thing as an “update” that travels from one router to the next; the “update” is constructed at each hop based on local tables). This technique is problematic in providing “path security” in BGP, but it does not intrinsically break any BGP rules.
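
The check AS65005 applies is the ordinary BGP loop check, which can be sketched in Python as below. The function name is an assumption for illustration, and the path is shown as it would look after AS65002 prepends its own AS number.

```python
def accepts(local_as, as_path):
    """Standard BGP loop check: reject a route whose AS Path already
    contains the local AS number."""
    return local_as not in as_path


# AS65001 injects 65005 into the AS Path it advertises to AS65002.
advertised_path = [65005, 65001]

# AS65002 prepends its own AS number before passing the route along.
path_from_65002 = [65002] + advertised_path

accepted_by_65004 = accepts(65004, path_from_65002)
accepted_by_65005 = accepts(65005, path_from_65002)
print(accepted_by_65004, accepted_by_65005)
```

AS65004 accepts the route as usual, while AS65005 discards it because it finds its own AS number in the path.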

Note: For more information about this technique, refer to this episode of the Hedge.

Again, note it is almost impossible to block all traffic to a given reachable destination from being forwarded through a given autonomous system.

Do Not Advertise Communities (Well Known)

Three further well-known communities, although they are not widely used, are worth considering.

When a route is marked with NO-PEER, the AS should only advertise the route to its customers and never its peers. For instance, if AS65001 advertises 100::/64 to AS65003 with NO-PEER, AS65003 will advertise the destination to AS65007 and AS65008 (assuming these are customers), and not to AS65002 or AS65004 (because both of these autonomous systems transit traffic to and from AS65003).

When a route is marked with NO-EXPORT, the AS should not advertise the reachable destination to any other AS. For instance, if AS65001 advertises 100::/64 to AS65003 with NO-EXPORT, AS65003 will not advertise this reachable destination to any other AS, including AS65007, AS65008, AS65002, or AS65004.

When a route is marked with NO-ADVERTISE, the receiving BGP speaker should not advertise the route to any other BGP speaker, including internal and external connections.
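
The three behaviors can be summarized in a short Python sketch. The community values are the standard well-known values; the three-way session labeling (“ibgp”, “customer”, “peer”) and the function name are simplifications made for illustration, and NOPEER handling in particular is a matter of local policy in real networks.

```python
NO_EXPORT = 0xFFFFFF01
NO_ADVERTISE = 0xFFFFFF02
NOPEER = 0xFFFFFF04


def may_advertise(communities, session):
    """Decide whether a route may be sent over a session.

    session is one of "ibgp", "customer", or "peer". Confederations and
    other session types are ignored in this simplified model.
    """
    if NO_ADVERTISE in communities:
        return False  # never advertised to any other BGP speaker
    if NO_EXPORT in communities and session != "ibgp":
        return False  # stays inside the receiving AS
    if NOPEER in communities and session == "peer":
        return False  # customers only, never peers
    return True


for name, value in [("NO-PEER", NOPEER), ("NO-EXPORT", NO_EXPORT),
                    ("NO-ADVERTISE", NO_ADVERTISE)]:
    results = {s: may_advertise({value}, s) for s in ("ibgp", "customer", "peer")}
    print(name, results)
```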

BGP Policy (Part 6)

Russ — Tue, 03 May 2022 17:00:43 +0000

In this post I’m going to cover local preference via communities, longer prefix match, and conditional advertisement from the perspective of AS65001 in the following network—

Communities and Local Preference

As noted above, MED is the tool “designed into” BGP for selecting an entrance point into the local AS for specific reachable destinations. MED is not very effective, however, because a route’s preference will always win over MED, and because it is not carried between autonomous systems.

Some operators provide an alternative to MED in the form of communities that set a route’s preference within the AS. For instance, assume 100::/64 is geographically closer to the [65001,65003] link than either of the [65001,65002] links, so AS65001 would prefer traffic destined to 100::/64 enter through AS65003.

In this case, AS65001 can advertise 100::/64 to AS65002 with a community that makes AS65002 prefer the route through AS65003 over the direct route to AS65001 (see 2914:450 on NTT’s list of customer set communities as an example).

Note: Many of the communities described here have regional versions for more specific use cases. These operate on the same principles, just in a more restricted topological or geographical area.

Longer Prefix Match

While MED is often not effective, and using communities is both restricted in range and complex to configure and manage, advertising a longer-prefix match always works, is simple to configure, and easy to deploy.

For instance, if AS65001 would like traffic destined to 100::/64 to only enter from AS65003, it may advertise an aggregated route, say 100::/63, to both AS65003 and AS65002, and then advertise 100::/64 only to AS65003. Because all routing systems will select the prefix with the longest match first, the /64 through AS65003 will be selected over the /63 through AS65002 and AS65003, so the traffic always enters AS65001 the way the operator desires.

The overlapping, or covering, aggregate is advertised to provide backup reachability. If the [AS65001,AS65003] link (or peering) fails for any reason, traffic destined to 100::/64 will follow the /63 route, entering from AS65002. This is not optimal from the perspective of AS65001, but it keeps connectivity in place while any problems can be traced down and repaired.

According to Geoff Huston, a large percentage of the routes in the current global table are advertised for traffic engineering, that is, to manipulate the point at which traffic destined to specific reachable destinations enters an AS.
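
Longest-prefix selection and the fallback to the covering aggregate can be sketched with Python’s standard `ipaddress` module. The table below is a simplified view from some remote AS, with 100::/63 assumed as the covering aggregate; the `lookup` helper and the table layout are illustrative only.

```python
import ipaddress


def lookup(table, destination):
    """Return the entry with the longest prefix that covers destination."""
    address = ipaddress.ip_address(destination)
    matches = [(net, via) for net, via in table if address in net]
    if not matches:
        return None
    return max(matches, key=lambda entry: entry[0].prefixlen)


# Routes as a remote AS might see them; 100::/63 is an assumed aggregate.
table = [
    (ipaddress.ip_network("100::/63"), "AS65002"),
    (ipaddress.ip_network("100::/63"), "AS65003"),
    (ipaddress.ip_network("100::/64"), "AS65003"),
]

normal = lookup(table, "100::1")

# If the [AS65001,AS65003] peering fails, routes learned through it go away.
backup_table = [entry for entry in table if entry[1] != "AS65003"]
backup = lookup(backup_table, "100::1")

print(normal, backup)
```

While the more specific route exists, traffic follows the /64 through AS65003; once it is gone, the /63 through AS65002 keeps the destination reachable.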

Note: The use of longer prefix routes to control inbound route flows represents a “tragedy of the commons” problem to the global Internet. Work has been put into various mechanisms designed to remove these more specific routes from the routing table when they are no longer needed, but little progress has been made in implementing them, nor have any of these solutions achieved widespread adoption and deployment.

Conditional Advertisement

What if AS65001 has signed a contract with AS65003 to carry traffic only if both its links to AS65002 fail? In this case, AS65001 could advertise many more longer prefix specifics through AS65002 and one shorter covering route through AS65003.

This strategy, however, has two flaws. First, it requires AS65001 to manage the more specifics and covering routes as a set, making certain the pairs are correctly configured. Second, it could be that AS65001 does not want anyone to know about this backup arrangement unless and until it is used. This is sometimes the case when two competitors agree to back one another up, and neither wants anyone to know what their backup arrangements are.

To resolve these (and other) policy problems, operators can use conditional advertisement.

Conditional advertisement is conceptually simple; if a router does not have some route, x, in its routing table, it advertises some other route (given the route is in the local tables so it can be advertised). For instance, AS65001 might configure the router at C to advertise 100::/64 only when it does not have some other route.

The hardest part of configuring conditional advertisement is knowing when to trigger the advertisement of the alternate path. Using the lack of reachability to the destination itself (100::/64 in this case) as the trigger will fail in some circumstances, and will always require the global table to converge before the alternate path is advertised. Instead, conditional advertisement is often triggered by the lack of a route to the links between the BGP speakers being “watched” (in this case, the two [65001,65002] links), learned from within the AS (within AS65001, rather than through the global routing table).

Triggering on the internal state of a link directly connected to a router managed by the local operator, and carried through internal convergence, removes external convergence from the time required to begin advertising the alternate path.
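
The trigger logic can be sketched in Python as follows. This is a conceptual model rather than any vendor’s configuration syntax: the function name is an assumption, and the two watched prefixes are hypothetical stand-ins for the internally learned routes to the two [65001,65002] links.

```python
def routes_to_advertise(local_routes, watched, conditional):
    """Advertise `conditional` only when no watched route is present.

    local_routes: set of prefixes currently in the router's table.
    watched: prefixes whose absence triggers the advertisement.
    conditional: the prefix to advertise when the trigger fires.
    """
    trigger = not any(prefix in local_routes for prefix in watched)
    if trigger and conditional in local_routes:
        return {conditional}
    return set()


# Hypothetical prefixes standing in for the two [65001,65002] links.
watched_links = {"2001:db8:0:a::/64", "2001:db8:0:b::/64"}

both_up = routes_to_advertise(
    {"100::/64"} | watched_links, watched_links, "100::/64")
one_down = routes_to_advertise(
    {"100::/64", "2001:db8:0:a::/64"}, watched_links, "100::/64")
both_down = routes_to_advertise({"100::/64"}, watched_links, "100::/64")

print(both_up, one_down, both_down)
```

The router at C stays silent towards AS65003 until both watched routes have disappeared from its table.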

BGP Policies (Part 5)

Russ — Mon, 25 Apr 2022 17:00:54 +0000

In this post I’m going to cover AS Path Prepending from the perspective of AS65001 in the following network—

Since the length of the AS Path plays a role in choosing which path to use when forwarding traffic towards a given reachable destination, many (if not most) operators prepend the AS Path when advertising routes to a peer. Thus an AS Path of [65001], when advertised towards AS65003, can become [65001,65001] by adding one prepend, [65001,65001,65001] by adding two prepends, etc. Most BGP implementations allow an operator to prepend as many times as they would like, so it is possible to see twenty, thirty, or even higher numbers of prepends.

Note: The usefulness of prepending is generally restricted to around two or three, as the average length of an AS Path in the global Internet is around 4 hops.

If AS65001 would like traffic destined to 100::/64 to enter from AS65003 rather than AS65002, it can prepend the AS Path at every peering point with AS65002 (A and B) with two hops (sending [65001,65001,65001] to AS65002). If preference, MED, and all other metrics are equal, AS65002 would then prefer the path with the shorter AS Path through AS65003, rather than the path directly into AS65001 (either through A or B).

That all metrics are equal is not likely, however. AS65002 will probably have preference set so routes learned directly from customers (such as AS65001) are selected over routes learned from peers (such as AS65003). The impact of prepending on route selection by directly connected peers is, therefore, uncertain.

Moving one step out in the network, consider the routes received by AS65004 to reach 100::/64. There will be one route along [65002,65001,65001,65001], and another with an AS Path of [65003,65001]. All other things being equal (same preference, etc.), AS65004 will choose to send traffic destined to 100::/64 through AS65003 rather than AS65002. How likely is it all the other BGP metrics will be equal at AS65004? So long as the peering between AS65004, AS65003, and AS65002 are all of the same type, the odds are high—so prepending can help move some (not all) traffic from one inbound link to another.
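
The comparison at AS65004 can be sketched in Python. The sketch assumes all other metrics are equal, so the only thing being compared is AS Path length; the helper names are illustrative.

```python
def prepend(as_path, local_as, count):
    """Return as_path with local_as added `count` extra times at the front."""
    return [local_as] * count + as_path


def shortest_as_path(candidates):
    """Pick the candidate with the fewest AS hops (all else being equal)."""
    return min(candidates, key=len)


base = [65001]
to_65002 = prepend(base, 65001, 2)   # [65001, 65001, 65001]
to_65003 = base                      # no prepending towards AS65003

# Each transit AS adds its own AS number before advertising to AS65004.
at_65004 = [[65002] + to_65002, [65003] + to_65003]

chosen = shortest_as_path(at_65004)
print(chosen)
```

AS65004 ends up with a four-hop path through AS65002 and a two-hop path through AS65003, and selects the latter.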

Because AS Path prepending has variable results over time, operators using this technique often “just try it” to see what the effect will be. There’s no real way to predict how effective prepending any number of times will be in moving traffic from one inbound link to another.

What if AS65001 does not want traffic destined to 100::/64 to traverse AS65005? For instance, suppose AS65006 is across an ocean, mountain range, or other difficult-to-cross geographic feature. AS65005 crosses this geography via a satellite link, while AS65004 crosses the same geography via an optical cable. Since optical cable runs can provide better delay and jitter than a satellite link, AS65001 may desire to choose which of these two autonomous systems is traversed to reach 100::/64.

This cannot be directly accomplished using AS Path prepend, as AS65004 and AS65005 will both receive the same prepended path.

To express this kind of policy, some operators allow their customers to set communities that cause the operator to remotely prepend a given route advertisement. For instance, NTT allows their customers to set a community that will cause NTT to prepend specific routes when those routes are advertised to specific autonomous systems. In this case, AS65001 could add the community 65421:65005 to the advertisement for 100::/64, which would cause NTT to prepend AS65001 when advertising 100::/64 to AS65005, and not prepend anything when advertising 100::/64 to AS65004.

This technique is subject to the same caveats as using AS Path prepend locally—it may work in some situations, or it may not—because the local operator does not have visibility into the policies of the operators they are trying to influence.

BGP Policies (Part 4)

Russ — Mon, 04 Apr 2022 18:57:16 +0000

In this post, I’ll cover the first of a few ways to give surrounding autonomous systems a hint about where traffic should enter a network. Note this is one of the most vexing problems in BGP policy, so there will be a lot of notes across the next several posts about why some solutions don’t work all that well, or when they will and won’t work.

There are at least three reasons an operator may want to control the point at which traffic enters their network, including:

Controlling the inbound load on each link. It might be important to balance inbound and outbound load to maintain settlement-free peering, or to equally use all available inbound bandwidth, or to ensure the quality of experience is not impacted by overusing a single link.
Accounting for geographically dispersed entry points. For instance, while the two entry points into AS65001 might appear to be topologically close, they might be geographically diverse, with one being in South America and the other being in North America.
Ensuring flows requiring symmetric paths are properly handled. A common use case is the use of stateful packet filters or port address translators, both of which require inbound and outbound traffic to be routed through a single device.

All these reasons apply to all kinds of network operators, so this section will examine the various techniques used to control traffic entry points from the perspective of AS65001 in the following network—

Policies designed to control the point at which traffic enters an operator’s network will often conflict with policies designed to control the point at which traffic exits some other operator’s network. For instance, AS65001’s policy that all traffic destined to 100::/64 enter the network from AS65002 may conflict with AS65002’s policy that all traffic destined to 100::/64 leave its network by being forwarded to AS65003.

This effect is not just seen between directly connected autonomous systems. For instance, AS65001’s policy that all traffic destined to 100::/64 enter the network through AS65002 may conflict with AS65004’s policy that all traffic to that same destination exit the network by being forwarded to AS65003.

The original intent of BGP policy was that the policy of the sender overrides the policy of the receiver, as expressed in the design of the metrics (the multiple exit discriminator, or MED, has a lower priority than the preference). In real deployments, however, exit and entry policies are more fluid and entangled. These relationships will be considered in each of the sections below, each of which describes a different way to influence or control how traffic destined to a single reachable destination enters the network.

Let’s begin with the Multiple Exit Discriminator, or MED.

MED is a suggestion or request to neighboring autonomous systems to forward traffic for a reachable destination along a particular path. For instance, AS65001 may want traffic being sent to 100::/64 to be sent to B in the network diagram, rather than to A or through its link to AS65003.

However, the MED is not a transitive attribute of a BGP route. This means that if AS65001 sets the MED so that entry B is preferred, and sends this MED to AS65003, AS65003 will strip (or reset) the MED before advertising 100::/64 to either AS65004 or AS65002.

MED, in this case, would be useful to help AS65002 determine whether to send this traffic to A or B, but not whether to send the traffic to AS65001 or AS65003. AS65002 will, instead, rely on local policy, primarily preference, to determine which exit point to use. If AS65002 determines the best path to 100::/64 is through one of its direct connections to AS65001 (either A or B), and there is no other reason for AS65002 to choose one path over the other, the MED will be used to determine which path to use.

Because AS65003 only has one connection to AS65001, the MED will not impact its bestpath decision at all. Because AS65001’s MED has been reset or stripped in all the routes to 100::/64 AS65004 receives, AS65001’s MED will not play a role in any bestpath decision there, either (AS65002 or AS65003 may set the MED when sending routes to AS65004, which may influence the path AS65004 chooses, but again only when choosing between multiple connections to the same peering AS).

Because MED is only considered nominally useful, it is often stripped off routes when they are received from another AS.
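
To make the ordering concrete, here is a minimal sketch of the relevant slice of the bestpath process from AS65002's point of view. It is a deliberate simplification: only preference, AS Path length, and MED are modeled, and the `Route` class, its field names, and the metric values are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    entry: str        # label for the link the route was learned over
    neighbor_as: int  # AS the route was received from
    local_pref: int   # higher wins
    as_path_len: int  # shorter wins
    med: int          # lower wins, but only against the same neighbor AS


def best_path(routes):
    """Return the routes still tied after preference, AS Path length, and MED."""
    top_pref = max(r.local_pref for r in routes)
    candidates = [r for r in routes if r.local_pref == top_pref]
    shortest = min(r.as_path_len for r in candidates)
    candidates = [r for r in candidates if r.as_path_len == shortest]
    # MED only removes a route when a route from the SAME neighbor AS is better.
    return [
        r for r in candidates
        if not any(o.neighbor_as == r.neighbor_as and o.med < r.med for o in candidates)
    ]


# AS65001 sets a lower MED on entry B than on entry A.
equal_pref = [
    Route("A", 65001, 100, 1, 200),
    Route("B", 65001, 100, 1, 100),
    Route("via-AS65003", 65003, 100, 2, 0),
]
# Same routes, but AS65002's local policy prefers the path through AS65003.
policy_override = [
    Route("A", 65001, 100, 1, 200),
    Route("B", 65001, 100, 1, 100),
    Route("via-AS65003", 65003, 200, 2, 0),
]

print([r.entry for r in best_path(equal_pref)])
print([r.entry for r in best_path(policy_override)])
```

In the first case the MED picks B over A; in the second the MED never gets a vote, because preference has already decided the matter.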

BGP Policies (Part 3)

Russ — Mon, 28 Mar 2022 17:00:27 +0000

There are many reasons an operator might want to select the neighboring AS through which to send traffic towards a given reachable destination (for instance, 100::/64). Each of these examples assumes the AS in question has learned multiple paths towards 100::/64, one from each peer, and must choose one of the two available paths to forward along.

In the following network—

From AS65001’s perspective

Assume AS65001 is some form of content provider, which means it offers some service such as bare metal compute, cloud services, search engines, social media, etc. Customers from AS65006 are connecting to its servers, located on the 100::/64 network, which generates a large amount of traffic returning to the customers.
From the perspective of AS hops, it appears the path from AS65001 to AS65006 is the same length—if this is true, AS65001 does not have any reason to choose one path or another (given there is no measurable performance difference, as in the cases described above from AS65006’s perspective). However, the AS hop count does not accurately describe the geographic distances involved:

The geographic distance between 100::/64 and the exit towards AS65003 is very short
The geographic distance between 100::/64 and the exits towards AS65002 is very long
The total geographic distance packets travel when following either path is about the same

In this case, AS65001 can choose to hold on to packets destined to customers in AS65006 for either a longer or a shorter geographic distance.
While carrying the traffic over a longer geographic distance is more expensive, AS65001 would also like to optimize for the customer’s quality of experience (QoE), which means AS65001 should hold on to the traffic for as long as possible.

Because customers will use AS65001’s services in direct relation to their QoE (the relationship between service usage and QoE is measurable in the real world), AS65001 will opt to carry traffic destined to customers as long as possible—another instance of cold potato routing.
This is normally implemented by setting the preference for all routes equal and relying on the IGP metric part of the BGP bestpath decision process to control the exit point. IGP metrics can then be tuned based on the geographic distance from the origin of the traffic within the network and the exit point closest to the customer.

An alternative, more active, solution would be to have a local controller monitor the performance of individual paths to a given reachable destination, setting the preferences on individual reachable destinations and tuning IGP metrics in near-real-time to adjust for optimal customer experience.
Another alternative is to have a local controller monitor the performance of individual paths and use MPLS, segment routing, or some other mechanism to actively engineer or steer the path of traffic through the network.

Some content providers may directly peer with transit and edge providers to reach customers more quickly, to reduce costs, and to increase their control over customer-facing traffic. For instance, suppose AS65001 is a content provider that transits traffic through [65002,65005] to reach customers in AS65006. To avoid transiting multiple autonomous systems, AS65001 can run a link directly to AS65005.

In some cases, content providers will build long-haul fiber optics (including undersea cable operations, see this site for examples) to avoid transiting multiple autonomous systems.

While the operator can end up paying a lot to build and operate long-haul optical links, this cost is offset by paying transit providers less for high levels of asymmetric traffic flows. Beyond this, content providers can control user experience more effectively the longer they control the user’s traffic. Finally, content providers can gain more information by connecting closer to users, feeding into Kai-Fu Lee’s virtuous cycle.

Note: content providers peering directly with edge providers and through IXPs is one component of the centralization of the Internet.

A failed alternative to the techniques described here was the use of automatic disaggregation at the content provider’s autonomous system borders. For instance, if a customer connected to a server in 100::/64 by sending traffic via the [65003,65001] link, an automated system will examine the routing table to see which route is currently being used to reach the customer’s reachable destination. If traffic forwarded to this customer’s address would normally pass through one of the [65001,65002] links, a local host route is created and distributed into AS65001 to draw this traffic to the exit connected to AS65003.

The theory behind this automatic disaggregation was that the customer will always take the shortest path from their perspective to reach the service. This assumption fails, in practice, however, so this scheme was ultimately abandoned.

BGP Policies (Part 2)

Russ — Mon, 14 Mar 2022 17:00:50 +0000

In the following network—

From AS65004’s perspective…

Transit providers primarily choose the most optimal exit from their AS to reduce the amount of peering settlement they are paying by using and maintaining settlement-free peering where possible and reducing the amount of time and distance traffic is carried through their network (through hot potato routing, discussed in more detail below).
Suppose, for instance, AS65004 has a paid peering relationship with AS65002, and a contract with AS65003 which is settlement-free so long as the traffic between AS65004 and AS65003 is roughly symmetric. AS65004 has two roughly equal-cost paths (both have the same AS Path length) towards 100::/64. In this situation, AS65004 is going to direct traffic towards AS65003 to maintain symmetrical traffic flows and direct any remaining traffic towards AS65002.

This kind of balancing is normally done through a controller or network management system that monitors the balance of traffic with AS65003, adjusting the preference of sets of routes to attain the correct balance with AS65003 while reducing the costs of using the link to AS65002 to the minimum possible.
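
The core arithmetic such a controller performs might look like the sketch below. This is an assumption-heavy illustration, not a description of any real product: the function name, the `max_ratio` tolerance, and the traffic figures are all hypothetical, and a real system would work per prefix set rather than on a single aggregate number.

```python
def split_outbound(total_outbound, inbound_from_free_peer, max_ratio=1.0):
    """Split outbound traffic between the settlement-free and the paid peer.

    Send as much as the symmetry requirement allows to the settlement-free
    peer (AS65003 in the example), and the remainder to the paid peer
    (AS65002). Units are whatever the caller uses consistently, e.g. Mb/s.
    """
    to_free_peer = min(total_outbound, inbound_from_free_peer * max_ratio)
    to_paid_peer = total_outbound - to_free_peer
    return to_free_peer, to_paid_peer


# 10,000 Mb/s to send, 6,000 Mb/s arriving from AS65003.
to_65003, to_65002 = split_outbound(10_000, 6_000)
print(to_65003, to_65002)
```

The controller would then translate the split into preference changes on sets of routes until the measured traffic approaches these targets.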

From AS65005’s perspective…

AS65005 can send traffic originating in AS65001, received from AS65002, and destined to AS65006, to either AS65004 (a peer) or AS65006 (a customer). The internal path from the entry point for this traffic to the exit is longer if the traffic is carried to AS65006, and shorter if the traffic is carried to AS65004. These longer and shorter paths give rise to the concepts of hot and cold potato routing.

If AS65006 is paying AS65005 for transit, AS65005 would normally carry traffic across the longer path to its border with AS65006. This is cold potato routing. AS65005’s reason for choosing this option is to maximize revenue from the customer. First, as the link between AS65005 and AS65006 becomes busier, AS65006 is likely to upgrade the link, generating additional revenue for AS65005. Even if the traffic level is not increasing, steady traffic flow encourages the customer to maintain the link, which protects revenue. Second, AS65005 can control the quality-of-service AS65006 receives by keeping the traffic within its network for as long as possible, improving the customer’s perception of the service they are receiving.
Cold potato routing is normally implemented by setting the preference on routes learned from customers, so these routes are preferred over all routes learned from peers.
If AS65006 is not paying AS65005 for transit, it is to AS65005’s advantage to carry the traffic as short a distance as possible. In this case, although AS65005 is directly connected to AS65006, and the destination is in AS65006, AS65005 will choose to direct the traffic towards its border with AS65004 (because there is a valid route learned for this reachable destination from AS65004).

This is hot potato routing: like the kids’ game, you want to hold on to the traffic for as short a time as possible. Hot potato routing is normally implemented by setting the preference on all routes to the same value and relying on the IGP metric component of the BGP bestpath decision process to find the closest exit point.
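
The difference between the two policies comes down to which step of the bestpath process makes the decision. A minimal sketch from AS65005's perspective, assuming only preference and IGP metric matter (the dictionary keys and the metric values are invented for the example):

```python
def choose_exit(routes):
    """Pick the exit with the highest preference, then the lowest IGP metric."""
    return min(routes, key=lambda r: (-r["local_pref"], r["igp_metric"]))["exit"]


# Hot potato: equal preference, so the closest exit (lowest IGP metric) wins.
hot = [
    {"exit": "AS65004", "local_pref": 100, "igp_metric": 10},
    {"exit": "AS65006", "local_pref": 100, "igp_metric": 50},
]
# Cold potato: customer routes get a higher preference, so distance is ignored.
cold = [
    {"exit": "AS65004", "local_pref": 100, "igp_metric": 10},
    {"exit": "AS65006", "local_pref": 200, "igp_metric": 50},
]

print(choose_exit(hot))
print(choose_exit(cold))
```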

Next week I’ll continue this series on BGP interdomain policies… feel free to leave a comment if you think I’ve explained something incorrectly, etc.

BGP Policies (part 1)

Russ — Mon, 07 Mar 2022 18:00:43 +0000

In the following network—

Examining this from AS65006’s Perspective …

Assuming AS65006 is an edge operator (commonly called enterprise, but generally just originating and terminating traffic, and never transiting traffic), there are several reasons the operator may prefer one exit point (through an upstream provider), including:

An automated system may determine AS65004 has some sort of brownout; in this case, the operator at AS65006 has configured the system to prefer the exit through AS65005
The traffic destined to 100::/64 may require a class of service (such as video transport) AS65004 cannot support (for instance, because the link between AS65006 and AS65004 has low bandwidth, high delay, or high jitter)

The most common way this kind of policy would be implemented is by setting the BGP LOCAL_PREFERENCE (called preference throughout the rest of this document) on routes learned from AS65005 higher than the preference on routes learned from AS65004.

Another common case is AS65006 would prefer to send traffic to AS65004 only when the destination is in an AS directly connected to AS65004 itself, while sending all other traffic through AS65005. This is common when one provider has good local and poor global coverage, while the other provider has good global but poor local coverage.

For instance, if AS65006 is in a somewhat isolated part of the world, such as some parts of the South Pacific or Central America, there may be a local provider, such as AS65004, that has solid connectivity to most of the other edge operators in the local geographic region but charges a high cost for transiting to the rest of the global Internet. A second provider, such as AS65005, charges less to reach destinations beyond the local geographic region but is relatively expensive to use when sending traffic to other edge operators within the local region.

Preference, by itself, would be difficult to use in this case, because the operator would need to distinguish between geographically local and geographically distant routes. To implement this kind of policy, the operator would accept partial routes from the geographically local provider (AS65004 in this case) and set a high preference on these routes. Partial routes are typically those the local provider learns only from other directly connected autonomous systems, and hence would only include operators in the local geographic region. The operator would then accept full routes, or the entire Internet global routing table, from the second provider (AS65005 in this case) and set a lower preference.
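
A small sketch of how the partial/full split plays out in the routing table. The preference values and the two documentation prefixes are assumptions made up for the example; the AS numbers follow the text.

```python
# Preference assigned per provider: the regional provider's partial routes win.
PREFERENCE = {65004: 200, 65005: 100}


def build_table(adverts):
    """adverts: iterable of (prefix, neighbor_as). Returns {prefix: chosen neighbor}."""
    table = {}
    for prefix, neighbor in adverts:
        current = table.get(prefix)
        if current is None or PREFERENCE[neighbor] > PREFERENCE[current]:
            table[prefix] = neighbor
    return table


adverts = [
    # AS65005 sends full routes, so it advertises everything.
    ("2001:db8:a::/48", 65005),  # hypothetical regional destination
    ("2001:db8:b::/48", 65005),  # hypothetical distant destination
    # AS65004 sends partial routes, so it advertises only the regional prefix.
    ("2001:db8:a::/48", 65004),
]

table = build_table(adverts)
print(table)
```

The operator never has to classify a route as local or distant; the provider's choice of what to include in its partial routes does that work.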

An alternative way to implement geographic preference is using communities. Many transit providers mark individual reachable destinations with information about where the route originated. NTT, for instance, describes their geographic marking here. An operator can create filters using regular expressions to change the preference of a route based on its geographic origin.

This is not a common way to solve the problem because the filtering rules involved can become complex—but it might be deployed if local providers do not offer partial routes for some reason.
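
For illustration, a community-based filter might reduce to something like the following. The community format here (`64500:10xx` meaning "learned in the local region") is entirely hypothetical; real providers publish their own schemes, and the pattern would have to be written against those.

```python
import re

# Hypothetical scheme: 64500:10xx marks routes learned in the local region.
LOCAL_REGION = re.compile(r"^64500:10\d{2}$")


def preference_for(communities, local_pref=200, default_pref=100):
    """Return a higher preference when any community marks the route as regional."""
    if any(LOCAL_REGION.match(c) for c in communities):
        return local_pref
    return default_pref


print(preference_for(["64500:1001", "64500:5000"]))
print(preference_for(["64500:2001"]))
```

Even in this toy form, the rule depends on one provider's marking scheme, which hints at why the real filters become complex.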

Another alternative for implementing geographic preference is to use a regular expression filter to set the preference for each reachable destination based on the length of the AS Path. Theoretically, routes originating within the local region should have an AS Path of one or two hops, while those originating outside a region should have longer AS Paths.
This generally does not work for two reasons. First, the average length of an AS Path (after prepending is factored out) is about 4 hops in the entire global Internet—and it is easy to reach four hops even within a local region in some situations. Second, many operators prepend the AS Path to manage inbound entry point preference; these prepended hops must be factored out to use this method.
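
Factoring out prepends is simple enough to sketch: collapse consecutive repeats of the same AS number before counting. The two-hop threshold and the sample paths below are assumptions chosen to show the failure mode, not measured data.

```python
from itertools import groupby


def effective_length(as_path):
    """Count AS hops with consecutive duplicates (prepends) collapsed."""
    return sum(1 for _ in groupby(as_path))


def looks_local(as_path, max_hops=2):
    """Naive classifier: treat short paths as regional."""
    return effective_length(as_path) <= max_hops


prepended = [65005, 65001, 65001, 65001]        # two real hops plus prepends
regional_but_long = [65004, 65010, 65011, 65012]  # hypothetical in-region path

print(len(prepended), effective_length(prepended))
print(looks_local(prepended), looks_local(regional_but_long))
```

The second path shows the first problem described above: a route that never leaves the region can still be four hops long, and the classifier gets it wrong.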

Next week I’ll continue this series on BGP interdomain policies… feel free to leave a comment if you think I’ve explained something incorrectly, etc.

Quality is (too often) the missing ingredient

Russ — Mon, 03 Jan 2022 18:00:32 +0000

Software Eats the World?

I’m told software is going to eat the world very soon now. Everything already is, or will be, software based. To some folks, this sounds completely wonderful, but—leaving aside the privacy issues—I still see an elephant in the room with this vision of the future.

Quality.

Let me give you some recent examples.

First, ceiling fans. Modern ceiling fans, in case you didn’t know, don’t rely on the wall switch and pull chains. Instead, they rely on remote controls. This is brilliant—you can dim the light, change the speed of the fan, etc., from a remote control. No unsightly chains hanging from the ceiling.

Well, it’s brilliant so long as it works. I’ve replaced three of the four ceiling fans in my house. Two of the remote controls have somehow attached themselves to two of the three fans. It’s impossible to control one of the fans without also controlling the other. They sometimes get into this entertaining mode where turning one fan off turns the other one on.

For the third one (the one hanging from a 13-foot ceiling), the remote control sometimes operates one of the other fans, and sometimes the fan it’s supposed to operate. Most of the time it doesn’t seem to do much of anything.

The fan manufacturer—a large, well-known company—mentions this situation in their instructions and points to a FAQ that doesn’t exist. Searching around online I found instructions for solving this problem that involve unwiring the fans and repeating a set of steps 12 times for each fan to correct the situation. These instructions, needless to say, don’t work.

There is no way to reset the remote, nor the connection between the remote and the fan. There is no way to manually select some dip switch so the remote has a specific fan it talks to. Just some mystical software that’s supposed to work (but doesn’t) and no real instructions on how to resolve the problem. The result will be a multi-hour wait on a customer support line, spending hours of my time to sort the problem out, and the joy of climbing (tall) ladders to unwire and wire ceiling fans in four different rooms.

Thinking through possible problems and building software interfaces that take those situations into account … might be a bit more important than we think if software is really going to eat the world.

Second, the retailer’s web site—a large retailer with thousands of physical stores across the United States. Twice I’ve ordered from this site, asking to have the item held in the local store so I can pick it up. The site won’t let you order the item for store pickup unless they have it in stock.

The first time they called me to say they couldn’t find the item I ordered, but they found a “newer model” that was a lot less expensive. It was a lot less expensive because it wasn’t the same item. They never did find the item I originally ordered.

The second time they called me to say they couldn’t find the item I ordered. I asked if they could just ship the item to my house when it’s back in stock. “I’m sorry, our system doesn’t allow us to do that …” Several hours later, they called back to tell me they found it, but they cannot reinstate my order—I must place a new order.

Again, software quality strikes … what should be a simple process just isn’t. There will always be mismatches between the state in software and the state in the real world—but design the system so it’s possible to adapt when this happens, rather than shutting down the process and starting over.

Third, I own a car that has all the “bells and whistles,” including an adaptive cruise control system. There are certain situations, however, where this adaptive control does the wrong thing, producing potentially dangerous results. There is no way to set the car to use the non-adaptive cruise control permanently (I called and waited on the phone for several hours to discover this). You can set the non-adaptive cruise control on a per-use basis by going through a set of menus to change the settings … while driving.

Software quality anyone?

Software eats the world might be someone’s ultimate dream—but I suspect that software quality will always be the fly in the ointment. People are not perfect (even in crowds); software is created by people; hence software will always suffer from quality problems.

Maybe a little humility about our ability to make things as complex as we might like because “we can always have software do that bit” would be a good thing—even in the networking world.

Thoughts on Auto Disaggregation and Complexity

Russ — Mon, 15 Nov 2021 18:00:54 +0000

Way in the past, the EIGRP team (including me) had an interesting idea–why not aggregate routes automatically as much as possible, along classless bounds, and then deaggregate routes when we could detect some failure was causing a routing black hole? To understand this concept better, consider the network below.

In this network, B and C are connected to four different routers, each of which is advertising a different subnet. In turn, B and C are aggregating these four routes into 2001:db8:3e8:10::/60, and advertising this aggregate towards A. From a control plane state perspective, this is a major win. The obvious gain is that the amount of state is reduced from four routes to one. The less obvious gain is A doesn’t need to know about any changes in the state for the four destinations aggregated into the /60. Depending on how often these links change state, the reduction in the rate of change is, perhaps, more important than the reduction in the amount of control plane state.

We always know there will be a tradeoff when reducing state; what is the tradeoff here? If C somehow loses its connection to one of the four routers, say the router advertising 11::/64, C’s 10::/60 aggregate will not change. Since A thinks C still has a route to every subnet within 10::/60, it will continue sending traffic destined to addresses in 11::/64 towards both B and C. C will not have a route towards these destinations, so it will drop the traffic.

We have a routing black hole.
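
The black hole is easy to reproduce with Python's `ipaddress` module. In this sketch the four component subnets are assumed to be 10::/64 through 13::/64 under 2001:db8:3e8::/48 (the text only names the aggregate and 11::/64), and the next-hop labels are placeholders.

```python
import ipaddress

aggregate = ipaddress.ip_network("2001:db8:3e8:10::/60")
# Assumed component subnets; each fits inside the /60.
subnets = [ipaddress.ip_network(f"2001:db8:3e8:{n}::/64") for n in (10, 11, 12, 13)]
failed = ipaddress.ip_network("2001:db8:3e8:11::/64")


def lookup(table, destination):
    """Longest-prefix match; returns the next hop, or None if there is no route."""
    addr = ipaddress.ip_address(destination)
    matches = [net for net in table if addr in net]
    if not matches:
        return None
    return table[max(matches, key=lambda net: net.prefixlen)]


# A only knows the aggregate (only the path through C is modeled here).
a_table = {aggregate: "C"}
# C has lost its link to the router advertising 11::/64.
c_table = {net: f"router-for-{net}" for net in subnets if net != failed}

destination = "2001:db8:3e8:11::1"
print(lookup(a_table, destination))  # A still forwards towards C
print(lookup(c_table, destination))  # C has nowhere to send the packet
```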

For more information on aggregation in networks, take a look at my livelesson on abstraction in computer networks.

This much is pretty simple. The harder part is figuring out how to eliminate this routing black hole. Our first choice is to just not aggregate these routes. While you might be cringing right now, this isn’t such a bad option in many networks. We often underestimate the amount of state and the speed of state change modern routing protocols running on modern processors can support. I’ve seen networks running IS-IS in a single flooding domain with tens of thousands of routes and thousands of nodes running “in the wild.” I’ve seen IS-IS networks with thousands of nodes and hundreds of thousands of routes running in lab environments. These networks still converge.

But what if we really think we need to reduce the amount and speed of state, so we really need to aggregate these routes?

One solution that has been proposed a number of times through the years is auto disaggregation.

In this case, suppose D somehow realizes C cannot reach one of the components of a shared aggregate route. D could simply stop advertising the aggregate, advertising each of the components instead. The question here might be: is this a good idea? Looking at this from the perspective of the SOS triad, the aggregation replaced four routes with a single route. In the auto disaggregation case, the single route change is replaced by four route changes. The amount of state is variable, and in some cases the rate of change in state is actually higher than without the aggregation.
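
One way to count the state changes A sees when a single component fails is sketched below. The counting model is my own simplification for illustration (one update per withdrawn or newly advertised route); it is not taken from any protocol specification.

```python
def updates_on_failure(components, mode):
    """Route changes seen upstream when one component route fails."""
    if mode == "no-aggregation":
        return 1                      # one withdraw for the failed component
    if mode == "aggregation":
        return 0                      # the aggregate does not change (black hole)
    if mode == "auto-disaggregation":
        # Withdraw the aggregate, then advertise each surviving component.
        return 1 + (components - 1)
    raise ValueError(f"unknown mode: {mode}")


for mode in ("no-aggregation", "aggregation", "auto-disaggregation"):
    print(mode, updates_on_failure(4, mode))
```

With four components, the failure that would have been a single update without aggregation becomes four under auto disaggregation, and the gap widens as the aggregate covers more routes.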

So…

I don’t hold that auto disaggregation is either good or bad; it just presents a different set of challenges to the network designer. Instead of designing for average rates of change and given table sizes, you can count on much smaller tables, but you might find there are times when the rate of change is dramatically higher than you expect. A good question to ask, before deploying this kind of technology, might be: can I foresee a chain of events that will cause a high enough rate of state change that auto disaggregation is actually more destabilizing than just not summarizing at all in this network?

A real danger with auto disaggregation, by the way, is using summarization to dramatically reduce table sizes without understanding how a goldilocks failure (what we used to call in telco a mother’s day event, or perhaps a black swan) can cascade into widespread failures. If you’re counting on particular devices in your network only having a dozen or two dozen table entries, but just the right set of failures can cause them to have several thousand entries because of auto disaggregation, what kinds of failure modes should you anticipate? Can you anticipate or mitigate this kind of problem?

The idea of automatically summarizing and disaggregating routes is an interesting study in complexity, state, and optimization. It’s a good brain exercise in thinking through what-if situations, and carefully thinking about when and where to deploy this kind of thing.

What do you think about this idea? When would you deploy it, where, and why? When and where would you be cautious about deploying this kind of technology?

Keith’s Law (1)

Russ — Tue, 28 Sep 2021 18:23:04 +0000

I sometimes reference Keith’s Law in my teaching, but I don’t think I’ve ever explained it. Keith’s Law runs something like this:

Any large external step in a system’s capability is the result of many incremental changes within the system.

The reason incremental changes within a system appear as a single large step to outside observers is the smaller changes are normally hidden by abstraction. This is, in fact, the purpose of abstraction—to hide small changes inside a system from external view. Keith’s law is closely related to Clarke’s third law that “Any sufficiently advanced technology is indistinguishable from magic.” What looks like magic from the outside is really just a bunch of smaller things—each easier to understand on its own—combined into one single “thing” through abstraction.
If you’ve read this far, you’re probably thinking—what does this have to do with network engineering?
Well, several things, really.

First, the network is just an abstraction that moves packets to its users. Moving packets seems so … simple … to network users. You put data in here, and data comes out over there. All the little stuff that goes into making a network work is lost in the abstraction of the virtual connection between two hosts.

If you want users to understand why building a network is hard, you’re going to have to work hard at it. And you’re not likely to succeed—it’s often better just to live with the reality that users aren’t going to understand. Of course, this isn’t necessarily a bad thing, at least until it’s time to buy hardware and software to make all this magic work.

Second—no-one outside the network is ever going to understand the refactoring, simplification, and new features you’re trying to build into the network on their own. Users will only understand these things when they are related to some bigger picture, something they can see beyond the abstraction the network presents.

If you’re going to justify doing new things, you need to do so in terms of “larger things,” things that can be seen from outside the abstraction.

Third—no-one is going to pat you on the back for all the little things that need to be done to deploy a new major service. From the outside, that new service, or new cost savings, or whatever—it’s all just indistinguishable from magic.

Keith’s law is both good and bad. But it also means you need to learn how to frame your work in a way that users, who don’t have access to the inner workings of the network, can understand why you’re doing what you’re doing.

Turning this around, this also means you shouldn’t accept the “magic” of vendor products. That brilliant new capability your vendor is showing you is really made up of a lot of smaller components. The abstraction is just that—an abstraction. If you really want to understand the positive and negative consequences of deploying something new, you need to look beyond the abstraction.

Thoughts on the Collapsed Spine

Russ — Tue, 21 Sep 2021 17:00:25 +0000

One of the designs I’ve been encountering a lot of recently is a “collapsed spine” data center network, as shown in the illustration below.

In this design, A and B are spine routers, while C-F are top of rack switches. The terminology is important here, because C-F are just switches; they don’t route packets. When G sends a packet to H, the packet is switched by C to A, which then routes the packet towards F, which then switches the packet towards H. C and F do not perform an IP lookup, just a MAC address lookup. A and B are responsible for setting the correct next hop MAC address to forward packets through F to H.
What are the positive aspects of this design? Primarily that all processing is handled on the two spine routers—the top of rack switches don’t need to keep any sort of routing table, nor do any IP lookups. This means you can use very inexpensive devices for your ToR. In brownfield deployments, so long as the existing ToR devices can switch based on MAC addresses, existing hardware can be used.

This design also centralizes almost all aspects of network configuration and management on the spine routers. There is little (if anything) configured on the ToR devices.
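
The packet walk from G to H described above can be sketched as two kinds of table lookup. Everything in this example is a placeholder: the MAC and IP labels are strings rather than real addresses, and the table layout is an assumption made to show where the only IP lookup happens.

```python
# ToR switches hold only MAC tables: destination MAC -> neighbor to forward to.
TOR_MAC_TABLE = {
    "C": {"mac-A": "A", "mac-G": "G"},
    "F": {"mac-A": "A", "mac-H": "H"},
}
# The spine holds the routing state: destination IP -> (next-hop MAC, ToR to use).
SPINE_ROUTES = {
    "A": {"ip-G": ("mac-G", "C"), "ip-H": ("mac-H", "F")},
}


def walk(src_tor, dst_ip, gateway_mac="mac-A", spine="A"):
    """Return the devices a packet crosses after leaving the source host."""
    hops = [src_tor]
    # The source ToR only switches on the gateway's MAC address.
    hops.append(TOR_MAC_TABLE[src_tor][gateway_mac])
    # The spine does the only IP lookup and rewrites the destination MAC.
    dst_mac, dst_tor = SPINE_ROUTES[spine][dst_ip]
    hops.append(dst_tor)
    # The destination ToR again only switches on the MAC address.
    hops.append(TOR_MAC_TABLE[dst_tor][dst_mac])
    return hops


path_g_to_h = walk("C", "ip-H")
print(path_g_to_h)
```

Note that all of the per-destination state lives in `SPINE_ROUTES`, which is the configuration centralization described above.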
What about negative aspects? After all, if you haven’t found the tradeoffs, you haven’t looked hard enough. What are they here?

First, I’m struggling to call this a “fabric” at all—it’s more of a mash-up between a traditional two-layer hierarchical design with a routed core and switched access. Two of the points behind a fabric are the fabric doesn’t have any intelligence (all ports are undifferentiated Ethernet) and all the devices in the fabric are the same.

I suppose you could say the topology itself makes it more “fabric-like” than “network-like,” but we’re squinting a bit either way.

The second downside of this design is that it impacts the scaling properties of the fabric. This design assumes you’ll have larger/more intelligent devices in the spine, and smaller/less intelligent devices in the ToR. One of my consistent goals in designing fabrics has always been to push as close to single-sku as possible—use the same device in every position in the fabric. This greatly simplifies instrumentation, troubleshooting, and supply chain management.

One of the primary points of moving from a network in the more traditional sense to a “true fabric” is to radically simplify the network—this design doesn’t seem like it’s as “simple,” on the network side of things, as it could be. Again, something of a “mash-up” of a simpler fabric and a more traditional two-layer hierarchical routed/switched network.

Scale-out is problematic in this design, as well. You’d need to continue pushing cheap/low-intelligence switches along the edge, and adding larger devices in the spine to make this work over time. At some point, say when you have eight or sixteen spines, you’d be managing just as much configuration—and configuration that’s necessarily more complex because you’re essentially managing remote ports rather than local ones—as you would by just moving routing down to the ToR devices. There’s some scale point here with this design where it’s adding overhead and unnecessary complexity to save a bit of money on ToR switches.

When making the choice between OPEX and CAPEX, we should all know which one to pick.

Where would I use this kind of design? Probably in a smaller network (small enough not to use chassis devices in the spine) which will never need to be scaled out. I might use it as a transition mechanism to a full fabric at some point in the future, but I would want a well-designed plan to transition, and I would want it written in stone that this would not be scaled in the future beyond a specific point.

There’s nothing more permanent in the world than temporary government programs and temporary network designs.
If anyone has other thoughts on this design, please leave them in the comments below.

Russ’ Rules of Network Design

Russ — Tue, 14 Sep 2021 17:00:54 +0000

We have the twelve truths of networking, and possibly Akin’s Laws, but is there a set of rules for network design? I couldn’t find one, so I decided to create one, containing 18 laws I’ve listed below.

Russ’ Rules of Network Design

If you haven’t found the tradeoffs, you haven’t looked hard enough.
Design is an iterative process. You probably need one more iteration than you’ve done to get it right.
A design isn’t finished when everything needed is added, it’s finished when everything possible is taken away.
Good design isn’t making it work, it’s making it fail gracefully.
Effective, elegant, efficient. All other orders are incorrect.
Don’t fix blame; fix problems.
Local and global optimization are mutually exclusive.
Reducing state always reduces optimization someplace.
Reducing state always creates interaction surfaces; shallow and narrow interaction surfaces are better than deep and broad ones.
The easiest place to improve or screw up a design is at the interaction surfaces.
The optimum is almost always in the middle someplace; eschew extremes.
Sometimes it’s just better to start over.
There are a handful of right solutions; there is an infinite array of wrong ones.
You are not immensely smarter than anyone else in networking.
A bad design with a good presentation is doomed eventually; a good design with a bad presentation is doomed immediately.
You can only know your part of the system and a little bit about the parts around your part. The rest is rumor and pop psychology.
To most questions the correct initial answer should be “how many balloons fit in a bag?”
Virtual environments still have hard physical limits.

You can find a handy printable version here.

Marketing Wins

Russ — Mon, 30 Aug 2021 18:07:49 +0000

Off-topic post for today …

In the battle between marketing and security, marketing always wins. This topic came to mind after reading an article on using email aliases to control your email—

For example, if you sign up for a lot of email newsletters, consider doing so with an alias. That way, you can quickly filter the incoming messages sent to that alias—these are probably low-priority, so you can have your provider automatically apply specific labels, mark them as read, or delete them immediately.

One of the most basic things you can do to increase your security against phishing attacks is to have two email addresses, one you give to financial institutions and another one you give to “everyone else.” It would be nice to have a third for newsletters and marketing, but this won’t work in the real world. Why?

Because it’s very rare to find a company that will keep two email addresses on file for you, one for “business” and another for “marketing.” To give specific examples—my mortgage company sends me both marketing messages in the form of a “newsletter” as well as information about mortgage activity. They only keep one email address on file, though, so they both go to a single email address.

A second example—even worse in my opinion—is PayPal. Whenever you buy something using PayPal, the vendor gets the email address associated with the account. That’s fine—they need to send me updates on the progress of the item I ordered, etc. But they also use this email address to send me newsletters … and PayPal sends any information about account activity to the same email address.

Because of the way these things are structured, I cannot separate information about my account from newsletters, phishing attacks, etc. Since modern phishing campaigns are using AI to create the most realistic emails possible, and most folks can’t spot a phish anyway, you’d think banks and financial companies would want to give their users the largest selection of tools to fight against scams.

But they don’t. Why?

Because—if your financial information is mingled with a marketing newsletter, you’ll open the email to see what’s inside … you’ll pay attention. Why spend money helping your users not pay attention to your marketing materials by separating them from “the important stuff?”

When it comes to marketing versus security, marketing always wins. Somehow, we in IT need to do better than this.

It always takes longer than you think

Russ — Mon, 23 Aug 2021 17:00:43 +0000

Everyone is aware that it always takes longer to find a problem in a network than it should. Moving through the troubleshooting process often feels like swimming in molasses—you’re pulling hard, and progress is being made, but never fast enough or far enough to get the application back up and running before that crucial deadline. The “swimming in molasses effect” doesn’t end when the problem is found out, either—repairing the problem requires juggling a thousand variables, most of which are unknown, combined with the wit and sagacity of a soothsayer to work with vendors, code releases, and unintended consequences.

It’s enough to make a network engineer want to find a mountain top and assume an all-knowing pose—even if they don’t know anything at all.
The problem of taking longer, though, applies in every area of computer networking. It takes too long for the packet to get there, it takes too long for the routing protocol to converge, it takes too long to support a new application or server. It takes so long to create and validate a network design change that the hardware, software and processes created are obsolete before they are used.

Why does it always take too long? A short story often told to me by my Grandfather—a farmer—might help.
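One morning a farmer got up early, determined to throw some hay down to the horses in the stable. While getting dressed, he noticed one of the buttons on his shirt was loose. “No time for that now,” he thought, “I’ll deal with it later.” Getting out to the barn, he climbed up the ladder to the loft, and picked up a pitchfork. When he drove the fork into the hay, the handle broke.

He sighed, took the broken pieces down the ladder, and headed over to his shed to replace the handle—but when he got there, he realized he didn’t have a new handle that would fit. Sighing again, he took the broken pieces to his old trusty truck and headed into town—arriving before the hardware store opened. “Well, I’m already here, might as well get some coffee,” he thought, so he headed to the diner. After a bit, he headed to the store to buy a handle—but just as he walked out past the door, the loose button caught on the handle, popping off.

It took a few minutes to search for the lost button, but he found it and headed over to the cleaners to have it sewn back on “real fast.” Well, he couldn’t wander around town in his undershirt, so he just stepped next door to the barber’s, where there were a few friendly games of checkers already in progress. He played a couple of games, then the barber came out to remind him that he needed a haircut (a thing barbers tend to do all the time for some reason), so he decided to have it done. “Might as well not waste the time in town now I’m here,” he thought.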

The haircut finished, he went back to get his shirt, and realized it was just about lunch. Back to the diner again. Once he was done, he jumped in his truck and headed back to the farm. And then he realized—the horses were hungry, the hay hadn’t been pitched, and … his pitchfork was broken.

And this is why it always takes longer than it should to get anything done with a network. You take the call and listen to the customer talk about what the application is doing, which takes a half an hour. You then think about what might be wrong, perhaps kicking a few routers “just for good measure” before you start troubleshooting in earnest. You look for a piece of information you need to understand the problem, only to find the telemetry system doesn’t collect that data “yet”—so you either open a ticket (a process that takes a half an hour), or you “fix it now” (which takes several hours). Once you have that information, you form a theory, so you telnet into a network device to check on a few things… only to discover the device you’re looking at has the wrong version of code… This requires a maintenance window to fix, so you put in a request…

Once you even figure out what the problem is, you encounter a series of hurdles lined up in front of you. The code needs to be upgraded, but you have to contact the vendor to make certain the new code supports all the “stuff” you need. The configuration has to be changed, but you have to determine if the change will impact all the other applications on the network. You have to juggle a seemingly infinite number of unintended consequences in a complex maze of software and hardware and people and processes.

And you wonder, the entire time, why you just didn’t learn to code and become a backend developer, or perhaps a mountain-top guru.

So the next time you think it’s taking too long to fix the problem, or design a new addition to the network, or for the vendor to create that perfect bit of code, remember the farmer, and the button that left the horses hungry.

The Centralization of the Internet

Russ — Mon, 16 Aug 2021 17:00:09 +0000

My article on Internet centralization was just published over at The Public Discourse—

Most of the Internet’s traffic now flows through the networks of a few large companies rather than a multitude of small transit providers, and the Internet’s physical infrastructure is being reshaped to meet this new reality. But relying on a few providers to host all the content on the Internet makes it possible for just a few companies to shut down entire services or control speech.

The Grass is Always Greener

Russ — Mon, 09 Aug 2021 18:10:15 +0000

This last week I was talking to someone at a small startup that intends to eliminate all the complex routing from campus networks. In the past, when reading blog posts about Kubernetes, I’ve read about how it was designed to eliminate routing protocols because “routing protocols are so complex.”

Color me skeptical.

There are two reasons for complexity in a design. The first is you’re solving a hard problem. The second is you’ve made bad design choices in the past, and you’re pasting complexity on top to solve some problem (whether perceived or real).

The problem with all this talk about building something that’s “less complex” is people tend to see complexity of the first kind and think, “we can get rid of that complexity if we start over.” Failing to understand the past before building the future is a recipe for repeated failures of the same kind. Building a network without a distributed routing protocol hasn’t been tried before either, right? Well, yes, it has … We either forget how it turned out, or we say “well, that’s not the same thing I’m talking about here” (just like “real socialism hasn’t ever been tried”).

Even worse, they think they can get rid of the second kind of complexity by starting over, or getting the humans out of the decision-making loop, or focusing on the data. Our modern penchant for relying on “the data,” without ever thinking about the source of the data or how the data has been shaped and interpreted, is truly breathtaking.

They look over the horizon, see an unspoiled field, and think “the grass really is greener on the other side.”

Get rid of all those complex dynamic routing protocols … get rid of all those humans making decisions, so the decisions are “data driven” … and everything will be so much better.

Adding complexity to solve hard real-world problems is just the way things are, and they will always be, so the first reason for complexity will always be with us. People make mistakes, don’t see into the future perfectly, or just don’t have a perfect understanding of the system (technical debt), so the second kind of complexity will always be with us. You can’t “fix” people—God save us from those who think they can. The grass isn’t always greener—it just always looks that way.

What’s the practical upshot? Networks are always going to be complex. It’s just the nature of the problem being solved.

We add complexity because we fail to ask the right questions, we don’t understand the system, or we fail to do good design. The solution isn’t to seek out a greener field “out there,” but rather to make the field we currently live in greener by asking the right questions and reducing complexity through good design. Sometimes you might even need to start over with a new network … but when you start thinking about starting over with a newly designed set of protocols because the old ones are “too complex,” you need to ask how those old ones got that way, and how you’re going to stop the new ones from getting to the same place.

The grass is always greener because you are looking at it through green-colored lenses just as the new grass is in its full flush, and before the weeds have had a chance to take over.

Learn how old things worked before you fall for some new “modern wonder” that’s going to solve every problem. The complexity in old things will show you where you can expect to find complexity growing up in new things.

It Always Takes Too Long

Russ — Mon, 02 Aug 2021 17:00:24 +0000

It always takes longer to find a problem than it should. Moving through the troubleshooting process often feels like swimming in molasses—it’s never fast enough or far enough to get the application back up and running before that crucial deadline. The “swimming in molasses effect” doesn’t end when the problem is found out, either—repairing the problem requires juggling a thousand variables, most of which are unknown, combined with the wit and sagacity of a soothsayer to work with vendors, code releases, and unintended consequences.

Why does it always take too long? A short story often told to me by my Grandfather—a farmer—might help.

One morning a farmer got up early, determined to throw some hay down to the horses in the stable. While getting dressed, he noticed one of the buttons on his shirt was loose. “No time for that now,” he thought, “I’ll deal with it later.” Getting out to the barn, he climbed up the ladder to the loft, and picked up a pitchfork. When he drove the fork into the hay, the handle broke.

He sighed, took the broken pieces down the ladder, and headed over to his shed to replace the handle—but when he got there, he realized he didn’t have a new handle that would fit. Sighing again, he took the broken pieces to his old trusty truck and headed into town—arriving before the hardware store opened. “Well, I’m already here, might as well get some coffee,” he thought, so he headed to the diner. After a bit, he headed to the store to buy a handle—but just as he walked out past the door, the loose button caught on the handle, popping off.
It took a few minutes to search for the lost button, but he found it and headed over to the cleaners to have it sewn back on “real fast.” Well, he couldn’t wander around town in his undershirt, so he just stepped next door to the barber’s, where there were a few friendly games of checkers already in progress. He played a couple of games, then the barber came out to remind him that he needed a haircut (a thing barbers tend to do all the time for some reason), so he decided to have it done. “Might as well not waste the time in town now I’m here,” he thought.

And you wonder, the entire time, why you just didn’t learn to code and become a backend developer, or perhaps a mountain-top guru.

Leveraging Similarities

Russ — Mon, 26 Jul 2021 17:00:50 +0000

We tend to think every technology and every product is roughly unique—so we tend to stay up late at night looking at packet captures and learning how to configure each product individually, and chasing new ones as if they are the brightest new idea (or, in marketing terms, the best thing since sliced bread). Reality check: they aren’t. This applies across life, of course, but especially to technology. From a recent article—

Whenever I start learning a new programming language, I focus on defining variables, writing a statement, and evaluating expressions. Once I have a general understanding of those concepts, I can usually figure out the rest on my own. Most programming languages have some similarities, so once you know one programming language, learning the next one is a matter of figuring out the unique details and recognizing the differences.

RFC1925 rule 11 states—

Every old idea will be proposed again with a different name and a different presentation, regardless of whether it works.

Rule 11 isn’t just a funny saying—rule 11 is your friend. If you want to learn new things quickly, learn rule 11 first. A basic understanding of the theory of networking will carry across all products, all marketing campaigns, and all protocols.

Whatever it is, you need more (RFC1925 rule 9)

Russ — Mon, 19 Jul 2021 17:00:08 +0000

There is never enough. Whatever you name in the world of networking, there is simply not enough. There are not enough ports. There is not enough speed. There is not enough bandwidth. Many times, the problem of “not enough” manifests itself as “too much”—there is too much buffering and there are too many packets being dropped. Not so long ago, the Internet community decided there were not enough IP addresses and decided to expand the address space from 32 bits in IPv4 to 128 bits in IPv6. The IPv6 address space is almost unimaginably huge—2 to the 128th power is about 340 trillion, trillion, trillion addresses. That is enough to provide addresses to stacks of 10 billion computers blanketing the entire Earth. Even a single subnet of this space is enough to provide addresses for a full data center where hundreds of virtual machines are being created every minute; each /64 (the default allocation size for an IPv6 address) contains 4 billion IPv4 address spaces.
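
The figures quoted above are easy to check with a few lines of Python. This is only a back-of-the-envelope sketch; the variable names are mine, and nothing here comes from any RFC.

```python
# Back-of-the-envelope check of the address-space figures quoted above.
IPV4_BITS = 32
IPV6_BITS = 128
SUBNET_PREFIX = 64  # a /64 leaves 128 - 64 = 64 bits for hosts

ipv4_total = 2 ** IPV4_BITS
ipv6_total = 2 ** IPV6_BITS
hosts_per_64 = 2 ** (IPV6_BITS - SUBNET_PREFIX)

# How many complete IPv4 address spaces fit inside a single /64?
ipv4_spaces_per_64 = hosts_per_64 // ipv4_total

print(f"IPv4 addresses:       {ipv4_total:,}")
print(f"IPv6 addresses:       {ipv6_total:.3e}")
print(f"Addresses in one /64: {hosts_per_64:,}")
print(f"IPv4 spaces per /64:  {ipv4_spaces_per_64:,}")
```

The last line works out to 2 to the 32nd power, or a little over 4 billion, which matches the claim about a single /64.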

But… what if the current IPv6 address space simply is not enough? Engineers working in the IETF have created two different solutions over the years for just this eventuality. In 1994 RFC1606 provided a “letter from the future” describing the eventual deployment of IPv9, which was (in this eventual future) coming to the end of its useful lifetime because the Internet was running out of numbers. In RFC1606, it is noted that IPv9’s 42 levels of hierarchy had proven popular, but not all the levels had found a use. The highest level in use seems to be level 39, which was being used to address individual subatomic particles. Part of the dwindling address space considered in RFC1606 was the default allocation of about 1 billion addresses to each household. As the number of homes built increased globally, the IPv9 address space came under increasing pressure. The allocation of groups of addresses to recyclable items was not helpful, either, regardless of the ability to multicast “all cardboard items” in a recycling bin.

An alternate proposal, written many years later, is RFC8135, which considers complex addressing in IPv6. RFC8135 begins by describing the different ways in which a set of numbers, such as the 128-bit space in the IPv6 address, can be represented, including integers, prime, and composite. Each of these is considered in some detail, but eventually rejected for various reasons. For instance, integer (or fixed point) addresses are rejected because the location of a host is not fixed, so fixed point addresses are a poor representation of the host. Prime addresses are likewise rejected because they take too long to compute, and composite addresses are rejected because they are too difficult to differentiate from prime addresses.

RFC8135 proposes a completely different way of looking at the 128 bits available in the IPv6 address space. Rather than treating IPv6’s address space as a simple integer, this specification advocates for treating it as a floating-point number. This allows for a much larger space, particularly as aggregation can be indicated through scientific notation. The main problem the authors note with this proposal is users may believe that when they assign a floating address to their device, the device itself thereby becomes waterproof and floating. The authors advise users to count on a waterproofing app, available in most app stores, for this function, rather than counting on the floating address. The authors also note duct tape can be used to permanently attach a floating address to a fixed device, if needed.

The danger, of course, is that in the quest for “more,” network designers, network operators, and protocol designers could end up embracing the ridiculous. It all brings to mind the point Andrew Tanenbaum made in a standard work on Networking, Computer Networks. Tanenbaum calculates the bandwidth of a station wagon full of magnetic tape (specifically VHS format) backups. After considering the amount of time it would take to drive the station wagon across the continental United States, he concludes the vehicle has more bandwidth than any link available at that time. A similar calculation could be made with a mid-sized shipping box available from any overnight package carrier, filled with SSD drives (or similar). The conclusion, according to Dr. Tanenbaum, is networks are a sop to human impatience.
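
The shipping-box version of the calculation can be sketched in a few lines. Every number below (drive count, drive capacity, transit time) is a made-up assumption for illustration, not a figure from Tanenbaum or from any carrier.

```python
def sneakernet_bandwidth_bps(num_drives, bytes_per_drive, transit_seconds):
    """Effective bandwidth, in bits per second, of physically shipping drives."""
    total_bits = num_drives * bytes_per_drive * 8
    return total_bits / transit_seconds

# Hypothetical box: 100 drives of 4 TB each, delivered in 24 hours.
DRIVES = 100
BYTES_PER_DRIVE = 4e12
OVERNIGHT_SECONDS = 24 * 60 * 60

bps = sneakernet_bandwidth_bps(DRIVES, BYTES_PER_DRIVE, OVERNIGHT_SECONDS)
print(f"{bps / 1e9:.1f} Gbit/s")
```

Under these assumptions the box works out to roughly 37 Gbit/s sustained for a full day, with a latency of 24 hours, which is rather the point about impatience.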

As there is no bound to human impatience, no matter how much you have, as RFC1925 says, you will always need more.

NATs, PATs, and Network Hygiene

Russ — Tue, 13 Jul 2021 17:00:14 +0000

While reading a research paper on address spoofing from 2019, I ran into this on NAT (really PAT) failures—

In the first failure mode, the NAT simply forwards the packets with the spoofed source address (the victim) intact … In the second failure mode, the NAT rewrites the source address to the NAT’s publicly routable address, and forwards the packet to the amplifier. When the server replies, the NAT system does the inverse translation of the source address, expecting to deliver the packet to an internal system. However, because the mapping is between two routable addresses external to the NAT, the packet is routed by the NAT towards the victim.

The authors state 49% of the NATs they discovered in their investigation of spoofed addresses fail in one of these two ways. From what I remember way back when the first NAT/PAT device (the PIX) was deployed in the real world (I worked in TAC at the time), there was a lot of discussion about what a firewall should do with packets sourced from addresses not covered by any configured policy.

If I have an access list including 192.168.1.0/24, and I get a packet sourced from 192.168.2.24, what should the NAT do? Should it forward the packet, assuming it’s from some valid public IP space? Or should it block the packet because there’s no policy covering this source address?

This is similar to the discussion about whether BGP speakers should send routes to an external peer if there is no policy configured. The IETF (though not all vendors) eventually came to the conclusion that BGP speakers should not advertise to external peers without some form of policy configured.

My instinct is the NATs here are doing the right thing—these packets should be forwarded—but network operators should be aware of this failure mode and configure their intentions explicitly. I suspect most operators don’t realize this is the way most NAT implementations work, and hence they aren’t explicitly filtering source addresses that don’t fall within the source translation pool.
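
To make the two behaviors concrete, here is a minimal sketch of the decision, using the 192.168.1.0/24 example from above. The function name, the return strings, and the `explicit_filter` flag are all hypothetical; this is not how any particular NAT implementation is written.

```python
import ipaddress

# Hypothetical source translation pool, taken from the example above.
NAT_SOURCE_POOL = [ipaddress.ip_network("192.168.1.0/24")]

def nat_action(src, explicit_filter):
    """Decide what to do with a packet arriving on the inside interface."""
    addr = ipaddress.ip_address(src)
    if any(addr in net for net in NAT_SOURCE_POOL):
        return "translate"
    # The source is outside the pool: without an explicit filter the packet
    # is forwarded with its (possibly spoofed) source address intact.
    return "drop" if explicit_filter else "forward-untranslated"

print(nat_action("192.168.1.10", explicit_filter=False))
print(nat_action("192.168.2.24", explicit_filter=False))
print(nat_action("192.168.2.24", explicit_filter=True))
```

The only difference between the last two calls is whether the operator wrote their intention down, which is the whole argument for configuring the filter explicitly.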

In the real world, there should also be a box just outside the NATing device that’s running unicast reverse path forwarding checks. This would prevent these sorts of spoofed packets from being forwarded into the DFZ—but uRPF is rarely implemented by edge providers, and most edge connected operators (enterprises) don’t think about the importance of uRPF to their security.
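
A strict-mode uRPF check can be sketched as follows, assuming a toy routing table with two made-up interface names. Real implementations do this lookup in the forwarding hardware; this only shows the logic: accept a packet only if the best route back to its source points out the interface the packet arrived on.

```python
import ipaddress

# Hypothetical routing table: prefix -> interface used to reach it.
ROUTES = {
    ipaddress.ip_network("192.168.1.0/24"): "inside",
    ipaddress.ip_network("0.0.0.0/0"): "outside",
}

def best_route_interface(src):
    """Longest-prefix match on the source address; None if nothing matches."""
    addr = ipaddress.ip_address(src)
    matches = [net for net in ROUTES if addr in net]
    if not matches:
        return None
    return ROUTES[max(matches, key=lambda net: net.prefixlen)]

def strict_urpf_accepts(src, arrival_interface):
    """Strict uRPF: the route back to the source must use the arrival interface."""
    return best_route_interface(src) == arrival_interface

print(strict_urpf_accepts("192.168.1.10", "inside"))
print(strict_urpf_accepts("203.0.113.7", "inside"))
```

In this sketch a packet claiming a source of 203.0.113.7 but arriving on the inside interface fails the check, because the only route back to that address is the default route out the other side.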

All this to say—if you’re running a NAT or PAT, make certain you understand how it works. Filters are tricky in the best of circumstances. NATs and PATs just make filters trickier.

Details and Complexity

Russ — Tue, 29 Jun 2021 01:00:49 +0000

What is the first thing almost every training course in routing protocols begins with? Building adjacencies. What is considered the “deep stuff” in routing protocols? Knowing packet formats and processes down to the bit level. What is considered the place where the rubber meets the road? How to configure the protocol.

I’m not trying to cast aspersions at widely available training, but I sense we have this all wrong—and this is a sense I’ve had ever since my first book was released in 1999. It’s always hard for me to put my finger on why I consider this way of thinking about network engineering less-than-optimal, or why we approach training this way.

This, however, is one thing I think is going on here—

The typical program aims to counter the inherent complexity of the decision by providing in-depth information. By providing such extremely detailed and complex information, these interventions try to enable people to make perfect decisions.

We believe that by knowing ever-deeper reaches of detail about a protocol, we are not only more educated engineers, but we will be able to make better decisions in the design and troubleshooting spaces.

To some degree, we think we are managing the complexity of the protocol by “making our knowledge practical”—by knowing the bits, bytes, and configurations. This natural tendency to “dig in,” to learn more detail, turns out to be counterproductive. Continuing from the same article—

The scientific opinion of many psychologists and behavioral scientists suggests the key to time-sensitive decision making in complex and chaotic situations is simplicity, not complexity. Simple-to-remember rules of thumb, or heuristics, speed the cognitive process, enabling faster decisionmaking and action. Recognizing that heuristics have limitations and are not a substitute for basic research and analysis, they nevertheless help break complexity-induced paralysis and support the development of good plans that can achieve timely and acceptable results. The best heuristics capture useful information in an intuitive, easy-to-recall way. Their utility is in assisting decision makers in complex and chaotic situations to make better and timelier decisions.

Knowing why a protocol works the way it does—understanding what it’s doing and why—from an abstract perspective is, I believe, a more important skill for the average network engineer than knowing the bits and bytes—or the configuration.

Abstract correctly—but abstract more. Get back to the basics and know why things work the way they do. It’s easier to fill in the details if you know the how and why.

It’s More Complicated than You Think (RFC1925, Rule 8)

Russ — Mon, 14 Jun 2021 17:00:47 +0000

It’s not unusual in the life of a network engineer to go entire weeks, perhaps even months, without “getting anything done.” This might seem odd for those who do not work in and around the odd combination of layer 1, layer 3, layer 7, and layer 9 problems network engineers must span and understand, but it’s normal for those in the field. For instance, a simple request to support a new application might require the implementation of some feature, which in turn requires upgrading several thousand devices, leading to the discovery that some number of these devices simply do not support the new software version, requiring a purchase order and change management plan to be put in place to replace those devices, which results in … The chain of dominoes, once it begins, never seems to end.

Or, as those who have dealt with these problems many times might say, it is more complicated than you think. This is such a useful phrase, in fact, it has been codified as a standard rule of networking in RFC1925 (rule 8, to be precise).

Take, for instance, the problem of sending documents through electronic mail—in the real world, there are various mechanisms available to group documents, so the recipient understands what documents go together as a set, which ones are separate—staples, paperclips, binders, folders, etc. In the virtual world, however, documents are just a big blob of bits. How does anyone know which documents go with which in this situation? The obvious solution is to create electronic versions of staples and paperclips, as described in RFC1927. This only seems simple, however; it is more complicated than you think.

For instance, how do you know someone along the document transmission path has not altered the staples and/or paper clips? To prevent staple tampering, electronic staples must be cryptographically signed in some way. In the real world, paper clips (in particular) are removed from documents and re-used to save money and resources. Likewise, there must be some process to discover unused digital document sets so the paper clips may be removed and placed in some form of storage for reuse. Some people like to use differently colored staples or paperclips; how should these be represented in the digital world? RFC1927 describes MIME labels to resolve most of these problems, but there is one final problem that brings the complexity of grouping electronic documents to an entirely new level: metadata creep. What happens when the amount of data required to describe the staple or paperclip becomes larger than the documents being grouped?

Something as simple as representing characters in a language can often be more complex than it might initially seem. RFC5242 attempts to resolve the complexity of the many available encoding schemes with a single coding scheme. Rather than assigning each symbol within a language to a single number within a number space, like ASCII and Unicode do, however, RFC5242 suggests creating a set of codes which describe how a character looks, rather than what it stands for. This allows the authors to use four principles—if it looks alike, it is alike; if it is the same thing, it is the same thing; sans-serif is preferred; combine characters rather than creating new ones where possible—to create a simplified way to describe any possible character in virtually any “Latin” language. The result requires a bit more space to store in some cases, and is more difficult to process, but it is simpler at least from some perspective.

RFC5242 reminds me of a protocol custom-developed for an application I once had to troubleshoot—the entire protocol was sent in actual ASCII text. At least it was simpler to read on the network packet capture tool. There are, of course, many other examples of things being more complex than initially thought in the networking world—which is probably a good thing, because it means those many reports of the demise of the network engineer are probably greatly exaggerated.

Illusory Correlation and Security

Russ — Mon, 31 May 2021 17:00:53 +0000

Fear sells. Fear of missing out, fear of being an imposter, fear of crime, fear of injury, fear of sickness … we can all think of times when people we know (or worse, people in the throes of the madness of crowds) have made really bad decisions because they were afraid of something. Bruce Schneier has documented this a number of times. For instance: “it’s smart politics to exaggerate terrorist threats” and “fear makes people deferential, docile, and distrustful, and both politicians and marketers have learned to take advantage of this.” Here is a paper comparing the risk of death in a bathtub to death because of a terrorist attack—bathtubs win.

But while fear sells, the desire to appear unafraid also sells—and it conditions people’s behavior much more than we might think. For instance, we often say of surveillance “if you have done nothing wrong, you have nothing to hide”—a bit of meaningless bravado. What does this latter attitude—“I don’t have anything to worry about”—cause in terms of security?

Several attempts at researching this phenomenon have come to the same conclusion: average users will often intentionally not use things they see someone they perceive as paranoid using. According to this body of research, people will not use password managers because using one is perceived as being paranoid in some way. Theoretically, this effect is caused by illusory correlation, where people associate an action with a kind of person (only bad/scared people would want to carry a weapon). Since we don’t want to be the kind of person we associate with that action, we avoid the action—even though it might make sense.

This is just the flip side of fear sells, of course. Just like we overestimate the possibility of a terrorist attack impacting our lives in a direct, personal way, we also underestimate the possibility of more mundane things, like drowning in a tub, because we either think we can control it, or because we don’t think we’ll be targeted in that way, or because we want to signal to the world that we “aren’t one of those people.”

Even knowing this is true, however, how can we counter this? How can we convince people to learn to assess risks rationally, rather than emotionally? How can we convince people that the perception of control should not impact your assessment of personal security or safety?

Simplifying design and use of the systems we build would be one—perhaps not-so-obvious—step we can take. The more security is just “automatic,” the more users will become accustomed to deploying security in their everyday lives. Another thing we might be able to do is stop trying to scare people into using these technologies.

In the meantime, just be aware that if you’re an engineer, your use of a technology “as an example” to others can backfire, causing people to not want to use those technologies.

Is it really the best just because it’s the most common?

Russ — Mon, 24 May 2021 17:00:38 +0000

I cannot count the number of times I’ve heard someone ask these two questions—

What are other people doing?
What is the best common practice?

While these questions have always bothered me, I could never really put my finger on why. I ran across a journal article recently that helped me understand a bit better. The root of the problem is this—what does best common mean, and how can following the best common produce a set of actions you can be confident will solve your problem?

Bellman and Oorschot say best common practice can mean this is widely implemented. The thinking seems to run something like this: the crowd’s collective wisdom will probably be better than my thinking… more sets of eyes will make for wiser or better decisions. Anyone who has studied the madness of crowds will immediately recognize the folly of this kind of state. Just because a lot of people agree it’s a good idea to jump off a cliff does not mean it is, in fact, a good idea to jump off a cliff.

Perhaps it means something closer to this is no worse than our competitors. If that’s the meaning, though, it’s a pretty cynical result. It’s saying, “I don’t mind condemning myself to mediocrity so long as I see everyone else doing the same thing.” It doesn’t sound like much of a way to grow a business.

The authors do provide their definition—

For a given desired outcome, a “best practice” is a means intended to achieve that outcome, and that is considered to be at least as “good” as the best of other broadly considered means to achieve that same outcome.

The thinking seems to run something like this—it’s likely that everyone else has tried many different ways of doing this; that they have all settled on doing this, this way, means all those other methods are probably not as good as this one for some reason.

Does this work? There’s no way to tell without further investigation. How many of the other folks doing “this” spent serious time trying alternatives, and how many just decided the cheapest way was the best no matter how poor the result might be? In fact, how can we know what results doing things “this way” has produced in all those other networks? Where would we find this kind of information?

In the end, I can’t ever make much sense out of the question, “what is everyone else doing?” Discovering what everyone else is doing might help me eliminate possibilities (that didn’t work for them, so I certainly don’t want to try it), or it might help me understand the positive and negative attributes of a given solution. Still, I don’t understand why “common” should imply “best.”

The best solution for this situation is simply going to be the best solution. Feel free to draw on many sources, but don’t let other people determine what you should be doing.

The Effectiveness of AS Path Prepending (2)

Russ — Mon, 17 May 2021 17:00:02 +0000

Last week I began discussing why AS Path Prepend doesn’t always affect traffic the way we think it will. Two other observations from the research paper I’m working off of are:

Adding two prepends will move more traffic than adding a single prepend
It’s not possible to move traffic incrementally by prepending; when it works, prepending will end up moving most of the traffic from one inbound path to another

A slightly more complex network will help explain these two observations.

Assume AS65000 would like to control the inbound path for 100::/64. I’ve added a link between AS65001 and 65002 here, but we will still find prepending a single AS to the path won’t make much difference in the path used to reach 100::/64. Why?

Because most providers will have a local policy configured—using local preference—that causes them to choose any available customer connection over other paths. AS65001, on receiving the route to 100::/64 from AS65000, will set the local preference so it will prefer this route over any other route, including the one learned from AS65002. So while the cause is a little different in this case than the situation covered in the first post, the result is the same.
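The ordering described above can be sketched in a few lines of Python. This is a minimal illustration, not a full BGP decision process (real implementations have more steps): it only shows that local preference is compared before AS Path length. The `Route` class, the `best_route` function, and the local preference values 200 and 100 are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    via: str              # neighbor the route was learned from
    as_path: tuple        # AS numbers in the received AS Path
    local_pref: int = 100 # assumed default local preference


def best_route(routes):
    # Higher local preference wins first; a shorter AS Path only
    # breaks ties between routes with equal local preference.
    return max(routes, key=lambda r: (r.local_pref, -len(r.as_path)))


# AS65001's view of 100::/64 (hypothetical values):
customer = Route("AS65000 (customer)", (65000, 65000), local_pref=200)
peer = Route("AS65002", (65002, 65000), local_pref=100)

print(best_route([customer, peer]).via)
```

Both AS Paths here have the same length, but adding more prepends to the customer route would not change the result, because the comparison never reaches the AS Path while the local preference values differ.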

We can, of course, prepend twice onto the AS Path rather than once. What impact would that have here? It still won’t impact the traffic originating in 65005 because AS65001 is the only path available towards 100::/64 from their perspective. Prepending cannot change anything if there’s only one path.

However, if most of the traffic destined to 100::/64 is coming from AS65006, 7, and 8 rather than from AS65005, prepending two times will allow AS65000 to shift the traffic from the path through AS65002 to the path through AS65001. This example might seem a little contrived. Still, it’s pretty similar to networks that have one connection to some local provider (a cable company or something similar) and one connection to a more prominent national or international provider. Any time you are connected to two different providers who have different ranges of connectivity, prepending two autonomous systems on the AS Path will probably be able to shift traffic from one inbound link to another.

What about prepending more than two hops to the AS Path? Each additional prepend is going to shift smaller amounts of traffic. It makes sense that increasing the number of prepends doesn’t shift much more because the further away you get from the edge of the Internet, the more fully connected the autonomous systems are, and the more likely you are to run into some other policy that will override the AS Path in determining the best path. The average length of the AS Path in the Internet is around four; prepending more than this normally won’t have much of an effect on traffic flow.

The second question above can also be answered by looking at this network. Why can’t you shift traffic incrementally by prepending onto the AS Path? Because the connectivity close to the edge is probably not meshy enough. You can’t shift over just the traffic from one AS or another; you can only shift traffic from the entire set of autonomous systems behind your upstream from one inbound link to another. You can adjust traffic on a per-prefix basis, however, which can be useful for balancing between two inbound links.

What can you do to control inbound traffic with more certainty? Take a look at this older post for thoughts on using communities and de-aggregation to steer traffic.

The Effectiveness of AS Path Prepending (1)

Russ — Mon, 10 May 2021 17:00:53 +0000

Just about everyone prepends AS’ to shift inbound traffic from one provider to another—but does this really work? First, a short review on prepending, and then a look at some recent research in this area.

What is prepending meant to do?

Looking at this network diagram, the idea is for AS65000 (each router is in its own AS) to steer traffic through AS65001, rather than AS65002, for 100::/64. The most common method of trying to accomplish this is for AS65000 to prepend its own AS number on the AS Path multiple times. Increasing the length of the AS Path will, in theory, cause a route to be less preferred.

In this case, suppose AS65000 prepends its own AS number on the AS Path once before advertising the route towards AS65001, and not towards AS65002. Assuming there is no link between AS65001 and AS65002, what would we expect to happen? What we would expect is AS65001 will receive one route towards 100::/64 with an AS Path of 2 and use this route. AS65002 will, likewise, receive one route towards 100::/64 with an AS Path of 1 and use this route.

AS65003, however, will receive two routes towards 100::/64, one with an AS Path of 3 through AS65001, and one with an AS Path of 2 through AS65002. All other things being equal (local preference, etc.), AS65003 will prefer the route with the shorter AS Path through AS65002, and select that path to reach 100::/64. AS65004 will only receive one path towards 100::/64, the one through AS65002, because AS65003 will only advertise its best path to AS65004.
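The walk-through above can be reproduced with a small simulation. This is only a sketch: the topology in `LINKS` is inferred from the text (the diagram is not reproduced here), the names `LINKS`, `PREPENDS`, and `propagate` are made up for the example, and the only selection rule modeled is shortest AS Path.

```python
from collections import defaultdict

# Assumed topology: AS65000 connects to AS65001 and AS65002, both of
# which connect to AS65003, which connects to AS65004.
LINKS = {
    65000: [65001, 65002],
    65001: [65003],
    65002: [65003],
    65003: [65004],
}
# AS65000 prepends once towards AS65001 and not towards AS65002.
PREPENDS = {(65000, 65001): 1}


def propagate(origin, links, prepends):
    received = defaultdict(list)  # AS -> AS Paths learned from neighbors
    best = {}                     # AS -> AS Path selected as best
    every_as = set(links) | {n for ns in links.values() for n in ns}
    # Ascending order is also upstream-to-downstream order in this toy
    # topology, so each AS has heard every route before it advertises.
    for asn in sorted(every_as):
        best[asn] = () if asn == origin else min(received[asn], key=len)
        for neighbor in links.get(asn, []):
            extra = prepends.get((asn, neighbor), 0)
            # An AS advertises only its best path, with itself prepended.
            received[neighbor].append((asn,) * (1 + extra) + best[asn])
    return received, best


received, best = propagate(65000, LINKS, PREPENDS)
for asn in (65001, 65002, 65003, 65004):
    print(asn, [len(path) for path in received[asn]], "best:", best[asn])
```

AS65003 ends up with paths of length 3 and 2 and selects the one through AS65002, while AS65004 only ever hears that single best path.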

The obvious question—how much good does this really do? The only impact on the best path is two hops away, at AS65003, and beyond. The route chosen by AS65001 and AS65002 will not be affected by the prepending.

A recent paper found—

We observe that the effectiveness of prepending can strongly depend on the location (for around 20% of cases, ASPP has moved no targets, while for another 20% , it moved almost all targets).

You might expect AS Path prepending to have a much more consistent effect on inbound traffic. Why doesn’t it?

What might not be obvious (the danger of simplified diagrams): if autonomous systems directly attached to AS65001 originate most of the traffic destined to 100::/64, no amount of prepending is going to make any difference in the inbound traffic flow. Assume AS65001 has a connection to some cloud service, AS65002 does not have a connection to the same cloud service, and 100::/64 is a local server that communicates with this cloud service on a regular basis. Since AS65001 is the only AS transiting traffic from the cloud service to the server located on the 100::/64 subnet, and AS65001 only has one route to 100::/64, you are not going to be able to shift traffic off that single path no matter how many times you prepend.

The first rule of prepending is location matters. You have to know where the traffic you want to shift is originating, and whether or not it can be shifted.

In my next post on this topic, I’ll continue exploring AS path prepending more in light of the results of the research paper above.

Ambiguity and complexity: once more into the breach

Russ — Mon, 03 May 2021 17:00:55 +0000

Recent research into the text of RFCs versus the security of the protocols described came to this conclusion—

While not conclusive, this suggests that there may be some correlation between the level of ambiguity in RFCs and subsequent implementation security flaws.

This should come as no surprise to network engineers—after all, complexity is the enemy of security. Beyond the novel ways the authors use to understand the shape of the world of RFCs (you should really read the paper; it’s really interesting), this desire to increase security by decreasing the ambiguity of specifications is fascinating. We often think that writing better specifications requires having better requirements, but down this path only lies despair.

Better requirements are the one thing a network engineer can never really hope for.

It’s not just that networks are often used as a sort of “complexity sink,” the place where every hard problem goes to be solved. It’s also the uncertainty of the environment in which the network must operate. What new application will be stuffed on top of the network this week? Will anyone tell the network folks about this new application, or just open a ticket when it doesn’t work right? What about all the changes developers are making to applications right now, and their impact on the network? There are link failures, software failures, hardware failures, and the mean time between mistakes. There is the pace of innovation (which I tend to think is a bit overblown—rule11, after all—we are often talking about new products rather than new ideas).

What the network is supposed to do—just provide IP transport between two devices—turns out to be hard. It’s hard because “just transporting packets” isn’t ever enough. These packets must be delivered consistently (jitter and drops) across an ever-changing landscape.

To this end—

[C]omplexity is most succinctly discussed in terms of functionality and its robustness. Specifically, we argue that complexity in highly organized systems arises primarily from design strategies intended to create robustness to uncertainty in their environments and component parts.

Uncertainty is the key word here. What can we do about all of this?

We can reduce uncertainty. There are three ways to reduce uncertainty. First, you can obfuscate it—this is harmful. Second, you can reduce the scope of the job at hand, throwing some of the uncertainty (and therefore complexity) over the cubicle wall. This can be useful in some situations, but remember that the less work you’re doing, the less value you add. Beware of self-commodifying.

Finally, you can manage the uncertainty. This generally means using modularization intelligently to partition off problems into smaller sets. It’s easier to solve a set of well-scoped problems with little uncertainty than to solve one big problem with unknowable uncertainty.

This might all sound great in theory, but how do we do this in real life? Where does the rubber hit the road? This is what Ethan and I tried to show in Problems and Solutions—how to understand the problems that need to be solved, and then how to solve each of those problems within a larger system. This is also what many parts of The Art of Network Architecture are about, and then again what Jeff and I wrote about in Navigating Network Complexity.

I know it often seems like it’s not worth learning the theory; it’s so much easier to focus on the day-to-day, the configuration of this device, or the shiny thing that vendor just created. It’s easier to assume that if I can just hide all the complexity behind intent or automation, I can get my weekends back.

The truth is that we’re paid to solve hard problems, and solving hard problems involves complexity. We can either try to cover that up, or we can learn to manage it.

Complexity Reduction?

Russ — Mon, 19 Apr 2021 17:00:31 +0000

Back in January, I ran into an interesting article called The many lies about reducing complexity:

Reducing complexity sells. Especially managers in IT are sensitive to it as complexity generally is their biggest headache. Hence, in IT, people are in a perennial fight to make the complexity bearable.

Gerben then discusses two ways we often try to reduce complexity. First, we try to simply reduce the number of applications we’re using. We see this all the time in the networking world—if we could only get to a single pane of glass, or reduce the number of management packages we use, or reduce the number of control planes (generally to one), or reduce the number of transport protocols … but reducing the number of protocols doesn’t necessarily reduce complexity. Instead, we can just end up with one very complex protocol. Would it really be simpler to push DNS and HTTP functionality into BGP so we can use a single protocol to do everything?

Second, we try to reduce complexity by hiding it. While this is sometimes effective, it can also lead to unacceptable tradeoffs in performance (we run into the state, optimization, surfaces triad here). It can also make the system more complex if we need to go back and leak information to regain optimal behavior. Think of the OSPF type 4, which just reinjects information lost in building an area summary, or even the complexity involved in the type 7 to type 5 process required to create not-so-stubby areas.

It would seem, then, that you really can’t get rid of complexity. You can move it around, and sometimes you can effectively hide it, but you cannot get rid of it.

This is, to some extent, true. Complexity is a reaction to difficult environments, and networks are difficult environments.

Even so, there are ways to actually reduce complexity. The solution is not just hiding information because it’s messy, or munging things together because it requires fewer applications or protocols. You cannot eliminate complexity, but if you think about how information flows through a system you might be able to reduce the amount of complexity, and even create boundaries where state (hence complexity) can be more effectively hidden.

As an instance, I have argued elsewhere that building a DC fabric with distinct overlay and underlay protocols can actually create a simpler overall design than using a single protocol. Another instance might be to really think about where route aggregation takes place—is it really needed at all? Why? Is this the right place to aggregate routes? Is there any way I can change the network design to reduce state leaking through the abstraction?

The problem is there are no clear-cut rules for thinking about complexity in this way. There’s no rule of thumb, and there are no best practices. You just have to think through each individual situation and consider how, where, and why state flows, and then think through the state/optimization/surface tradeoffs for each possible way of reducing the complexity of the system. You have to take into account that local reductions in complexity can cause the overall system to be much more complex, as well, and eventually make the system brittle.

There’s no “pat” way to reduce complexity—that there is, is perhaps one of the biggest lies about complexity in the networking world.

Loose Lips

Russ — Mon, 12 Apr 2021 17:00:57 +0000

When I was in the military we were constantly drilled about the problem of Essential Elements of Friendly Information, or EEFIs. What are EEFIs? If an adversary can cast a wide net of surveillance, they can often find multiple clues about what you are planning to do, or who is making which decisions. For instance, if several people married to military members all make plans to be without their spouses for a long period of time, the adversary can be certain a unit is about to be deployed. If the unit of each member can be determined, then the strength, positioning, and other facts about what action you are taking can be guessed.

Given enough broad information, an adversary can often guess at details that you really do not want them to know.

What brings all of this to mind is a recent article in Dark Reading about how attackers take advantage of publicly available information to form Spear Phishing attacks—

Most security leaders are acutely aware of the threat phishing scams pose to enterprise security. What garners less attention is the vast amount of publicly available information about organizations and their employees that enables these attacks.

Going back further in time, during World War II, we have—

What does all of this mean for the average network engineer concerned about security? Probably nothing different than being just slightly paranoid about your personal security in the first place (way too much modern security is driven by an anti-paranoid mindset, a topic for a future post). Things like—

Don’t let people know, either through your job description or anything else, that you hold the master passwords for your company, or that your account holds administrator rights.
Don’t always go to the same watering holes, and don’t talk about work while there to people you’ve just met, or even people you see there all the time.
Don’t talk about when and where you’re going on vacation. You can talk about it, and share pictures, once you’re back.

If an attacker knows you are going to be on vacation, it’s a lot easier to create a fake “emergency,” tempting you to give out information about accounts, people, and passwords you shouldn’t. Phishing is primarily a matter of social engineering rather than technical acumen. Countering social engineering is also a social skill, rather than a technical one. You can start by learning to just say less about what you are doing, when you are doing it, and who holds the keys to the kingdom.

Time and Mind Savers: RSS Feeds

Russ — Mon, 05 Apr 2021 17:00:04 +0000

I began writing this post just to remind readers this blog does have a number of RSS feeds—but then I thought … well, I probably need to explain why that piece of information is important.

The amount of writing, video, and audio being thrown at the average person today is astounding—so much so that, according to a lot of research, most people in the digital world have resorted to relying on social media as their primary source of news. Why do most people get their news from social media? I’m pretty convinced this is largely a matter of “it saves time.” The resulting feed might not be “perfect,” but it’s “close enough,” and no-one wants to spend time seeking out a wide variety of news sources so they will be better informed.

The problem, in this case, is that “close enough” is really a bad idea. We all tend to live in information bubbles of one form or another (although I’m fully convinced it’s much easier to live in a liberal/progressive bubble, being completely insulated from any news that doesn’t support your worldview, than it is to live in a conservative/traditional one). If you think about the role of social media and the news feed on social media services, this makes some kind of sense. The social media service tries to guess at what will keep you interested (engaged, and therefore coming back to the service), but at the same time each social media service also has a worldview they want to promote. The service largely attempts to both cater to what keeps you there and to pull you towards what the service, itself, believes.

The solution is to stop getting your news from social media. Period, full stop, end of sentence (although I’ve seen a recent paper indicating people find periods and other punctuation marks offensive in some way—when you find a period offensive, maybe it’s time to grow a little thicker skin).

So how should you get information instead? There are a lot of ways, from email based newsletters to watching television (please don’t, television turns everything into entertainment, including things that are not meant to entertain). My suggestion, however, is RSS feeds. Grab an account on Feedly or some other service, find the RSS feeds for the sites you find informative, and subscribe to their feeds. Some services have a learning mechanism that tries to accomplish the same thing as social media feeds—building intelligent filters to emphasize things you find important. I don’t tend to use these things; I have learned to just glance at the headline and first paragraph and make a quick decision about whether I think the post is worth reading.

Following RSS feeds can help you stop binging, jumping from place to place on a single site—essentially wasting time. It works against the mechanisms designers use to “increase engagement,” which often just means to consume more of your attention and time than you intended to give away. Following RSS feeds can also help you gain a broader view of the world if you intentionally subscribe to feeds from sites and people you don’t always agree with. It’s healthy to regularly read “the other side.” Following strong, well-written arguments from “the other side” will do much more for your mind than seeing just the facile, emotionally charged, straw-man arguments often presented (and allowed through the filters) on social media.

Further, services like Feedly also allow you to follow lots of other things, including Twitter accounts, YouTube channels, and podcasts. I follow almost all podcasts through Feedly, downloading the individual episodes I want to listen to, storing them in a cloud directory, and then deleting the files when I’m done. This gives me one list of things to listen to, rather than a huge playlist full of seemingly never-ending content.

All this said, this blog has a lot of different RSS feeds available. I don’t have a complete list, but these are a good place to start—

The main feed (every post other than worth reading): https://rule11.tech/feed/
Longer written pieces (no podcast, worth reading, posts on other sites, weekend reads, etc.): https://rule11.tech/category/content-type/written/feed/
The Hedge: https://rule11.tech/category/hedge/feed/
The History of Networking: https://rule11.tech/category/hon/feed/
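If you would rather script the "glance at the headline and first paragraph" habit than use a reader service, a feed is just XML. The sketch below uses Python's standard `xml.etree.ElementTree` on a made-up RSS 2.0 sample; `SAMPLE_FEED`, its contents, and the `headlines` function are assumptions for illustration, and nothing is fetched from the network.

```python
import xml.etree.ElementTree as ET

# A tiny, invented RSS 2.0 document standing in for a downloaded feed.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <item>
      <title>First post</title>
      <link>https://example.com/first</link>
      <description>Opening paragraph of the first post.</description>
    </item>
    <item>
      <title>Second post</title>
      <link>https://example.com/second</link>
      <description>Opening paragraph of the second post.</description>
    </item>
  </channel>
</rss>"""


def headlines(feed_xml):
    """Return (title, description) pairs for every item in the feed."""
    root = ET.fromstring(feed_xml)
    return [
        (item.findtext("title", default=""),
         item.findtext("description", default=""))
        for item in root.iter("item")
    ]


for title, summary in headlines(SAMPLE_FEED):
    print(f"{title}: {summary}")
```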

I keep these very same links on a page of RSS feeds you can find under the about menu. If you’re interested in the RSS feeds I follow, please reach out to me directly, as Feedly no longer has any way to share your feeds other than pushing an OPML file (at least not that I can find).

The Insecurity of Ambiguous Standards

Russ — Mon, 29 Mar 2021 17:00:49 +0000

Why are networks so insecure?

One reason is we don’t take network security seriously. We just don’t think of the network as a serious target of attack. Or we think of security as a problem “over there,” something that exists in the application realm, that needs to be solved by application developers. Or we think of the consequences of a network security breach as “well, they can DDoS us, and then we can figure out how to move load around, so if we build with resilience (enough redundancy) we’re already taking care of our security issues.” Or we put our trust in the firewall, which sits there like some magic box solving all our problems.

The problem is–none of this is true. In any system where overall security is important, defense-in-depth is the key to building a secure system. No single part of the system bears the “primary responsibility” for “security.” The network is certainly a part of any defense-in-depth scheme that is going to work.

Which means network protocols need to be secure, at least in some sense, as well. I don’t mean “secure” in the sense of privacy—routes are not (generally) personally identifiable information (there are always exceptions, however). But rather “secure” in the sense that they cannot be easily attacked. On-the-wire encryption should prevent anyone from reading the contents of the packet or stream all the time. Network devices like routers and switches should be difficult to break into, which means the protocols they run must be “secure” in the fuzzing sense—there should be no unexpected outputs because you’ve received an unexpected input.
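To make "secure in the fuzzing sense" concrete, here is a minimal sketch in Python. The type-length-value format, the `parse_tlv` parser, and the `fuzz` loop are all hypothetical and not taken from any real protocol; the point is only that random input should produce either a clean parse or a deliberate `ParseError`, and never any other exception.

```python
import random


class ParseError(ValueError):
    """Raised when input is malformed; the only failure callers should see."""


def parse_tlv(data: bytes):
    # Hypothetical format: 1-byte type, 1-byte length, then `length` bytes.
    fields = []
    offset = 0
    while offset < len(data):
        if len(data) - offset < 2:
            raise ParseError("truncated header")
        kind, length = data[offset], data[offset + 1]
        value = data[offset + 2 : offset + 2 + length]
        if len(value) != length:
            raise ParseError("truncated value")
        fields.append((kind, value))
        offset += 2 + length
    return fields


def fuzz(parser, rounds=1000, seed=0):
    rng = random.Random(seed)
    for _ in range(rounds):
        blob = bytes(rng.randrange(256) for _ in range(rng.randrange(32)))
        try:
            parser(blob)
        except ParseError:
            pass  # rejecting bad input cleanly is the expected outcome
    # Any other exception propagates and fails the run.
    return rounds


print(fuzz(parse_tlv), "random inputs handled without surprises")
```

A real fuzzing campaign would use coverage-guided tooling and far more input; this loop just shows the property being tested.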

I definitely do not mean path security of any sort. Making certain a packet (or update or whatever else) has followed a specified path is a chimera in packet switched networks. It’s like trying to nail your choice of multicolored gelatin dessert to the wall. Packet switched networks are designed to adapt to changes in the network by rerouting traffic. Get over it.

So why are protocols and network devices so insecure? I recently ran into an interesting piece of research that provides some of the answer. To wit—

Our research saw that ambiguous keywords SHOULD and MAY had the second highest number of occurrences across all RFCs. We’ve also seen that their intended meaning is only to be interpreted as such when written in uppercase (whereas often they are written in lowercase). In addition, around 40% of RFCs made no use of uppercase requirements level keywords. These observations point to inconsistency in use of these keywords, and possibly misunderstanding about their importance in a security context. We saw that RFCs relating to Session Initiation Protocol (SIP) made most use of ambiguous keywords, and had the most number of implementation flaws as seen across SIP-based CVEs. While not conclusive, this suggests that there may be some correlation between the level of ambiguity in RFCs and subsequent implementation security flaws.

In other words, ambiguous language leads to ambiguous implementations which leads to security flaws in protocols.
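A rough way to see this in a specification you care about is to count the requirement keywords and check whether they are written in uppercase. The sketch below is an assumption-laden illustration, not the method the researchers used: it covers only a subset of the RFC 2119 keywords, the `count_keywords` function is made up, and it will miscount ordinary words such as the month of May.

```python
import re
from collections import Counter

# Longer phrases come first so "MUST NOT" is not counted as "MUST".
PATTERN = re.compile(r"\b(must not|should not|must|should|may)\b", re.IGNORECASE)


def count_keywords(text):
    """Split keyword occurrences into uppercase and non-uppercase uses."""
    upper, lower = Counter(), Counter()
    for match in PATTERN.finditer(text):
        word = match.group(1)
        if word.isupper():
            upper[word] += 1
        else:
            lower[word.lower()] += 1
    return upper, lower


sample = (
    "The receiver SHOULD discard the packet. It may log the event. "
    "The sender MUST NOT retry, but should back off."
)
upper, lower = count_keywords(sample)
print("uppercase:", dict(upper))
print("lowercase:", dict(lower))
```

In the sample, the lowercase "may" and "should" are exactly the cases where a reader cannot tell whether a requirement level was intended.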

The solution for this situation might be just this—specify protocols more rigorously. But simple solutions rarely admit reality within their scope. It’s easy to build more precise specifications—so why aren’t our specifications more precise?

In a word: politics.

For every RFC I’ve been involved in drafting, reviewing, or otherwise getting through the IETF, there are two reasons for each MAY or SHOULD therein. The first is someone has thought of a use-case where an implementor or operator might want to do something that would be otherwise not allowed by MUST. In these cases, everyone looks at the proposed MAY or SHOULD, thinks about how not doing it might be useful, and then thinks … “this isn’t so bad, the available functionality is a good thing, and there’s no real problem I can see with making this a MAY or SHOULD.” In other words, we can think of possible worlds where someone might want to do something, so we allow them to do it. Call this the “freedom principle.”

The second reason is that multiple vendors have multiple customers who want to do things different ways. When the two vendors clash in the realm of standards, the result is often a set of interlocking MAYs and SHOULDs that allow two implementors to build solutions that are interoperable in the main, but not along the edges, that satisfy both of their existing customers’ requirements. Call this the “big check principle.”

The problem with these situations is—the specification has an undetermined set of MAYs and SHOULDs that might interlock in unforeseen ways, resulting in unanticipated variances in implementations that ultimately show up as security holes.

Okay—now that I’ve described the problem, what can you do about it? One thing is to simplify. Stop putting everything into a small set of protocols. The more functionality you pour into a protocol or system, the harder it is to secure. Complexity is the enemy of security (and privacy!).

As for the political problems, these are human-scale, which means they are larger than any network you can ever build—but I’ll ponder this more and get back to you if I come up with any answers.

It is Always Something (RFC1925, Rule 7)

Russ — Tue, 23 Mar 2021 17:00:17 +0000

While those working in the network engineering world are quite familiar with the expression “it is always something!,” defining this (often exasperated) declaration is a little trickier. The wise folks in the IETF, however, have provided a definition in RFC1925. Rule 7, “it is always something,” is quickly followed with a corollary, rule 7a, which says: “Good, Fast, Cheap: Pick any two (you can’t have all three).”

You can either quickly build a network which works well and is therefore expensive, or take your time and build a network that is cheap and still does not work well, or… Well, you get the idea. There are many other instances of these sorts of three-way tradeoffs in the real world, such as the (in)famous CAP theorem, which states a database can be any two of consistent, available, and partitionable (or partitioned), but not all three. Eventual consistency, and problems from microloops to surprise package deliveries (when you thought you ordered one thing, but another was placed in your cart because of a database inconsistency) have resulted. Another form of this three-way tradeoff is the much less famous, but equally true, state, optimization, surface tradeoff trio in network design.

It is possible, however, to build a system which fails at all three measures—a system which is expensive, takes a long time to build, and does not perform well. The fine folks at the IETF have provided examples of such systems.

For instance, RFC1149 describes a system of transporting IPv4 packets over avian carriers, or pigeons. This is particularly useful in areas where electricity and network cabling are not commonly found. To quote the relevant part of the RFC:

The IP datagram is printed, on a small scroll of paper, in hexadecimal, with each octet separated by whitestuff and blackstuff. The scroll of paper is wrapped around one leg of the avian carrier. A band of duct tape is used to secure the datagram’s edges. The bandwidth is limited to the leg length.

The specification of duct tape is quite odd, however; perhaps its water-resistant properties are required for this mode of transport. This mode of transport has been adapted for use with the more modern (in relative terms only!) IPv6 transport in RFC6214. For situations where quality of service is critical, RFC2549 describes quality of service extensions to the transport mechanism.

To further prove it is possible to build a network which is slow, expensive, and does not work well, RFC4824 describes the transmission of packets through semaphore flag signaling, or SFSS. Commonly used to signal between two ships when no other form of communication is available, SFSS consists of a sender who positions (waves) flags of particular shape and color in specific positions to signal individual characters. Rather than transmitting letters or other characters, RFC4824 describes using these flags to transmit 0’s, 1’s, and the framing elements required to transmit an IPv4 datagram over long distances. The advantage of such a system is, much like the avian carrier, that it will operate where there is no electricity. The disadvantage is the cost of encoding and decoding the packet’s contents may be many orders of magnitude more difficult than using existing SFSS specifications to signal messages.

Finally, RFC7511 provides an option which allows the most use of resources on any network while providing the slowest possible network performance by allowing senders to include a scenic route header on IPv6 packets. This header notifies routers and other networking devices that this packet would like to take the scenic route to its destination, which will cause paths including avian carriers (as described in RFC6214) to be chosen over any other available path.

Slow Learning and Range

Russ — Mon, 22 Mar 2021 17:00:15 +0000

Jack of all trades, master of none.

This singular saying—a misquote of Benjamin Franklin (more on this in a moment)—is the defining statement of our time. An alternative form might be the fox knows many small things, but the hedgehog knows one big thing.

The rules for success in the modern marketplace, particularly in the technical world, are simple: start early, focus on a single thing, and practice hard.

But when I look around, I find these rules rarely define actual success. Consider my life. I started out with three different interests, starting jazz piano lessons when I was twelve, continuing music through high school, college, and for many years after. At the same time, I was learning electronics—just about everyone in my family is in electronic engineering (or computers, when those came along) in one way or another.

I worked on airfield electronics for a few years in the US Air Force (one of the reasons I tend to be calm is I’ve faced death up close and personal multiple times, an experience that tends to center your mind), including RADAR, radio, and instrument landing systems. Besides these two, I was highly interested in art and illustration, getting to the point of majoring in art in college for a short time, and making a living doing commercial illustration for a time.

You might notice that none of this really has a lot to do with computer networking. That’s the point.

I once thought I was a bit of an anomaly in this—in fact, I’m a bit of an anomaly throughout my life, including coming rather late to deep philosophy and theology (perhaps a bit too late!).

After reading Range by David Epstein, it turns out I’m wrong. I’m not the exception, I’m the rule. My case is so common as to be almost trivial.

Epstein not only destroys the common view—start early, stay focused, and practice hard—with reasoning, he also gives so many examples of people who have succeeded because they “wandered around” for many years before settling into a single “thing”—and sometimes just never “settling” throughout their entire lives. People who experience many different things, experimenting with ideas, careers, and paths, have what Epstein calls range.

He gives several reasons for people with range succeeding. They learn how to fail fast, unlike those who are focused on succeeding at a single thing—he calls this “too much grit.” They also learn to think outside the box—they are not restricted by the “accepted norms” within any field of study. It also turns out that slower learning is much more effective, as shown by multiple experiments.

There are three warnings about becoming a person with range, however (the fox rather than the hedgehog, so to speak). First, it takes a long time. Slow learning is, after all, slow. Second, range works best in a world full of specialists, like the world we live in right now. In a world full of generalists, specialists are likely to succeed more often than generalists. What is different stands out (both in bad and good ways, by the way). Third, people with range do better with wicked problems: problems that are not easily solved with repetition and linear thought.

Of course, computer networks are clearly wicked problems.

That original quote that bothers me so much? Franklin did not say: jack of all trades, master of none. Instead, he said: jack of all trades, master of one. What a difference a single letter makes.

Complexity Bites Back

Russ — Mon, 15 Mar 2021 17:00:47 +0000

What percentage of business-impacting application outages are caused by networks? According to a recent survey by the Uptime Institute, 29% of the 300 operators surveyed have experienced network-related outages in the last three years, the highest percentage among the causes of IT failures across the period.

A secondary question on the survey attempted to “dig a little deeper” to understand the reasons for network failure; the chart below shows the result.

We can be almost certain the third-party failures, if the providers were queried, would break down along the same lines. Is there a pattern among the reasons for failure?

Configuration change: while this could be somewhat managed through automation, these kinds of failures are more generally the result of complexity. Firmware and software failures? The more complex a piece of software, the more likely it is to have mission-impacting errors of some kind, so again, complexity related. Corrupted policies and routing tables are also complexity related. The only item among the top preventable causes that does not seem, at first, to relate directly to complexity is network overload and/or congestion problems. Many of these cases, however, might also be complexity related.

The Uptime Institute draws this same lesson, though through a slightly different process, saying: “Networks are complex not only technically, but also operationally.”

For years (decades, even) we have talked about the increasing complexity of networks, but we have done little about it. Yes, we have automated all the things, but automation can only carry us so far in covering complexity up. Automation also adds a large dollop of complexity on top of the existing network; sometimes (not always, of course!) automating a complex system without making substantial efforts at simplification is just like trying to put a fire out with a can of gas (or, in one instance I actually saw, trying to put out an electrical fire with a can of soda, with the predictable trip to the local hospital).

We are (finally) starting to be “bit hard” by complexity problems in our networks—and I suspect this is the leading edge of the problem, rather than the trailing edge.

Maybe it’s time to realize making every protocol serve every purpose in the network wasn’t a good idea—we now have protocols that are so complex that they can only be correctly configured by machines, and then only when you narrow the use case enough to make the design parameters intelligible.

Maybe it’s time to realize optimizing for every edge use case wasn’t a good idea. Sometimes it’s just better to throw resources at the problem, rather than throwing state at the control plane to squeeze out just one more ounce of optimization.

Maybe it’s time to stop building networks around “whatever the application developer can dream up.” To start working as a team with the application developers to build a complete system that puts complexity where it most makes sense, and divides complexity from complexity, rather than just assuming “the network can do that.”

Maybe it’s time to stop thinking we can automate our way out of this.

Maybe it’s time to lay our superhero capes down and just start building simpler systems.

You Can Always Add Another Layer of Indirection (RFC1925, Rule 6a)

Russ — Tue, 09 Mar 2021 18:00:15 +0000

Many within the network engineering community have heard of the OSI seven-layer model, and some may have heard of the Recursive InterNetwork Architecture (RINA) model. The truth is, however, that while protocol designers may talk about these things and network designers study them, very few networks today are built using any of these models. What is often used instead is what might be called the Infinitely Layered Functional Indirection (ILFI) model of network engineering. In this model, nothing is solved at a particular layer of the network if it can be moved to another layer, whether successfully or not.

For instance, Ethernet is the physical and data link layer of choice over almost all types of physical medium, including optical and copper. No new type of physical transport layer (other than wireless) can succeed unless it can be described as “Ethernet” in some regard or another, much like almost no new networking software can succeed unless it has a Command Line Interface (CLI) similar to the one a particular vendor developed some twenty years ago. It’s not that these things are necessarily better, but they are well-known.

Ethernet, however, goes far beyond providing physical layer connectivity. Because many applications rely on using Ethernet semantics directly, many networks are built with some physical form of Ethernet (or something claiming to be like Ethernet), with IP on top of this. On top of the IP, there is some other transport protocol, such as VXLAN, UDP, GRE, or perhaps even MPLS over UDP. On top of these layers rides … Ethernet. On which IP runs. On which TCP or UDP, or potentially VXLAN runs. It turns out it is easier to add another layer of indirection to solve many of the problems caused by applications that expect Ethernet than it is to solve them with IP—or any other transport protocol. You’ve heard of turtles all the way down—today we have Ethernet all the way down.
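To put a rough number on the “Ethernet all the way down” stack, here is a minimal sketch that adds up the header bytes for one possible layering: Ethernet, IPv4, UDP, VXLAN, then Ethernet, IPv4, and TCP again. The particular stack is an assumption chosen for illustration; the sizes are the standard minimum header lengths with no VLAN tags, no IP or TCP options, and no frame check sequence.

```python
# Minimum header sizes in bytes (no VLAN tags, no IP/TCP options, FCS not counted).
HEADER_BYTES = {
    "Ethernet": 14,  # destination MAC, source MAC, EtherType
    "IPv4": 20,
    "UDP": 8,
    "VXLAN": 8,
    "TCP": 20,
}

# Outermost header first; an illustrative stack, not a measurement of any real network.
STACK = ["Ethernet", "IPv4", "UDP", "VXLAN", "Ethernet", "IPv4", "TCP"]


def header_overhead(stack):
    """Return the total header bytes carried in front of the payload."""
    return sum(HEADER_BYTES[layer] for layer in stack)


def underlay_overhead(stack, tunnel="VXLAN"):
    """Return the bytes added by the layers up to and including the tunnel header."""
    return header_overhead(stack[: stack.index(tunnel) + 1])


total = header_overhead(STACK)
added = underlay_overhead(STACK)
print(f"total header bytes: {total}")                      # 104
print(f"bytes added by the extra outer layers: {added}")   # 50
```

Under these assumptions, each extra trip around the Ethernet-over-IP loop costs about 50 bytes per packet before any payload is carried.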

Many other suggestions of this type have been made in network engineering and protocol design across the years, but none of them seem to have been as widely deployed as Ethernet over IP over Ethernet. For instance, RFC3252 notes it has always been difficult to understand the contents of Ethernet, IP, and other packets as they are transmitted from host to host. The eXtensible Markup Language (XML) is, on the other hand, designed to be both machine- and human-readable. A logical solution to the problem of unreadable packets, then, is to add another layer of indirection by formatting all packets, including Ethernet and IP, into XML. Once this is done, there would be no need for expensive or complex protocol analyzers, as anyone could simply capture packets off the wire and read them directly. Adding another layer, in this case, could save many hours of troubleshooting time, and generally reduce the cost of operating a network significantly.

Once the idea of adding another layer has been fully grasped, the range of problems which can be solved becomes almost limitless. Many companies struggle to find some way to provide secure remote access to their employees, contractors, and even customers. The systems designed to solve this problem are often complex, difficult to deploy, and almost impossible to troubleshoot. RFC5514, however, provides an alternate solution: simply layer an IPv6 transport stream on top of the social media networks everyone already uses. Everyone, after all, already has at least one social media account, and can already reach that social media account using at least one device. Creating an IPv6 stream across social media would provide universal cloud-based access to anyone who desires.

Such streams could even be encrypted to ensure the operators and users of the underlying social media network cannot see any private information transmitted across the IPv6 channel created in this way.

On Using the Right Word

Russ — Mon, 08 Mar 2021 18:00:36 +0000

A while back, I was sitting in a meeting where the presenter described switching from a “traditional, hierarchical data center fabric” to a spine-and-leaf (while drawing CLOS, in all capital letters, on the whiteboard). He pointed out that the spine-and-leaf design is simpler because it only has two tiers rather than three.

There is so much wrong with this I almost winced in physical pain. Traditional hierarchical designs are not fabrics. Spine-and-leaf fabrics are not CLOS, but Clos, fabrics. Clos fabrics have three stages, not two—even if we draw them “folded” so you only see two apparent levels to the fabric. In fact, all spine-and-leaf fabrics always have an odd number of stages, and they are stages, not tiers.
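The “odd number of stages” point is easy to check by unfolding the fabric. The sketch below is only an illustration: the function names and tier labels are made up for this example, and it assumes a packet climbs to the top tier and then comes back down.

```python
def unfolded_path(tier_names):
    """List the stages a packet crosses: up to the top tier, then back down."""
    return tier_names + tier_names[-2::-1]


def clos_stages(tiers):
    """Number of stages in a folded Clos fabric drawn with `tiers` apparent levels."""
    if tiers < 2:
        raise ValueError("a folded fabric needs at least two apparent levels")
    return 2 * tiers - 1


print(unfolded_path(["leaf", "spine"]))                # ['leaf', 'spine', 'leaf']
print(clos_stages(2))                                  # 3
print(unfolded_path(["leaf", "spine", "superspine"]))  # five stages
print(clos_stages(3))                                  # 5
```

Whatever the number of apparent levels, the count of stages comes out as 2n - 1, which is always odd.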

More recently, I heard someone talking about an operating system that was built using microservices. I thought, “that would be a neat trick.” To build something with microservices does not just mean a piece of software using modules; that would be modular application (or operating system) design. Microservices architectures break the application up into the most basic components possible and then scale each kind of component out (rather than up) by spinning new copies of each service as needed. I cannot imagine scaling an operating system out by spinning multiple copies of the same service, and then providing some sort of way to spread load across the various copies. Would you have some sort of anycast IPC? An internal DNS server or load balancer?

You can have an OS that natively participates in a larger microservices-based architecture, but what would microservices within the operating system look like, precisely?

Maybe my recent studies in philosophy make me much more attuned to the way we use language in the network engineering world—or maybe I’m just getting old. Whatever it is, our determination to make every word mean everything is driving me nuts.

What is the difference between a router and a switch? There used to be a simple definition—routers rewrite the L2 header and switches don’t. But now that routers switch packets, and switches route packets, the only difference seems to be … buffer depth? Feature set? The line between router and switch is fuzzy to the point of being meaningless, leaving us with no real term to describe a real switch any longer (a device that doesn’t do routing).

What about software defined networks? We’ve been treated to software defined everything now, of course. And intent? I get the point of intent, but we’re already moving down the path of making the meaning so broad that it can even contain configuring the CLI on an old AGS+. And don’t get me started on artificial intelligence, which is often used to describe something closer to machine learning. Of course, machine learning is often used to describe things that are really nothing more than statistical inference.

Maybe it’s time for a general rebellion against the sloppy use of language in network engineering. Or maybe I’m just tilting at yet another windmill. Wake me up when we’ve gotten to the point that we can use any word interchangeably with any other word in the network engineering dictionary. I await the AI that routes packets by reading your mind (through intent) called a swouter… or something.

Rethinking BGP on the DC Fabric (part 5)

Russ — Mon, 01 Mar 2021 18:00:48 +0000

BGP is widely used as an IGP in the underlay of modern DC fabrics. This series argues this is not the best long-term solution to the problem of routing in fabrics because BGP is not ideal for this use case. This post will consider the potential harm we are doing to the larger Internet by pressing BGP into a role it was not originally designed to fulfill—an underlay protocol or an IGP.

My last post described the kinds of configuration required to make BGP work on a DC fabric—it turns out that the configuration of each BGP speaker on the fabric is close to unique. It is possible to automate configuring each speaker—but it would be better if we could get closer to autonomic operation.

To move BGP closer to autonomic operation in a DC fabric, there are several things we can do. First, we can allow a BGP speaker to peer with any other BGP speaker it receives an open message from—this is often called promiscuous mode. While each router in the fabric will still need to be configured with the right autonomous system, at least we won’t need to configure the correct peers on each router (including the remote AS).

Note, however, that using this kind of promiscuous peering does come with a set of tradeoffs (if you’re reading this blog, you know there will be tradeoffs). BGP speakers running in promiscuous mode open a large attack surface on the control plane of the network. We can close this attack surface by configuring authentication on all BGP speakers … but we are now adding complexity to reduce complexity. We could also reduce the scope of the attack surface by never permitting BGP to peer beyond a single hop, and then filtering all BGP packets at the fabric edge. Again, just a bit more complexity to manage—but remember that the road to highly fragile and complex systems is always paved with individual steps that never, on their own, seem to add “too much complexity.”
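As a rough illustration of the single-hop restriction, the sketch below accepts an open message only when the sender sits on a directly connected subnet. It is not how any particular BGP implementation does this; the interface addresses and the `accept_open` function are hypothetical, invented for this example.

```python
import ipaddress

# Hypothetical fabric-facing interfaces on one router (addresses are made up).
CONNECTED = [
    ipaddress.ip_interface("10.0.1.1/31"),
    ipaddress.ip_interface("10.0.2.1/31"),
]


def accept_open(source, connected=CONNECTED):
    """Accept a session only if the sender is on a directly connected subnet."""
    addr = ipaddress.ip_address(source)
    return any(
        addr in iface.network and addr != iface.ip  # same link, and not ourselves
        for iface in connected
    )


print(accept_open("10.0.1.0"))   # True: the other end of a connected /31
print(accept_open("192.0.2.7"))  # False: more than one hop away
```

Even this small check needs an accurate list of connected interfaces and a filter at the fabric edge to back it up, which is the “just a bit more complexity” the paragraph above warns about.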

The second thing we can do to move BGP closer to autonomic operation is to advertise routes to every connected peer without any policy configured. This does, again, introduce some tradeoffs, particularly in the realm of security, but let’s leave that aside for the moment.

Assume we can create a version of BGP that has these modifications—it always accepts any peer from any other AS, and it advertises all routes without any policy configured. Put these features behind a single knob which also includes setting the MRAI to 0 or 1, tightens up the dampening parameters, and adjusts a few other things to make BGP work better in a DC fabric.

As an experiment, let’s enable this DC fabric knob on a BGP speaker at the edge of a dual-homed “enterprise customer.” What will happen?

The enterprise network will automatically peer to any speaker that sends an open message—a huge security hole on the open Internet—and it will advertise every route it learns even though there is no policy configured. This second issue—advertising routes with no policy configured—can cause the enterprise network to become a transit between two much larger provider networks, crashing out some small corner of the Internet.
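The transit problem can be shown with a toy model. This is a sketch under stated assumptions, not real BGP: the `Route` class, the `advertise` function, and the provider names are all hypothetical, and the prefixes are documentation ranges.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    prefix: str
    learned_from: str  # "local", or the name of the peer that advertised it


def advertise(routes, to_peer, policy=None):
    """Return the prefixes sent to `to_peer`.

    With no policy, everything is sent except routes learned from that same peer.
    """
    candidates = [r for r in routes if r.learned_from != to_peer]
    if policy is not None:
        candidates = [r for r in candidates if policy(r)]
    return [r.prefix for r in candidates]


def local_only(route):
    """A simple customer-style policy: advertise only locally originated routes."""
    return route.learned_from == "local"


table = [
    Route("192.0.2.0/24", "local"),
    Route("198.51.100.0/24", "provider_a"),
    Route("203.0.113.0/24", "provider_b"),
]

leaked = advertise(table, "provider_b")
filtered = advertise(table, "provider_b", policy=local_only)
print(leaked)    # ['192.0.2.0/24', '198.51.100.0/24']
print(filtered)  # ['192.0.2.0/24']
```

Without the policy, provider_b hears provider_a’s prefix from the enterprise, which offers the enterprise network as a path between the two providers.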

This might seem like a trivial issue. After all, just don’t ever enable the DC fabric knob on an eBGP peering session upstream into the DFZ, or any other “real” internetwork. Sure, and just don’t ever hit the brakes when you mean to hit the accelerator, or the accelerator when you mean to hit the brakes. If I had a dime for every time we “just don’t ever make that mistake …” Well, I wouldn’t be blogging, I’d be relaxing in the sun someplace (okay, I’m not likely to ever stop working to sit around and “relax” all the time, but you get the picture anyway).

Maybe—just maybe—it would really be better overall to use two different protocols for IGP and EGP work. Maybe—just maybe—it’s better not to mix these two different kinds of functions in a single protocol. Not only is the single resulting protocol bound to be really complex (most BGP implementations are now over 100,000 lines of code, after all), but it will end up being really easy to make really bad mistakes.

No tool is omnicompetent. If you found a tool that was, in fact, omnicompetent, it would also be the most dangerous tool in your toolbox.

Technologies that Didn’t: Directory Services

Russ — Tue, 23 Feb 2021 18:00:39 +0000

One of the most important features of the Network Operating Systems, like Banyan Vines and Novell Netware, available in the middle of the 1980’s was their integrated directory system. These directory systems allowed for the automatic discovery of many different kinds of devices attached to a network, such as printers, servers, and computers. Printers, of course, were the important item in this list, because printers have always been the bane of the network administrator’s existence. An example of one such system, an early version of Active Directory, is shown in the illustration below.

Users, devices and resources, such as file mounts, were stored in a tree. The root of the tree was (generally) the organization. There were Organizational Units (OUs) under this root. Users and devices could belong to an OU, and be given access to devices and services in other OUs through a fairly simple drag and drop, or GUI based checkbox style interface. These systems were highly developed, making it fairly easy to find any sort of resource, including email addresses of other users in the organization, services such as shared filers, and, yes, even printers.

The original system of this kind was Banyan’s StreetTalk, which did not have the depth or expressiveness of later systems, like the one shown above from Windows NT, or Novell’s Directory Services. A similar system existed in another network operating system called LANtastic, which was never really widely deployed (although I worked on a LANtastic system in the late 1980’s).

The usual “pitch” for deploying these systems was the ease of access control they brought into the organization from the administration side, along with the ease of finding resources from the user’s perspective. Suppose you were sitting at your desk, and needed to know who over in some other department, say accounting, you could contact about some sort of problem, or idea. If you had one of these directory services up and running, the solution was simple: open the directory, look for the accounting OU within the tree, and look for a familiar name. Once you have found them, you could send them an email, find their phone number, or even—if you had permission—print a document at a printer near their desk for them to pick up. Better than a FAX machine, right?

What if you had multiple organizations who needed to work together? Or you really wanted a standard way to build these kinds of directories, rather than being required to run one of the network operating systems that could support such a system? There were two industry wide standards designed to address these kinds of problems: LDAP and X.500.

The OUs, CNs, and other elements shown in the illustration above are actually an expression of the X.500 directory system. As X.500 was standardized starting in the mid-1990’s, these network operating systems changed their native directory systems to match the X.500 schema. The ultimate goal was to make these various directory services interoperate through X.500 connectors.
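The tree of OUs and CNs, and the accounting lookup from the earlier example, can be sketched in a few lines. This is a toy model, not the schema or API of any of the products mentioned; every name, address, and function in it is invented for illustration.

```python
# A toy directory tree: organization -> OUs -> entries. All names are invented.
DIRECTORY = {
    "o": "Example Corp",
    "ous": {
        "accounting": [
            {"cn": "Pat Lee", "mail": "pat.lee@example.com"},
            {"cn": "Sam Roy", "mail": "sam.roy@example.com"},
        ],
        "engineering": [
            {"cn": "Ana Diaz", "mail": "ana.diaz@example.com"},
        ],
    },
}


def find_in_ou(directory, ou, name_fragment):
    """Return entries in one OU whose common name contains `name_fragment`."""
    fragment = name_fragment.lower()
    return [e for e in directory["ous"].get(ou, []) if fragment in e["cn"].lower()]


def distinguished_name(directory, ou, entry):
    """Build an X.500-style name, most specific element first."""
    return f"cn={entry['cn']},ou={ou},o={directory['o']}"


matches = find_in_ou(DIRECTORY, "accounting", "pat")
print(matches[0]["mail"])                                      # pat.lee@example.com
print(distinguished_name(DIRECTORY, "accounting", matches[0]))
# cn=Pat Lee,ou=accounting,o=Example Corp
```

The user-facing idea really is this simple; the complexity discussed below came from replicating, securing, and synchronizing such trees across large networks.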

Given all this background, what happened to these systems? Why are these kinds of directories not widely available today? While there are many reasons, two of these stand out.

First, these systems are complex and heavy. Their complexity made them very hard to code and maintain; I can well remember working on a large Netware Directory Service deployment where objects fell into the wrong place on a regular basis, drive mapping did not work correctly, and objects had to be deleted and recreated to force their permissions to reset.

Large, complex systems tend to be unstable in unpredictable ways. One lesson the information technology world has not learned across the years is that abstraction is not enough; the underlying systems themselves must be simplified in a way that makes the abstraction more closely resemble the underlying reality. Abstraction can cover problems up as easily as it can solve problems.

Second, these systems fit better in a world of proprietary protocols and network operating systems than into a world of open protocols. The complexity driven into the network by trying to route IP, Novell’s IPX, Banyan’s VIP, DECnet, Microsoft’s protocols, Apple’s protocols, etc., made building and managing networks ever more complex. Again, while the interfaces were pretty abstractions, the underlying network was reminiscent of a large bowl of spaghetti. There were even attempts to build IPX/VIP/IP packet translators so a host running Vines could communicate with devices on the then-nascent global Internet.

Over time, the simplicity of IP, combined with the complexity and expense of these kinds of systems, drove them from the scene. Some remnants live on in the directory structure contained in email and office software packages, but they are a shadow of StreetTalk, NDS, and the Microsoft equivalent. The more direct descendants of these systems are single sign-on and OAuth systems that allow you to use a single identity to log into multiple places.

But the primary function of finding things, rather than authenticating them, has long been left behind. Today, if you want to know someone’s email address, you look them up on your favorite social media network. Or you don’t bother with email at all.