On Writing Complexity

I’ve been on a bit of a writer’s break after finishing the CCST book, but it’s time to rekindle my “thousand words a day” habit. As always, one part of this is thinking about how I write—is there anything I need to change? Tools, perhaps, or style?

What about the grade level complexity of my writing? I’ve never really paid attention to this, but I’m working on contributing to a site regularly that does. So maybe I should.

I tend to write to the tenth or eleventh-grade level, even when writing “popular material,” like blog posts. The recommended level is around the eighth-grade level. Is this something I need to change?

It seems the average person considers anything above the eighth-grade reading level “too hard” to read, so they give up. Every reading level calculation I’ve looked at essentially uses word and sentence length as proxies for complexity. Long words and sentences intimidate people.

On the other hand, measuring the reading grade level can seem futile. There are plenty of complex concepts described by one- and two-syllable words. Short sentences can still have lots of meaning.

Further, the reading grade level does not tell you if the sentence makes sense. A famous politician recently said, “… it’s time for us to do what we have been doing, and that time is every day.” The reading grade level of this sentence is in the sixth grade—but saying nothing is still saying nothing, even if you say it at a sixth-grade level.

While reading level complexity might be important, it is more important to say something.

Sometimes, using long words and sentences stops people from paying attention to your words. However, replacing long words and sentences with shorter ones sometimes removes your words’ real meaning (or at least flavor). I am not, at this point, certain how to balance these. I suspect I will have to consider the tradeoff in every situation.

When you write—and if you are doing your job as a network engineer well, you do write—you might want to consider the complexity of your writing. I will use the grade level as “another tool” in my set, which means I’ll be thinking about writing complexity more—but I’m not going to allow it to drive my writing style. If I can reduce the complexity of my writing without losing meaning, I may … sometimes … or I might not. 😊

Looking at the other side of the coin—what about reading grade level from a reader’s point of view? Should we only read easy-to-read things? The answer should be obvious: no.

There is a bit of a feeling that text above a certain reading level is “sheer nonsense.” Again, though, the grade level has nothing to do with the value of the content. Sometimes, saying complex things just requires complex text. Readers (all of us) need to learn to read complex text.

Reading grade level is a good tool in many situations—but it is one tool among many.

Making Networking Cool Again? (2)

Network engineering is not “going away.” Network engineering is not less important than it was yesterday, last year, or even a decade ago.

But there still seems to be a gap somewhere. There are fewer folks interested than we need. We need more folks who want to work as full-time network engineers, and more folks with network engineering skills diffused within the larger IT community. More of both is the right answer if we’re going to continue building large-scale systems that work. The real lack of enthusiasm for learning network engineering is hurting all of IT, not just network engineering.

How do we bridge this gap? We’re engineers. We solve problems. This seems to be a problem we might be able to solve (unlike human nature). Let’s try to solve it.

As you might have guessed, I have some ideas. These are not the only ideas in the world—feel free to think up more!

If you walk into a robotics class, even an introductory robotics class, you see people … building robots. If you walk into a coding class, even an introductory one, you see people … writing software. If you walk into a network network engineering class you see … someone lecturing about the OSI model, packet formats, or how to configure BGP.

What problems are people learning to solve robotic engineering? How to build a robot and get it to do something to solve a real-world problem. What problems are people learning to code solving? How to tackle some real-world problem.

Sure, the problems being solved at an introductory level might be trivial, like: “Read this file and spit out a sum of the numbers in the fourth column.” But they are still starting, right from the beginning, by taking requirements and converting them into solutions.

What problems are network engineers learning how to solve? How to choose hardware, string it all together, and configure BGP.

Do you see the difference?

All engineers solve problems—it’s the nature of engineering. But are we creating a mindset in prospective network engineers, or even adjacent fields, that we solve real-world problems? Or are we giving them the impression that we solve whiteboard problems by talking about bits, bytes, configurations, and cable types?

Have you ever seen the glazing over of eyes while explaining how you put four transport protocols on top of one another (look at all the pretty tunnels)? How about when you create a chart showing how TCP and QUIC can be “kind-of sort-of” forced into the OSI model? Or when you spin out your BGP packet format charts, showing how we’ve (mis)used address families to carry everything anyone can imagine?

I’ve been teaching this stuff for years (okay, decades). Over time, I’ve moved away from teaching configurations and packet formats. I’ve gone from Advanced IP Network Design to Computer Networking Problems and Solutions. These are very different ways of looking at network engineering.

Focusing on real-world problems would help connect business and other IT folks to the network, connect theory to practice, and people to network engineering. Going home at the end of the day saying, “I solved a problem,” can be satisfying. Going home at the end of the day saying, “I configured BGP?”

Another thing adopting the mindset of solving real-world problems might do is help us lose unnecessary complexity. I know complexity is necessary to build resilient systems; we cannot build what we build without creating and encountering complexity.

But we often run ourselves into the ditch on both sides of the road.

We unintentionally build too complex because we try to make it too simple. Quick, which is simpler: building a data center fabric with one routing protocol or two? A single chassis system or several smaller fixed format devices? A proprietary system or something built on open standards?

How many balloons fit in a bag? (thanks, Don)

Failing to start with the tradeoffs, and thinking through what problem we’re actually trying to solve, leads to unnecessary complexity. Such designs might not immediately fail, but they will fail, and “it’s so complex” just isn’t an excuse.

Don’t even try to tell me there aren’t any tradeoffs. If you think there aren’t any tradeoffs, that just means you haven’t looked hard enough. Go find them, think about them, and document them.

We also build complex things because we think it offers job security, or it’s neat, or we like to feel like the kid who says to the world, “look what I built!”

I know it’s exciting to hear stories about that time someone rescued a network from a major failure—after all, that’s solving a real-world problem. Building a network that just works might be “boring,” but it solves many more real-world problems than raising a network from the dead.

We love our fashionable capes, but … capes can get caught in a nearby jet engine. Lose the cape. In the long run, it’ll make network engineering more attractive as a career field and field of knowledge.

The Bottom Line

No, the sky is not falling. We still need networks, and we still need network engineers.

Yes, there is a problem. Too many companies are going “to the cloud” because they cannot find people qualified to build and maintain their very complex networks. There’s too much centralization and too little oppeness.

So maybe let’s stop saying “we don’t need network engineers.” And maybe let’s really think about how we’re building things. And maybe let’s focus on solving real-world problems, starting from day one in network engineering classrooms.

Network engineering is still cool—let’s go out there believing—and selling—that idea to the world.

Making Networking Cool Again? (1)

Is network engineering still cool?

It certainly doesn’t seem like it, does it? College admissions seem to be down in the network engineering programs I know of, and networking certifications seem to be down, too. Maybe we’ve just passed the top of the curve, and computer networking skills are just going the way of coopering. Let’s see if we can sort out the nature of this malaise and possible solutions. Fair warning—this is going to take more than one post.

Let’s start here: It could be that computer networking is a solved problem, and we just don’t need network engineers any longer.

I’ve certainly heard people say these kinds of things—for instance, one rather well-known network engineer said, just a few years back, that network engineers would no longer be needed in five years. According to this view, the entire network should be like a car. You get in, turn the key, and it “just works.” There shouldn’t be any excitement or concern about a commodity like transporting packets. Another illustration I’ve heard used is “network bandwidth should just be like computer memory—if you need more, add it.”

Does this really hold, though? Even if we accept the car and computer memory illustrations and individual routers like these things, is an entire network system like a car? A closer analogy for a network in the world of cars would be an entire transportation system.

You have different kinds of physical transport (rail, over-the-road trucks, air travel, ships, etc.), each with its characteristics, and all of which must be connected to move physical objects from one place to another. There must be some kind of “control plane” that coordinates, shared addressing, formatting rules, etc.

While a single car might, in some sense, be a commodity at this point (and I’ll bet there aren’t many car owners who would wholly agree with that characterization), I don’t see how we could call an entire transportation system a commodity—especially if we want to say “the skills needed to build a transportation system just aren’t needed any longer, there’s nothing more to learn, this is so … boring …”

Let’s dispense with this idea that networks just aren’t needed any longer. We must still build networks that carry traffic between servers, cities, countries, and continents. Building these networks is still a hard problem. Even if there is less room to improve these things than ten or twenty years ago, the problems are still hard. Even if many problems are solved at a broad level, not every problem is solved in every network in the universe.

A more reasonable take on this perspective is that networking skills are diffusing into a larger information technology (IT) skill set. Perhaps IT, in its relative “youth,” divided too sharply and finely—we created too many career fields. What is happening right now, then, is just a kind of right sizing in the market.

Network engineering skills, in fact, do seem to be dispersing to one degree or another. But let’s put this in perspective.

The first point is I’m not convinced there are fewer network engineers. Instead, it’s more likely there are just as many network engineers as there ever have been, if not more. Perhaps, though, “real” network engineering has been growing linearly while all the other IT fields have been growing at a rate faster than linear (I don’t want to say exponential, just something more than linear).

In a world that counts lack of growth as a failure, networking growing at a slower pace than, say, programming seems like a failure from the outside. People like to follow winners; growing is winning; network engineering is not growing as fast as other things, so network engineering is failing.

I dislike the modern progressive mindset—but while I’m working on something in this area, this isn’t the time or place to dive into this topic. Let’s agree that we must let go of the idea: “Growing slower is a failure.”

Returning to the idea of transportation—I will just about bet automobile designers built entire departments in the early days of car manufacturing. Today, there might be just as many automobile designers as ever. They’re just buried in large car manufacturing, servicing, etc., companies, so it feels like there are a lot fewer than there were.

Just because most new engineers must learn many different things, and network engineering skills are diffusing into many different areas of IT, does not mean network engineering is dying, regardless of what it might look like from the outside.

Second, there is nothing wrong with network engineering skills diffusing into the larger IT skill set. Has anyone reading this ever really been a “pure” network engineer? If so, I don’t know whether to envy or feel sorry for you.

When building networks in the military, I had to deal with all the politics of customer relationships and understanding mission needs. When taking cases in technical support, I had to deal with time management and customer-facing skills—and I needed to learn or use coding skills to be an effective network engineer. Today, I do network engineering, like I always have, but I work on security, privacy, DNS, coding, and all sorts of other things.

I cannot think of a time in my career when I would have considered myself a “pure network engineer.” I’ve always had to find and build adjacent skills to design, build, and maintain networks. I would say this is truer today than ever, but I do not believe my skills as a network engineer are any less useful than they have ever been.

Where does all of this leave us?

Let’s continue the discussion in part 2 next week.

Thoughts on 2023

As we close out 2023, some random observations about engineering, culture, and life.

Network engineering needs help. I am hearing, from all over the place, that network engineering is “not cool.” There is a dearth of students entering the pipeline. College programs are struggling, and many organizations are struggling with a lack of engineering talent—in fact, I would guess the most common reason for companies to move to “the cloud” is because they cannot find anyone who knows how to build an operate a network any longer.

It probably didn’t help that for the last few years many “thought leaders” in the network engineering space have been saying there is no future in network engineering. It also doesn’t help that network engineering training has become stilted and … boring. Coders are off talking about how to solve problems. Robotics folks are working on cool projects that solve problems.

Network engineers are being taught how to spend less money and told to “find another career.”

I don’t know how we think we can sustain a healthy world of IT without network engineers.

And yes, I know there are folks who think networking problems are simple, easy enough to solve with some basic software knowhow. I think I have enough knowledge and experience of the wider world of information technology to say those folks are wrong.

I’d actually like to help solve this specific problem. I’ve been looking for a Christian college someplace in the US interested in starting or growing a strong engineering program. Someplace where I join with a team to help build and teach an entire program from the first class to the last. If anyone knows of such a place, get in touch. We need to make network engineering cool again.

How much did you read this year? I read just over 40 books this year, not many of which were technology related. If you don’t read regularly, why not?

How much did you create this year? I wrote one book—the CCST Official Study Guide. I’ve written two dozen articles or so and created a few new slide decks. I’m working on several new live webinars with Pearson through Safari Books Online, including interview skills, open-source labs, some work around coding skills, and a few other things.

It you aren’t creating new things, why not?

Big is, for the most part, bad. I’ve started thinking that one of the worst things about technology-driven culture is how deeply it has enabled and taught—even encouraged—us to be passive-aggressive.

For instance, I’ve been “lifetime banned” from eBay. Why? I’ve no idea—I barely even use eBay. I logged in, listed a few items for sale, and then couldn’t log back in again. I tried to reset my password—the service accepted my new password, but still refused to allow me to log in. No notifications, no email, no … anything. I called customer support and was told I have been “banned for life.” They will not discuss why, only that some “system flagged my account.”

It is just this kind of “the computer says you are a bad person, and we will not explain why” thing that makes people dislike technology companies so deeply.

As always, feel free to get in touch if you have thoughts, want to chat, or have an idea for an episode of the Hedge.

Simple or Complex?

A few weeks ago, Daniel posted a piece about using different underlay and overlay protocols in a data center fabric. He says:

There is nothing wrong with running BGP in the overlay but I oppose to the argument of it being simpler.

One of the major problems we often face in network engineering—and engineering more broadly—is confusing that which is simple with that which has lower complexity. Simpler things are not always less complex. Let me give you a few examples, all of which are going to be controversial.

When OSPF was first created, it was designed to be a simpler and more efficient form of IS-IS. Instead of using TLVs to encode data, OSPF used fixed-length fields. To process the contents of a TLV, you need to build a case/switch construction where each possible type a separate bit of code. You must count off the correct length for the type of data, or (worse) read a length field and count out where you are in the stream.

Fixed-length fields are just much easier to process. You build a structure matching the layout of the fixed-length fields in memory, then point this structure at the packet contents in-memory. From there, you can just use the structure’s contents to directly access the data.

Over time, however, as new requirements have been pushed into IGPs, OSPF has become much more complex while IS-IS has remained relatively constant (in terms of complexity). IS-IS went through a bit of a mess when transitioning from narrow to wide metrics, but otherwise the IS-IS we use today is the same protocol we used when I first started working on networks (back in the early 1990s).

OSPF’s simplicity, in the end, did not translate into a less complex protocol.

Another example is the way we transport data in BGP. A lot of people do not know that BGP’s original design allowed for carrying information other than straight reachability in the protocol. BGP speakers can negotiate multiple sessions, with each session carrying a different kind of information. Rather than using this mechanism, however, BGP has consistently been extended using address families—because it is simpler to create a new address family than it is to define a new kind of data parallel with address families.

This has resulted in AFs that are all over the place, magic numbers, and all sorts of complexity. The AF solution is simpler, but ultimately more complex.

Returning to Daniel’s example, running a single protocol for underlay and overlay is simpler, while running two different protocols is less simple. However, I’ve observed—many times—that running different protocols for underlay and overlay is less complex.

Why? Daniel mentions a couple of reasons, such as each protocol has a separate purpose, and we’re pushing features into BGP to make it serve the role of an IGP (which is, in the end, going to cause some major outages—if it hasn’t already).

Consider this: is it easier to troubleshoot infrastructure reachability separately from vrf reachability? The answer is obvious—yes! What about security? Is it easier to secure a fabric when the underlay never touches any attached workload? Again—yes!

We get this tradeoff wrong all the time. A lot of times this is because we are afraid of what we do not know. Ten years ago I struggled to convince large operators to run BGP in their networks. Today no-one runs anyone other than BGP—and they all say “but we don’t have anyone who knows OSPF or IS-IS.” I’ve no idea what happened to old-fashioned network engineering. Do people really only have one “protocol slot” in their brains? Can people really only ever learn one protocol?

Or maybe we’ve become so fixated on learning features that we no longer no protocols?

I don’t know the answer to these questions, but I will say this—over the years I’ve learned that simpler is not always less complex.

Route Servers and Loops

From the question pile: Route servers (as opposed to route reflectors) don’t change anything about a BGP route when re-advertising it to a peer, whether iBGP or eBGP. Why don’t route servers cause routing loops (or other problems) in a BGP network?

Route servers are often used by Internet Exchange Points (IXPs) to distribute routes between connected BGP speakers. BGP route servers

  • Don’t change anything about a received BGP route when advertising the route to its peers (other BGP speakers)
  • Don’t install routes received through BGP into the local routing table

Shouldn’t using route servers in a network—pontentially, at least—cause routing loops or other BGP routing issues? Maybe a practical example will help.

Assume b, e, and s are all route servers in their respective networks. Starting at the far left, a receives some route, 101::/64, and sends it on to b,, which then sends the unmodified route to c. When c receives traffic destined to 101::/64, what will happen? Regardless of whether these routers are running iBGP or eBGP, b will not change the next hop, so when c receives the route, a is still the next hop. If there’s no underlying routing protocol, c won’t know how to reach A, so it will ignore the route and drop the traffic. Even if there is an underlying routing protocol, c’s route to 101::/64’s route passes through b, and b isn’t installing any routing information learned from BGP into its local routing table (because it’s a route server). b is going to drop traffic destined to 101::/64.

We can solve this simple problem by adding a new link between the two clients of the route server, as shown in the center diagram. Here, d sends 101::/64 to e, which then sends the unchanged route to g. Since g has a direct connection to d, we can assume g will send traffic destined to 101::/64 directly to d, where it will be forwarded to the destination. Why wouldn’t d and g peer directly instead of counting on e to carry routes between them? In most cases this kind of indirect peering is done to increase network scale. If there are thousand routes like d and g, it will be simpler for them all to peer to e than to build a full mesh of connections.

Why not use a route reflector rather than a route server in this situation? Route reflectors can only be used to carry routes between iBGP peers. If d, e, and g are all in different autonomous systems, route reflectors cannot be used to solve this problem.

But this brings us back to the original question—route reflectors use the cluster list to prevent loops within an AS (the cluster list is similar in form and function to the AS path carried between autonomous systems, but it uses router ID’s rather than AS numbers to describe the path)?

If you have multiple route servers connected to one another you can, in fact, form routing loops.

In this network, a is sending 101::/64 to b, which is then sending the route, unmodified, to e. Because of some local policy, e is choosing the path through a, which means e forwards traffic destined to 101::/64 to c. At the same time, e is advertising 101::/64 to b, which is then sending the route (unmodified) to a, and a is choosing the path through c. In this case, a permanent (persistent) routing loop is formed through the control plane, primarily because no single BGP speaker has a complete view of the topology. The two route servers, by hiding the real path to 101::/64, makes is possible to form a routing loop.

The deploy route servers without forming these kinds of loops—

  • BGP speakers learning routes from route servers should be directly connected—there should not be destinations reachable via some “hidden” intermediate hop
  • Route servers should send all the routes they learn from clients; they should not use bestpath to choose which routes to send to clients

These restrictions prevent routing loops from forming when deploying route servers—but they also restrict the use of route servers to situations like carrying routes between BGP speakers connected to a single fabric.

Cisco filed a patent some time back describing a method to prevent routing loops when using BGP route servers; it makes interesting reading for folks who want to dive a little deeper.

RFC9199: Lessons in Large-scale Service Deployment

While RFC9199 (are we really in the 9000’s?) is targeted at large-scale DNS deployments–specifically root zone operators–so it might seem the average operator won’t find a lot of value here.

This is, however, far from the truth. Every lesson we’ve learned in deploying large-scale DNS root servers applies to any other large-scale user-facing service. Internally deployed DNS recursive servers are an obvious instance, but the lessons here might well apply to a scheduling, banking, or any other multi-user application accessed from a lot of places by a lot of different users. There are some unique points in DNS, such as the relatively slower pace of database synchronization across nodes, but the network-side lessons can still be useful for a lot of applications.

What are those lessons?

First, using anycast dramatically improves performance for these kinds of services. For those who aren’t familiar with the concept, anycase turns an IP address into a service identifier. Any host with a copy (or instance) or a given service advertises the same address, causing the routing table to choose the (topologically) closest instance of the service. If you’re using anycast, traffic destined to your service will automatically be forwarded to the closest server running the application, providing a kind of load sharing among multiple instances through routing. If there are instances in New York, California, France, and Taipei, traffic from users in North Carolina will be routed to New York and traffic from users in Singapore will be routed to Taipei.

You can think of an anycast address something like a cell tower; users within a certain desintance will be “captured” by a particular instance. The more copies of the service you deploy, the smaller the geographic region the service will support. Hence you can control the number of users using a particular copy of the service by controlling the number and location of service copies.

To understand where and how to deploy service instances, create anycast catchment maps. Again, just like a wifi signal coverage map, or a cellphone tower coverage map, it’s important to understand which users will be directed to which instances. Using a catchment map will help you decide where new instances need to be deployed, which instances need the fastest links and hardware, etc. The RIPE ATLAS pobes and looking glass servers are good ways to start building such a map. If the application supports a large number of users, you might be able to convince the application developer to include some sort of geographic information in requests to help build these maps.

Third, when deploying service instance, pay as much attention to routing and connectivity as you do the number of instances deployed. As the authors note, sometimes eight instances will provide the same level of service as several thousand instances. The connectivity available into each instance of the service–bandwidth, delay, availability, etc.–still has a huge impact on service speed.

Fourth, reduce the speed at which the database needs to be synchronized where possible. Not every piece of information needs to be synchronized at the same rate. The less data being synchronized, the more consistent the view from multiple users is going to be.

RFC9199 is well worth reading, even for the average network engineer.