Is it planning… or just plain engineering?

Over at the ECI blog, Jonathan Homa has a nice article about the importance of network planning–

In the classic movie, The Graduate (1967), the protagonist is advised on career choices, “In one word – plastics.” If you were asked by a young person today, graduating with an engineering or similar degree about a career choice in telecommunications, would you think of responding, “network planning”? Well, probably not.

Jonathan describes why this is so–traffic is constantly increasing, and the choice of tools we have to support the traffic loads of today and tomorrow can be classified in two ways: slim and none (as I remember a weather forecaster saying when I “wore a younger man’s shoes”). The problem, however, is not just tools. The network is increasingly seen as a commodity, “pure bandwidth that should be replaceable like memory,” made up of entirely interchangeable parts and pieces, primarily driven by the cost to move a bit across a given distance.

This situation is driving several different reactions in the network engineering world, none of which are really healthy. There is a sense of resignation among people who work on networks. If commodities are driven by price, then the entire life of a network operator or engineer is driven by speed, and speed alone. All that matters is how you can build ever larger networks with ever fewer people–so long as you get the bandwidth you need, nothing else matters.

This is compounded by a simple reality–the networking world has driven itself into the corner of focusing on the appliance–the entire network is appliances running customized software, with little thought about the entire system. Regardless of whether this is because of the way we educate engineers through our college programs and our certifications, this is the reality on the ground level of network engineering. When your skill set is primarily built around configuring and managing appliances, and the world is increasingly making those appliances into commodities, you find yourself in a rather depressing place.

Further, there is a belief that there is no more real innovation to be had–the end of the road is nigh, and things are going to look pretty much like they look right now for the rest of … well, forever.

I want you, as a network engineer, operator, or whatever you call yourself, to look these beliefs in the eye and call them what they are: nonsense on stilts.

The real situation is this: the current “networking industry,” such as it is, has backed itself into a corner. The emphasis on planning Jonathan brings out is valid, but it is just the tip of the proverbial iceberg. There is a hint in this direction in Jonathan’s article in the list of suggestions (or requirements). Thinking across layers, thinking about failure, continuous optimization… these are all… system level thinking. To put this another way, a railway boxcar might be a commodity, but the railroad system is not. The individual over-the-road truck might be a commodity, and the individual road might not be all that remarkable, but the road system is definitely not a commodity.

The sooner we start thinking outside the appliance as network engineers or operators (or whatever you call yourself), the sooner we will start adding value to the business. This means thinking about algorithms, protocols, and systems–all that “theory stuff” we typically decry as being less than useful–rather than how to configure x on device y. This means thinking about security across the network, rather than how you configure a firewall. This means thinking about the tradeoffs with implementing security, including what systemic risk looks like, and when the risks are acceptable when trying to accomplish a specific goal, rather than thinking about how to route traffic through a firewall.

If demand is growing, why is the networking world such a depressing place right now? Why do I see lots of people saying things like “there will be no network engineers in enterprises in five years?” Rather than blaming the world, maybe we should start looking at how we are trying to solve the problems in front of us.

Autonomic, Automated, and Reality

Once the shipping department drops the box off with that new switch, router, or “firewall,” what happens next? You rack it, cable it up, turn it on, and start configuring, right? There are access controls to configure (SSH, keys, disabling standard accounts, disabling telnet), interface addresses to configure, routing adjacencies to configure, local policies to configure, and… After configuring all of this, you can adjust routing in the network to route around the new device, and then either canary the device “in production” (if you run your network the way it should be run), or find some prearranged maintenance time to bring the new device online and test things out. After all of this, you can leave the new device up and running in the network, and move on to the next task.

Until it breaks.

Then you consult the documentation to remind yourself why it was configured this way, consult the documentation to understand how the application everyone is complaining about not working should work, etc. There are the many hours spent sitting on the console gathering information by running various commands and reading the output of various logs. Eventually, once you find the problem, you can either replace the right parts, or reconfigure the right bits, and get everything running again.

In the “modern” world (such as it is), we think it’s a huge leap forward to stop configuring devices manually. If we can just automate the configuration of all that “stuff” we have to do at the beginning, after the box is opened and before the device is placed into service, we think we have this whole networking thing pretty well figured out.

Even if you had everything in your network automated, you still haven’t figured this networking thing out.

We need to move beyond automation. Where do we need to move to? It’s not one place, but two. The first is we need to move beyond automation to autonomous operation. As an example, there is a shiny new system that is currently being widely deployed to automate the deployment and management of containers. Part of this system is the automation of connectivity, including routing, between containers. The routing system being deployed as part of this system is essentially statically configured policy-based routing combined with network address translation.

Let me point something out that is not going to be very popular: this is a step backwards in terms of making the system autonomous. Automating static routing information is not a better solution than building a real, dynamic, proactive, autonomic, routing system. It’s not simpler—trust me, I say this as someone who has operated large networks which used automated static routes to do everything.
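To make the difference concrete, here is a minimal sketch, not a model of any real routing protocol or container system. It assumes a hypothetical four-node topology (A, B, C, D) and uses a plain breadth-first search as a stand-in for a dynamic routing computation. The "static" table is a snapshot computed once, the way an automation system might push it; the "dynamic" table is recomputed after a link fails.

```python
from collections import deque

def next_hops(links, source):
    """Breadth-first search over an undirected topology.

    Returns {destination: first hop on a shortest path from source}.
    Hop count is the only metric in this sketch.
    """
    neighbors = {}
    for a, b in links:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    first_hop = {}
    visited = {source}
    queue = deque()
    # Seed the search with the directly connected neighbors.
    for n in sorted(neighbors.get(source, ())):
        visited.add(n)
        first_hop[n] = n
        queue.append(n)
    # Every node discovered later inherits the first hop of its parent.
    while queue:
        node = queue.popleft()
        for n in sorted(neighbors.get(node, ())):
            if n not in visited:
                visited.add(n)
                first_hop[n] = first_hop[node]
                queue.append(n)
    return first_hop

# Hypothetical square topology: A-B, B-D, A-C, C-D.
links = {("A", "B"), ("B", "D"), ("A", "C"), ("C", "D")}

# "Automated static routing": computed once and pushed to the device.
static_table = next_hops(links, "A")

# The B-D link fails. The static table is unchanged until someone
# (or some automation run) notices and pushes a new one.
surviving = links - {("B", "D")}

# A dynamic system recomputes from the topology it currently sees.
dynamic_table = next_hops(surviving, "A")

print("static  route to D via", static_table["D"])
print("dynamic route to D via", dynamic_table["D"])
```

In this sketch the static table keeps sending traffic for D toward B after B has lost its link to D, while the recomputed table moves to C. Automation can shorten the gap between the failure and the fix, but something outside the network still has to detect the failure and push the change.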

The “opsification of everything” is neat, but it shouldn’t be our end goal.

Now part of this, I know, is the fault of vendors. Vendors who push EGPs onto data center fabrics because, after all, “the configuration complexity doesn’t matter so long as you can automate it.” The configuration complexity does matter, because configuration complexity reflects an underlying protocol complexity, and sets up long and difficult troubleshooting sessions that are completely unnecessary.

The second place we need to move in the networking world? The focus on automation is just another form of focusing on configuration. We abstract the configuration, and we touch a lot more devices at once, but we are still thinking about configuration. The more we think about configuration, the less we think about how the system should work, how it really works, what the gaps are, and how to bridge those gaps. So long as we are focused on the configuration, automated or not, we are not focused on how the network can bring value to the business. The longer we are focused on configuration, the less value we are bringing to the business, and the more likely we are to end up being replaced by … an automated system … no matter how poorly that automated system actually works.

And no, the cloud isn’t going to solve this. Containers aren’t going to solve this. The “automated configuration pattern” is already being repeated in the cloud. As more complex workloads are moved into the cloud, the problems there are only going to get harder. What starts out as a “simple” system using policy-based routing analogs and network address translation configured through an automation server will eventually look complex against the hardest problems we had to solve using T1’s, frame relay circuits, inverse multiplexers, wire down patch panels, and mechanical switch crossbar frames. It’s fun to pretend we don’t need dynamic routing to solve the problems that face the network—at least until you hit hard problems, and have to relearn the lessons of the last 20+ years.

Yes, I know vendors are partly to blame for this. I know that, for a vendor, it’s easier to get people to buy into your CLI, or your entire ecosystem, rather than getting them to think about how to solve the problems your business is handing them.

On the other hand, none of this is going to change from the top down. This is only going to change when the average network engineer starts asking vendors for truly simpler solutions that don’t require reams of configuration information. It will change when network engineers get their heads out of the configuration and features, and into the business problems.

Used to Mean… Now Means…

sarcasm warning—take the following post with a large grain of salt

A thousand years from now, when someone is writing the history of computer networks, one thing they will notice—at least I think they will—is how we tend to reduce our language so as many terms as possible have precisely the same meaning. They might attribute this to marketing, or the hype cycle, or… but whatever the cause this is clearly a trend in the networking world. Some examples might be helpful, so … forthwith, the reduced terminology of the networking world.

Software Defined Networking (SDN): Used to mean a standardized set of interfaces that enabled open access to the forwarding hardware. Came to mean some form of control plane centralization. Now means automated configuration and management of network devices, centralized control planes, traffic engineering, and just about anything else that seems remotely related to these.

Fabric: Used to mean a regular, non-planar, repeating network topology with scale-out characteristics. Now means any vaguely hierarchical topology (not a ring) with a lot of links.

DevOps: Used to mean applying software development processes to the configuration, operation, and troubleshooting of server and network devices. Now means the same thing as SDN.

Clos: Used to mean a three stage fabric in which every device in a prior stage is connected to every device in the next stage, all devices have the same number of ports, all traffic is east/west, and the fabric has scale-out characteristics. Now means the same thing as fabric, and is spelled CLOS because… aren’t all four letter words abbreviations? Now external links are commonly attached to the “core” of the Clos, because… well, it kind of looks hierarchical, after all.

Hierarchical Design: Used to mean a network design with a modular layered design, and specific functions tied to each layer of the network. Generally there were two or three layers, with clear failure domain separation through aggregation and summarization of control plane information. Now means the same thing as fabric.

Cloud: Used to mean the centralization and abstraction of resources to support agile development strategies. Now means… well… the meaning is cloudy at this time, but generally applied to just about anything. Will probably end up meaning the same thing as DevOps, SDN, and fabric.

Network Topology: Used to mean a description of the interconnection system used in building a network. Some kinds of topologies were hub-and-spoke, ring, partial mesh, Clos, Benes, butterfly, full mesh, etc. Now means the same as fabric.

Routing Protocol: Used to mean the protocol, including the semantics and algorithm or heuristic, used to calculate the set of loop-free paths through a network. Includes instances such as IS-IS, EIGRP, and OSPF. Now means BGP, as this is the only protocol used in any production network (except SDN).

Router: Used to mean a device that determines the next hop to which the packet should be forwarded using the layer 3 address, replacing the layer 2 header in the process of forwarding the packet. Now means the same thing as a switch.

Switch: Used to mean a device that determined the port through which a packet should be forwarded based on the layer 2 header, did not modify the packet, etc. Now means any device that forwards packets; has generally replaced “router.”

Security: Used to mean thinking through attack surfaces, understanding protocols and their operation, and how to build a system that is difficult to attack. Now means inserting a firewall into the network.
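The older definitions above are precise enough to check mechanically, which is part of what is lost when every term collapses into "fabric." As a rough illustration, the sketch below tests only the connectivity part of the Clos definition (every device in one stage connects to every device in the next). The device names and stage sizes are hypothetical, and the sketch does not check port counts or traffic direction.

```python
from itertools import product

def build_three_stage(ingress, middle, egress):
    """Link set for a three-stage fabric with full connectivity
    between adjacent stages. Links are (earlier stage, later stage) pairs."""
    return set(product(ingress, middle)) | set(product(middle, egress))

def is_full_mesh_between_stages(links, ingress, middle, egress):
    """True if every device in a stage connects to every device in the next."""
    first = all((i, m) in links for i in ingress for m in middle)
    second = all((m, e) in links for m in middle for e in egress)
    return first and second

# Hypothetical device names, four per stage.
ingress = [f"in{n}" for n in range(4)]
middle = [f"mid{n}" for n in range(4)]
egress = [f"out{n}" for n in range(4)]

clos_links = build_three_stage(ingress, middle, egress)
print(is_full_mesh_between_stages(clos_links, ingress, middle, egress))

# Remove one link: still "a topology with a lot of links," but it no
# longer matches the connectivity rule in the older definition.
partial_links = clos_links - {("in0", "mid0")}
print(is_full_mesh_between_stages(partial_links, ingress, middle, egress))
```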

We used to have a rich set of terms we could use to describe different kinds of topologies, devices, and ways of building networks. We seem to want to insist on merging as many terms as possible so they all mean the same thing; we are quickly reducing ourselves to fabric, switch, SDN, and cloud to describe everything.

Which makes me wonder sometimes—what are they teaching in network engineering classes now-a-days?

Reaction: Overly Attached

In a recent edition of ACM Queue, Kate Matsudaira has an article discussing the problem of being overly attached to a project or solution.

The longer you work on one system or application, the deeper the attachment. For years you have been investing in it—adding new features, updating functionality, fixing bugs and corner cases, polishing, and refactoring. If the product serves a need, you likely reap satisfaction for a job well done (and maybe you even received some raises or promotions as a result of your great work).

Attachment is a two-edged sword—without some form of attachment, it seems there is no way to have pride in your work. On the other hand, attachment leads to poorly designed solutions. For instance, we all know the hyper-certified person who knows every in and out of a particular vendor’s solution, and hence solves every problem in terms of that vendor’s products. Or the person who knows a particular network automation system and, as a result, solves every problem through automation.

The most pernicious forms of attachment in the network engineering world are to a single technology or vendor. One of the cycles I have seen play out many times across the last 30 years is: a new idea is invented; this new idea is applied to every possible problem anyone has ever faced in designing or operating a network; the resulting solution becomes overburdened and complicated; people complain about the complexity of the solution and rush to… the next new idea. I could name tens or hundreds of technologies that have been through this cycle over time.

Another related cycle: a team adopts a new technology in order to solve a problem.

Kate points out some very helpful ways to solve over-attachment at an organizational level. For instance, aligning on goals and purpose, and asking everyone to be open to ideas and alternatives. But these organizational level solutions are more difficult to apply at an individual level. How can this be applied to the individual—to your life?

Perhaps the most important piece of advice Kate gives here is ask for stories, not solutions. In telling stories you are not eliminating attachment but refocusing it. Rather than becoming attached to a solution or technology, you are becoming attached to a goal or a narrative. This accepts that you will always be attached to something—in fact, that it is ultimately healthy to be attached to something outside yourself in a fundamental way. The life that is attached to nothing is ugly and self-centered, ultimately failing to accomplish anything.

Even here, however, there are tradeoffs. You can attach yourself to the story of a company, dedicating yourself to that single brand. To expand this a little, then, you should focus on stories about solving problems for people rather than stories about a product or technology. This might mean telling people they are wrong, by the way—sometimes the best thing is not what someone thinks they want.

Stories are ultimately about people. This is something not many people in engineering fields like to hear, because most of us are in these kinds of fields because we are either introverted, or because we struggle to relate to people in some other way.

To expand this a bit more, you should be willing to tell multiple stories, rather than just one. These stories might overlap or intersect, of course—I have been invested in a story about reducing complexity, disaggregation, and understanding why rather than how for the last ten or fifteen years. These three stories are, in many ways, the same story, just told from different perspectives. You need to allow the story to be shaped, and the path to tell that story to change, over time.

Realize your work is neither as bad as you think it is, nor as good as you think it is. Do not take criticism personally. This is a lesson I had to learn the hard way, from receiving many manuscripts back covered in red marks, either physical or virtual. Failure is not an option; it is a requirement. The more you fail, the more you will actively seek out the tradeoffs, and approach problems and people with humility.

Finally, you need to internalize modularity. Do not try to solve all the problems with a single solution, no matter how neat or new. Part of this is going to go back to understanding why things work the way they do and the limits of people (including yourself!). Solve problems incrementally and set limits on what you will try to do with any single technology.

Ultimately, refusing to become overly attached is a matter of attitude. It is something that is learned through hard work, a skill developed across time.

What’s wrong with the IETF. And what’s right

I have not counted the IETF meetings I have attended; I only know the first RFC on which I’m listed as a co-author was published in 2000, so this must be close to 20 years of interacting with the IETF community. I’m pretty certain I’ve attended at least two meetings a year in some years, and three meetings a year in most of those years. Across that time, there has never been a time when I have not been told, at least once, “the IETF is broken.” And I cannot remember a single time I did not agree with the sentiment.

My belief that the IETF is broken, however, is narrow, and offset by the many ways in which I think the IETF is still useful for the larger networking community.

So, how is the IETF broken? The trend that bothers me the most right now is the gold rush syndrome. A new technology is brought into the IETF, and if it looks like it might somehow be “important,” there is a “land rush” as people stake out new drafts, find use cases, find corner cases, and work to develop drafts and communities around those drafts. This generally results in a sort of ossification process, where there are clear insiders and outsiders, an entirely new vocabulary is developed, and the drafts fly so fast and furious there is almost no time to read them all. There are many problematic parts of this process. For instance, there is often a feeling that “this is important, no need to get the details right,” or “if you don’t understand, butt out of the conversation.”

A particularly troubling aspect of this is the wide desire to “be famous,” to chair a working group, to get your name on a draft, and ultimately on an RFC. This eventually becomes all important, carrying all practical considerations before it. The old ethos of “build small and flexible, code it, and let it grow where needed” is almost always lost in the shuffle of producing tens of drafts. Companies pay by the draft, or only pay for travel if you have a draft—both of which have a tendency to destroy the value of the community itself, and the way the community functions.

So that’s what’s broken. What’s right?

One night I was walking back from dinner with a couple of friends, Gonzolos and Joe, and I ran into Stewart Bryant in the hotel lobby. Soon enough, Paul Mockapetris joined the conversation. At some point, Dave Oran, Ignas B, and George Swallow joined the conversation. There are few places in the world you can get some collection of folks who had a hand in the creation of technologies like DNS, pseudowires, MPLS/TE, SMTP, IS-IS, IP fast reroute, and probably a dozen other technologies, standing around talking about “the good old days,” or even where to go for dinner. Across this week, I’ve chatted with Tony Li, Tony P, Jeff T, Alvaro Retana, Russ Housley, Fred Baker, Alia Atlas, and… more than I can remember.

If there is one thing that is striking about all of these people, it is that they are all more interested in solving problems than taking credit. They all live by the old IETF mantra: “it is amazing what can get done when no-one cares who gets the credit.” None of them are obsessed with getting their names on drafts, or with inventing something new that will change the world. They see problems, they develop solutions; that is all.

This, then, is what is right about the IETF. People who care about the challenges users have with networks, and have spent their lives finding solutions. So people are what’s wrong with the IETF, and people are also what’s right. The point?

You can choose to participate in the IETF. In fact, I hope to see you at a future meeting. But if you choose to participate, be a part of the solution, rather than a part of the problem. Be someone who looks on the land rush with skepticism, who doesn’t care about getting their name on a draft, who just wants to help solve a problem that has been fairly explained and defined to the community. Don’t be afraid to work on small things, and to insist that solutions be small and well scoped, even if that means your name is not put up in lights.

Even better advice: carry this into all the communities in which you live in your life. We live in an age that values name recognition far too much, that worries too much about being left out of the latest gold rush, that worries too much about our “rightly deserved” fifteen minutes of fame. This goes far beyond network engineering, the ethos of the “old way” in the IETF. It’s a lesson we can all take away from this little community of engineers who have worked so hard across the years to build something on which we all rely every day—to the very formats of the packets which carry this screed to your computer screen, your email box, or however else you are reading it.

Leave Your Ego at the Door

You are just about to walk into the interview room. Regardless of whether you are being interviewed, or interviewing—what are you thinking about? Are you thinking about winning? Are you thinking about whining? Or are you thinking about engaging? I have noticed, on many mailing lists, and in many other forums, that interviews in our world have devolved into a contest of egos.

The person on the other side of the table has some certification I don’t care about—how can I prove they are dumb, not as smart as their certification might indicate, or… The person on the other side of the table claims to know some protocol, can I find some bit of information they don’t know? These kinds of questions are really just ego questions—and you need to leave them at the door. This is particularly acute with certifications right now—a lot of people doubt the value of certifications, claiming folks who have them don’t know anything, the certifications are worthless, they don’t reflect the real world, etc.

I will agree that we have a problem with the depth and level of knowledge of network engineers at the moment. We all need to grow up a little, learn technologies rather than CLIs, and actually learn how to be engineers. On the other hand, when you interview someone, or when you are being interviewed…

Leave your ego at the door.

Is it really worth losing a really good hire because you needed your ego stroked by “beating” someone in an interview?

No, I didn’t think so.

Cultivate questions

Imagine that you’re sitting in a room interviewing a potential candidate for a position on your team. It’s not too hard to imagine, right, because it happens all the time. You know the next question I’m going to ask: what questions will you ask this candidate? I know a lot of people who have “set questions” they use to evaluate a candidate, such as “what is the OSPF type four for,” or “why do some states in the BGP peering session not have corresponding packets?” Since I’ve worked on certifications in the past (like the CCDE), I understand the value of these sorts of questions. They pinpoint the set and scope of the candidate’s knowledge, and they’re easy to grade. But is easy to grade what we should really be after?

Let me expand the scope a little: isn’t this the way we see our own careers? The engineer with the most bits of knowledge stuffed away when they die wins? I probably need to make a sign that says that, actually, just to highlight the humor of such a thought.

The problem is it simply isn’t a good way to measure an engineer, including the engineer reading this post (you). For one thing, as Ethan so eloquently pointed out this week—

The future of IT is not compatible with a network that waits for a human to make a change in accordance with a complex process that takes weeks. And thus it is that the future of networking becomes important. Yes, we grumpy old network engineers know how to build networks in a reliable, predictable way. But that presumes a reliable, predictable demand from business that just isn’t so in many cases.

The question becomes: how do we cultivate this culture among network engineers? It’s nice enough to say, but what do I do? I’m going to make a simple suggestion. Perhaps, in fact, it’s too simple. But it’s worth a try.

Instead of cultivating knowledge, cultivate questions.

Let’s take my current series on securing BGP as an example. In part two of the series, from last week, I pointed out that it’s a long slog through the world of security for BGP. You have to ask a lot of questions, beginning with one that doesn’t even seem to make sense: what can I actually secure? Cultivating question asking is important because it helps us to actually feel our way around the problem at hand, understanding it better, and finding new ways to solve it.

Okay, so given that networks must change now, that the path to changing networks is changing engineers, and that we want to encourage engineers to ask more questions, what do we do?

First, we need to rethink our certifications around cultivating questions. I think we did a pretty good job with the CCDE here, but the concept of asking if the candidate understands the right question to ask at any given phase of the process is an important skill to measure. I haven’t taken a CCIE lab since 1997, but I remember my proctor asking me if I knew what I was looking for at various times—he was trying to make certain I knew what questions to ask.

Second, we need to start thinking in models, rather than in technologies. I’ve written a lot about this; there’s an entire chapter on models in The Art of Network Architecture, and more on models in Navigating Network Complexity, but we really need to start thinking about why rather than how more often. Why do you think I talk about this stuff so often? It’s not because I don’t know the inner guts of IS-IS (I have an upcoming video series on this being published by Cisco Press), but because I think the ability to turn models and networks into questions is more important than knowing the guts of any particular protocol.

Third, we need to follow Ethan’s lead and start thinking about a broader set of skills and technology.

Finally, maybe—just maybe—we need to start setting up interviews so we can find out if the candidate knows the right questions, rather than focusing on the esoteric game, and whether or not they know all the right answers.