

(Effective) Habits of the Network Expert

For any field of study, there are mental habits that will make you an expert over time. Whether you are an infrastructure architect, a network designer, or a network reliability engineer, what are the habits of mind that mark out expertise in building and operating networks?

Experts involve the user

Experts don’t just listen to the user, they involve the user. This means taking the time to teach the developer or application owner how their applications interact with the network, showing them how their applications either simplify or complicate the network, and the impact of these decisions on the overall network.

Experts think about data

Rather than applications. What does the data look like? How does the business use the data? Where does the data need to be, when does it need to be there, how often does it need to go, and what is the cost of moving it? What might be in the data that can be harmful? How can I protect the data while at rest and in flight?

Experts think in modules, surfaces, and protocols

Devices and configurations can, and should, change over time. The way a problem is broken up into modules, and the interaction surfaces (interfaces) between those modules, can be permanent. Choosing the wrong protocol means reaching for yet another protocol to solve each new problem, leading to an accretion of complexity, ossification, and ultimately brittleness. Break the problem up right the first time, choose the protocols carefully, and let the devices and configurations follow.

Choosing devices first is like selecting the hammer you’re going to use to build a house, and then choosing the design and materials for the house based on what that hammer can do.
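The same discipline shows up in code. As a loose sketch (the class and method names below are purely illustrative, not drawn from any real network stack), the surface between modules can stay stable while the implementations behind it are swapped out, just as a well-chosen protocol outlives any particular device or configuration:

```python
from abc import ABC, abstractmethod


class RoutingService(ABC):
    """The 'surface': a stable interface other modules depend on."""

    @abstractmethod
    def next_hop(self, destination: str) -> str:
        """Return the next hop for a destination prefix."""


class StaticTable(RoutingService):
    """One implementation behind the surface: a hand-built table."""

    def __init__(self, table: dict) -> None:
        self.table = table

    def next_hop(self, destination: str) -> str:
        return self.table[destination]


class LearnedTable(RoutingService):
    """A different implementation: entries learned at runtime."""

    def __init__(self) -> None:
        self.learned = {}

    def learn(self, destination: str, hop: str) -> None:
        self.learned[destination] = hop

    def next_hop(self, destination: str) -> str:
        return self.learned[destination]


def forward(destination: str, routing: RoutingService) -> str:
    # Callers depend only on the surface, so either implementation can be
    # replaced without touching this code: the boundary is permanent, the
    # pieces behind it are not.
    return routing.next_hop(destination)
```

The point is not the code, but where the boundary sits: get the surface right and everything behind it can change freely.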

Experts think about tradeoffs

State, optimization, and surface form an ironclad tradeoff. If you increase state, you increase complexity while also increasing optimization. If you increase surfaces through abstraction, you are both increasing and decreasing state, which has an impact on both complexity and optimization. All nontrivial abstractions leak. Every time you move data you are facing the speed of serialization, queueing, and light, and hence you are dealing with the choice between consistency, availability, and partition tolerance.
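As a toy illustration of the state side of this tradeoff (the prefixes and next hops below are invented for the example), aggregating routes reduces the state a router carries, but also hides the detail needed to pick the best exit:

```python
import ipaddress

# Full state: two entries, and the best exit is visible for each prefix.
full_table = {
    "10.1.1.0/24": "router-a",
    "10.1.2.0/24": "router-b",
}

# Aggregated state: one entry instead of two, but everything under
# 10.1.0.0/16 is now forced through router-a, even when router-b is closer.
aggregated_table = {
    "10.1.0.0/16": "router-a",
}


def lookup(table, destination):
    """Pick the longest (most specific) prefix containing the destination."""
    addr = ipaddress.ip_address(destination)
    matches = [
        (ipaddress.ip_network(prefix), hop)
        for prefix, hop in table.items()
        if addr in ipaddress.ip_network(prefix)
    ]
    return max(matches, key=lambda m: m[0].prefixlen)[1]


print(lookup(full_table, "10.1.2.5"))        # router-b: more state, better path
print(lookup(aggregated_table, "10.1.2.5"))  # router-a: less state, less optimal
```

Less state means a simpler network and a less optimal path; more state means the reverse. The tradeoff does not go away, it just moves.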

If you haven’t found the tradeoffs, you haven’t looked hard enough.

Experts focus on the essence

Every problem has an essential core—something you are trying to solve, and a reason for solving it. Experts know how to divide between the essential and the nonessential. Experts think about what they are not designing, and what they are not trying to accomplish, as well as what they are. This doesn’t mean the rest isn’t there, it just means it’s not quite in focus all the time.

Experts are mentally stimulated to simulate

Labs are great—but moving beyond the lab and thinking about how the system works as a whole is better. Experts mentally simulate how the data moves, how the network converges, how attackers might try to break in, and other things besides.

Experts look around

Interior designers go to famous spaces to see how others have designed before them. Building designers walk through cities and famous buildings to see how others have designed before them. The more you know about how others have designed, the more you know about the history of networks, the more of an expert you will be.

Experts reshape the problem space

Experts are unafraid to think about the problem in a different way, to say “no,” and to try solutions that have not been tried before. Best common practice is a place to start, not a final arbiter of all that is good and true. Experts do not fall to the “is/ought” fallacy.

Experts treat problems as opportunities

Whether the problem is a mistake or a failure, or even a little bit of both, every problem is an opportunity to learn how the system works, and how networks work in general.

 

Is it planning… or just plain engineering?

Over at the ECI blog, Jonathan Homa has a nice article about the importance of network planning–

In the classic movie, The Graduate (1967), the protagonist is advised on career choices, “In one word – plastics.” If you were asked by a young person today, graduating with an engineering or similar degree about a career choice in telecommunications, would you think of responding, “network planning”? Well, probably not.

Jonathan describes why this is so–traffic is constantly increasing, and the choice of tools we have to support the traffic loads of today and tomorrow can be classified in two ways: slim and none (as I remember a weather forecaster saying when I “wore a younger man’s shoes”). The problem, however, is not just tools. The network is increasingly seen as a commodity, “pure bandwidth that should be replaceable like memory,” made up of entirely interchangeable parts and pieces, primarily driven by the cost to move a bit across a given distance.

This situation is driving several different reactions in the network engineering world, none of which are really healthy. There is a sense of resignation among people who work on networks. If commodities are driven by price, then the entire life of a network operator or engineer is driven by speed, and speed alone. All that matters is how you can build ever larger networks with ever fewer people–so long as you get the bandwidth you need, nothing else matters.

This is compounded by a simple reality: the networking world has driven itself into the corner of focusing on the appliance. The entire network is built from appliances running customized software, with little thought given to the system as a whole. Regardless of whether this is because of the way we educate engineers through our college programs and our certifications, this is the reality on the ground level of network engineering. When your skill set is primarily built around configuring and managing appliances, and the world is increasingly making those appliances into commodities, you find yourself in a rather depressing place.

Further, there is a belief that there is no more real innovation to be had–the end of the road is nigh, and things are going to look pretty much like they look right now for the rest of … well, forever.

I want you, as a network engineer, operator, or whatever you call yourself, to look these beliefs in the eye and call them what they are: nonsense on stilts.

The real situation is this: the current “networking industry,” such as it is, has backed itself into a corner. The emphasis on planning Jonathan brings out is valid, but it is just the tip of the proverbial iceberg. There is a hint in this direction in Jonathan’s list of suggestions (or requirements). Thinking across layers, thinking about failure, continuous optimization… these are all forms of system-level thinking. To put this another way, a railway boxcar might be a commodity, but the railroad system is not. The individual over-the-road truck might be a commodity, and the individual road might not be all that remarkable, but the road system is definitely not a commodity.

The sooner we start thinking outside the appliance as network engineers or operators (or whatever you call yourself), the sooner we will start adding value to the business. This means thinking about algorithms, protocols, and systems–all that “theory stuff” we typically decry as being less than useful–rather than how to configure x on device y. This means thinking about security across the network, rather than as how you configure a firewall. This means thinking about the tradeoffs in implementing security, including what systemic risk looks like, and when the risks are acceptable when trying to accomplish a specific goal, rather than thinking about how to route traffic through a firewall.

If demand is growing, why is the networking world such a depressing place right now? Why do I see lots of people saying things like “there will be no network engineers in enterprises in five years?” Rather than blaming the world, maybe we should start looking at how we are trying to solve the problems in front of us.

Autonomic, Automated, and Reality

Once the shipping department drops the box off with that new switch, router, or “firewall,” what happens next? You rack it, cable it up, turn it on, and start configuring, right? There are access controls to configure—SSH, keys, disabling standard accounts, disabling telnet—interface addresses to configure, routing adjacencies to configure, local policies to configure, and… After configuring all of this, you can adjust routing in the network to route around the new device, and then either canary the device “in production” (if you run your network the way it should be run), or find some prearranged maintenance time to bring the new device online and test things out. After all of this, you can leave the new device up and running in the network, and move on to the next task.

Until it breaks.

Then you consult the documentation to remind yourself why it was configured this way, consult the documentation to understand how the application everyone is complaining about should actually work, and so on. Then there are the many hours spent sitting on the console gathering information, running various commands and poring over the output of various logs. Eventually, once you find the problem, you can either replace the right parts, or reconfigure the right bits, and get everything running again.

In the “modern” world (such as it is), we think it’s a huge leap forward to stop configuring devices manually. If we can just automate the configuration of all that “stuff” we have to do at the beginning, after the box is opened and before the device is placed into service, we think we have this whole networking thing pretty well figured out.
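As a rough sketch of what that “day zero” automation often amounts to (this assumes the Jinja2 templating library, and the template text is generic rather than any particular vendor’s configuration syntax), the manual steps collapse into rendering a template per device:

```python
from jinja2 import Template

# Purely illustrative template -- not a real device's configuration syntax.
BASE_CONFIG = Template("""\
hostname {{ hostname }}
disable telnet
enable ssh key-only
interface {{ uplink }}
  address {{ uplink_ip }}
""")


def render_initial_config(hostname: str, uplink: str, uplink_ip: str) -> str:
    """Render the initial ('day zero') configuration for one new device."""
    return BASE_CONFIG.render(hostname=hostname, uplink=uplink, uplink_ip=uplink_ip)


print(render_initial_config("edge-01", "eth0", "192.0.2.10/31"))
```

Useful, certainly, but notice that every line of the template is still configuration; nothing about the network has become any smarter.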

Even if you had everything in your network automated, you still haven’t figured this networking thing out.

We need to move beyond automation. Where do we need to move to? It’s not one place, but two. The first is that we need to move beyond automation to autonomous operation. As an example, there is a shiny new system currently being widely deployed to automate the deployment and management of containers. Part of this system is the automation of connectivity, including routing, between containers. The routing system being deployed as part of this system is essentially statically configured policy-based routing combined with network address translation.
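To make the pattern concrete, here is a rough, Linux-flavored sketch of what that kind of “automated routing” often boils down to; the node list and subnets are invented for illustration, and this is not the actual code any particular container platform runs:

```python
import subprocess

# Inventory of nodes and their container subnets, maintained by the
# orchestration system rather than learned by a routing protocol.
NODES = [
    {"pod_cidr": "10.244.1.0/24", "node_ip": "192.168.1.11"},
    {"pod_cidr": "10.244.2.0/24", "node_ip": "192.168.1.12"},
]


def program_node(local_cidr: str, dry_run: bool = True) -> None:
    """Push one static route per remote subnet, plus a NAT rule for egress."""
    commands = []
    for node in NODES:
        if node["pod_cidr"] == local_cidr:
            continue
        commands.append(
            ["ip", "route", "replace", node["pod_cidr"], "via", node["node_ip"]]
        )
    # Source-NAT everything leaving the local container subnet.
    commands.append(
        ["iptables", "-t", "nat", "-A", "POSTROUTING",
         "-s", local_cidr, "-j", "MASQUERADE"]
    )
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)


program_node("10.244.1.0/24")
```

Every time the node inventory changes, the whole set of routes and rules has to be regenerated and pushed again; nothing in the data path reacts on its own.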

Let me point something out that is not going to be very popular: this is a step backwards in terms of making the system autonomous. Automating static routing information is not a better solution than building a real, dynamic, proactive, autonomic, routing system. It’s not simpler—trust me, I say this as someone who has operated large networks which used automated static routes to do everything.

The “opsification of everything” is neat, but it shouldn’t be our end goal.

Now part of this, I know, is the fault of vendors. Vendors push EGPs onto data center fabrics because, after all, “the configuration complexity doesn’t matter so long as you can automate it.” The configuration complexity does matter, because configuration complexity betrays an underlying protocol complexity, and sets up long and difficult troubleshooting sessions that are completely unnecessary.

The second place we need to move in the networking world? The focus on automation is just another form of focusing on configuration. We abstract the configuration, and we touch a lot more devices at once, but we are still thinking about configuration. The more we think about configuration, the less we think about how the system should work, how it really works, what the gaps are, and how to bridge those gaps. So long as we are focused on the configuration, automated or not, we are not focused on how the network can bring value to the business. The longer we are focused on configuration, the less value we are bringing to the business, and the more likely we are to end up being replaced by … an automated system … no matter how poorly that automated system actually works.

And no, the cloud isn’t going to solve this. Containers aren’t going to solve this. The “automated configuration pattern” is already being repeated in the cloud. As more complex workloads are moved into the cloud, the problems there are only going to get harder. What starts out as a “simple” system using policy-based routing analogs and network address translation configured through an automation server will eventually make the hardest problems we ever solved with T1s, frame relay circuits, inverse multiplexers, wire-down patch panels, and mechanical crossbar switch frames look simple. It’s fun to pretend we don’t need dynamic routing to solve the problems that face the network—at least until you hit hard problems, and have to relearn the lessons of the last 20+ years.

Yes, I know vendors are partly to blame for this. I know that, for a vendor, it’s easier to get people to buy into your CLI, or your entire ecosystem, rather than getting them to think about how to solve the problems your business is handing them.

On the other hand, none of this is going to change from the top down. It is only going to change when the average network engineer starts asking vendors for truly simpler solutions that don’t require reams of configuration information. It will change when network engineers get their heads out of the configuration and features, and into the business problems.

Reaction: Overly Attached

In a recent edition of ACM Queue, Kate Matsudaira has an article discussing the problem of being overly attached to a project or solution.

The longer you work on one system or application, the deeper the attachment. For years you have been investing in it—adding new features, updating functionality, fixing bugs and corner cases, polishing, and refactoring. If the product serves a need, you likely reap satisfaction for a job well done (and maybe you even received some raises or promotions as a result of your great work).

Attachment is a two-edged sword—without some form of attachment, it seems there is no way to have pride in your work. On the other hand, attachment leads to poorly designed solutions. For instance, we all know the hyper-certified person who knows every in and out of a particular vendor’s solution, and hence solves every problem in terms of that vendor’s products. Or the person who knows a particular network automation system and, as a result, solves every problem through automation.

The most pernicious forms of attachment in the network engineering world are to a single technology or vendor. One of the cycles I have seen play out many times across the last 30 years is: a new idea is invented; this new idea is applied to every possible problem anyone has ever faced in designing or operating a network; the resulting solution becomes overburdened and complicated; people complain about the complexity of the solution and rush to… the next new idea. I could name tens or hundreds of technologies that have been through this cycle over time.

Another, related cycle: a team adopts a new technology to solve a problem, becomes attached to it, and then reaches for it to solve every problem that follows, whether it fits or not.

Kate points out some very helpful ways to solve over-attachment at an organizational level. For instance, aligning on goals and purpose, and asking everyone to be open to ideas and alternatives. But these organizational level solutions are more difficult to apply at an individual level. How can this be applied to the individual—to your life?

Perhaps the most important piece of advice Kate gives here is to ask for stories, not solutions. In telling stories you are not eliminating attachment but refocusing it. Rather than becoming attached to a solution or technology, you are becoming attached to a goal or a narrative. This accepts that you will always be attached to something—in fact, that it is ultimately healthy to be attached to something outside yourself in a fundamental way. The life that is attached to nothing is ugly and self-centered, ultimately failing to accomplish anything.

Even here, however, there are tradeoffs. You can attach yourself to the story of a company, dedicating yourself to that single brand, but that is just another form of over-attachment. To expand this a little, then, you should focus on stories about solving problems for people rather than stories about a product or technology. This might mean telling people they are wrong, by the way—sometimes the best thing is not what someone thinks they want.

Stories are ultimately about people. This is something not many people in engineering fields like to hear, because many of us are in these kinds of fields either because we are introverted or because we struggle to relate to people in some other way.

To expand this a bit more, you should be willing to tell multiple stories, rather than just one. These stories might overlap or intersect, of course—I have been invested in stories about reducing complexity, disaggregation, and understanding why rather than how for the last ten or fifteen years. These three stories are, in many ways, the same story, just told from different perspectives. You need to allow the story to be shaped, and the path you take to tell that story to change, over time.

Realize your work is neither as bad as you think it is, nor as good as you think it is. Do not take criticism personally. This is a lesson I had to learn the hard way, from receiving many manuscripts back covered in red marks, either physical or virtual. Failure is not an option; it is a requirement. The more you fail, the more you will actively seek out the tradeoffs, and approach problems and people with humility.

Finally, you need to internalize modularity. Do not try to solve all the problems with a single solution, no matter how neat or new. Part of this is going to go back to understanding why things work the way they do and the limits of people (including yourself!). Solve problems incrementally and set limits on what you will try to do with any single technology.

Ultimately, refusing to become overly attached is a matter of attitude. It is something that is learned through hard work, a skill developed across time.

meeting madness, ONUG, and software defined

Jordan, Eyvonne, and I sit down for a conversation that begins with meetings, and ends with talking about software defined everything (including meetings??).

Outro Music:
Danger Storm Kevin MacLeod (incompetech.com)
Licensed under Creative Commons: By Attribution 3.0 License
http://creativecommons.org/licenses/by/3.0/

Throwing the baby out with the bathwater (No, you’re not Google, but why does this matter?)

It was quite difficult to prepare a tub full of bath water at many points in recent history (and it probably still is in many parts of the world). First, there was the water itself—if you do not have plumbing, then the water must be manually transported, one bucket at a time, from a stream, well, or pump, to the tub. The result, of course, would be someone who was sweaty enough to need the forthcoming bath. Then there is the warming of the water. Shy of building a fire under the tub itself, how can you heat enough water quickly enough to make the eventual bathing experience pleasant? According to legend, this resulted in the entire household using the same tub of water to bathe. The last to bathe was always the smallest, the baby. By then, the water would be murky with dirt, which meant the child could not be seen in the tub. When the tub was thrown out, then, no one could tell whether the baby was still in there.

But it doesn’t take a dirty tub of water to throw the baby out with the bath. All it really takes is an unwillingness to learn from the lessons of others because, somehow, you have convinced yourself that your circumstances are so different there is nothing to learn. Take, for instance, the constant refrain, “you are not Google.”

I should hope not.

But this phrase, or something similar, is often used to say something like this: you don’t have the problems of any of the hyperscalers, so you should not look to their solutions to find solutions for your problems. An entertaining read on this from a recent blog:

Software engineers go crazy for the most ridiculous things. We like to think that we’re hyper-rational, but when we have to choose a technology, we end up in a kind of frenzy — bouncing from one person’s Hacker News comment to another’s blog post until, in a stupor, we float helplessly toward the brightest light and lay prone in front of it, oblivious to what we were looking for in the first place. —Oz Nova

There is a lot of truth here—you should never choose a system or solution because it solves someone else’s problem. Do not deploy Kafka if you do not need the scale Kafka represents. Maybe you don’t need four links between every pair of routers “just to be certain you have enough redundancy.”

On the other hand, there is a real danger here of throwing the baby out with the bathwater—the water is murky with product and project claims, so just abandon the entire mess. To see where the problem is here, let’s look at another large scale system we don’t think about very much any longer: the NASA space program from the mid-1960’s. One of the great things the folks at NASA have always liked to talk about is all the things that came out of the space program. Remember Tang? Or maybe not. It really wasn’t developed for the space program, and it’s mostly sugar and water, but it was used in some of the first space missions, and hence became associated with hanging out in space.

There are a number of other inventions, however, that really did come directly out of research into solving problems folks hanging out in space would have, such as the space pen, freeze-dried ice cream, exercise machines, scratch-resistant eyeglass lenses, cameras on phones, battery powered tools, infrared thermometers, and many others.

Since you are not going to space any time soon, you refuse to use any of these technologies, right?

Do not be silly. Of course you still use these technologies. Because you are smart enough not to throw the baby out with the bathwater, right?

You should apply the same level of care to the solutions Google, Amazon, LinkedIn, Microsoft, and the other hyperscalers build and deploy. Not everything is going to fit in your environment, of course. On the other hand, some things might fit. And regardless of whether any particular technology fits, you can still learn something about how systems work by considering how these operators build things to scale to their needs. You can adopt operational processes that make sense based on what they have learned. You can pick out technologies and ways of thinking that make sense.

No, you’re (probably not) Google. On the other hand, we are all building complex networks. The more we can learn from those around us, the better what we build will be. Don’t throw the baby out with the bathwater.
