Learning to Trust

The state of automation among enterprise operators has been a matter of some interest this year, with several firms undertaking studies of the space. Juniper, for instance, recently released the first annual edition of its SONAR report, which surveyed many network operators to set a baseline for future understanding of how automation is being used. Another recent report in this area is Enterprise Network Automation for 2020 and Beyond, conducted by Enterprise Management Associates.

While these reports are, themselves, interesting for understanding the state of automation in the networking world, one correlation noted on page 13 of the EMA report caught my attention: “Individuals who primarily engage with automation as users are less likely to fully trust automation.” This observation is set in parallel with two others on that same page: “Enterprises that consider network automation a high priority initiative trust automation more,” and “Individuals who fully trust automation report significant improvement in change management capacity.” It seems somewhat obvious these three are related in some way, but how? The answer to this, I think, lies in the relationship between the person and the tool.

We often think of tools as “abstract objects,” a “thing” that is “out there in the world.” It’s something we use to get a particular result, but which does not, in turn, impact “me as a person” in any way. This view of tools is not borne out in the real world. To illustrate, consider a simple situation: you are peacefully driving down the road when another driver runs a traffic signal, causing your two cars to collide. When you jump out of your car, do you say… “you caused your vehicle to hit mine?” Nope. You say: “you hit me.”

This kind of identification between the user and the tool is widely recognized and remarked upon. As far back as 1904, Thorstein Veblen wrote that the “machine throws out anthropomorphic habits of thought,” forcing the worker to adapt to the work, rather than the work to the worker. Marshall McLuhan observed that “students of computer programming have had to learn how to approach all knowledge structurally,” shaping the way they organize information so the computer can store and process it.

What does any of this have to do with network automation and trust? Two things. First, the more “involved” you are with a tool, the more you will trust it. People trust hammers more than they do cars (in general) because the use of hammers is narrow, and the design and operation of the tool is fairly obvious. Cars, on the other hand, are complex; many people simply drive them, rather than learning how they work. If you go to a high-speed or off-road driving course, the first thing you will be taught is how a car works. This is not an accident—in learning how a car works, you are learning to trust the tool. Second, the more you work with a tool, the more you will understand its limits, and hence the more you will know when you can, and cannot, trust it.

If you want to trust your network automation, don’t just be a user. Be an active participant in the tools you use. This explains the correlation between the level of trust, level of engagement, and level of improvement. The more you participate in the development of the tooling itself, and the more you work with the tools, the more you will be able to trust them. Increased trust will, in turn, result in increased productivity and effectiveness. To some degree, your way of thinking will be shaped to the tool—this is just a side effect of the way our minds work.

You can extend this lesson to all other areas of network engineering—for instance, if you want to trust your network, then you need to go beyond just configuring routing, and really learn how it works. This does not mean you need in-depth knowledge of that particular implementation, nor does it mean knowing how every possible configuration option works in detail, but it does mean knowing how the protocol converges, what the limits to the protocol are, etc. Rinse and repeat for routers, storage systems, quality of service, etc.—and eventually you will not only be able to trust your tools, but also be very productive and effective with them.

The Hedge 11: Roland Dobbins on Working Remotely

I failed to include the right categories the first time, so this didn’t make it into the podcast catcher feeds correctly…

Network engineering and operations are both “mental work” that can largely be done remotely. Working remotely is great in many ways, but it is also often fraught with problems. In this episode of the Hedge, Roland Dobbins joins Tom and Russ to discuss the ins and outs of working remotely, including some strategies we have found effective at removing many of the negative aspects.

download

(Effective) Habits of the Network Expert

For any field of study, there are some mental habits that will make you an expert over time. Whether you are an infrastructure architect, a network designer, or a network reliability engineer, what are the habits of mind that mark out expertise in building and operating networks?

Experts involve the user

Experts don’t just listen to the user, they involve the user. This means taking the time to teach the developer or application owner how their applications interact with the network, showing them how their applications either simplify or complicate the network, and the impact of these decisions on the overall network.

Experts think about data

Rather than applications. What does the data look like? How does the business use the data? Where does the data need to be, when does it need to be there, how often does it need to go, and what is the cost of moving it? What might be in the data that can be harmful? How can I protect the data while at rest and in flight?

Experts think in modules, surfaces, and protocols

Devices and configurations can, and should, change over time. The way a problem is broken up into modules and the interaction surfaces (interfaces) between those modules can be permanent. Choosing the wrong protocol means choosing a different protocol to solve every problem, leading to accretion of complexity, ossification, and ultimately brittleness. Break the problem up right the first time, and choose the protocols carefully, and let the devices and configurations follow.

Choosing devices first is like selecting the hammer you’re going to use to build a house, and then selecting the design and materials used in the house based on what you can use the hammer for.

Experts think about tradeoffs

State, optimization, and surface form an ironclad tradeoff. If you increase state, you increase optimization, but you also increase complexity. If you increase surfaces through abstraction, you are both increasing and decreasing state, which has an impact on both complexity and optimization. All nontrivial abstractions leak. Every time you move data, you are facing the speed of serialization, queueing, and light, and hence you are dealing with the choice between consistency, availability, and partition tolerance.

If you haven’t found the tradeoffs, you haven’t looked hard enough.
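As a toy illustration of the state/optimization side of this tradeoff (the prefixes and next hops here are hypothetical, not from any real network), consider what route aggregation does: the table shrinks, but a destination that had a better path through a different next hop now follows the aggregate.

```python
from ipaddress import ip_network

# Hypothetical routing table: four specific routes, each with a next hop.
specifics = {
    "10.1.0.0/24": "A",
    "10.1.1.0/24": "A",
    "10.1.2.0/24": "B",  # the one destination best reached through B
    "10.1.3.0/24": "A",
}

# Aggregation reduces state: one covering prefix replaces four routes.
aggregate = {"10.1.0.0/22": "A"}

# Every specific destination is still reachable through the aggregate...
assert all(
    ip_network(p).subnet_of(ip_network("10.1.0.0/22")) for p in specifics
)

# ...but optimization is lost: traffic for 10.1.2.0/24 now follows A, not B.
lost_optimality = specifics["10.1.2.0/24"] != aggregate["10.1.0.0/22"]
print(len(specifics), "routes before aggregation,", len(aggregate), "after")
print("best path lost for 10.1.2.0/24:", lost_optimality)
```

Less state, less optimality; more state, more complexity. The tradeoff does not go away, it only moves.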

Experts focus on the essence

Every problem has an essential core—something you are trying to solve, and a reason for solving it. Experts know how to divide between the essential and the nonessential. Experts think about what they are not designing, and what they are not trying to accomplish, as well as what they are. This doesn’t mean the rest isn’t there, it just means it’s not quite in focus all the time.

Experts are mentally stimulated to simulate

Labs are great—but moving beyond the lab and thinking about how the system works as a whole is better. Experts mentally simulate how the data moves, how the network converges, how attackers might try to break in, and other things besides.

Experts look around

Interior designers go to famous spaces to see how others have designed before them. Building designers walk through cities and famous buildings to see how others have designed before them. The more you know about how others have designed, the more you know about the history of networks, the more of an expert you will be.

Experts reshape the problem space

Experts are unafraid to think about the problem in a different way, to say “no,” and to try solutions that have not been tried before. Best common practice is a place to start, not a final arbiter of all that is good and true. Experts do not fall to the “is/ought” fallacy.

Experts treat problems as opportunities

Whether the problem is a mistake or a failure, or even a little bit of both, every problem is an opportunity to learn how the system works, and how networks work in general.


Is it planning… or just plain engineering?

Over at the ECI blog, Jonathan Homa has a nice article about the importance of network planning–

In the classic movie, The Graduate (1967), the protagonist is advised on career choices, “In one word – plastics.” If you were asked by a young person today, graduating with an engineering or similar degree about a career choice in telecommunications, would you think of responding, “network planning”? Well, probably not.

Jonathan describes why this is so–traffic is constantly increasing, and the choice of tools we have to support the traffic loads of today and tomorrow can be classified in two ways: slim and none (as I remember a weather forecaster saying when I “wore a younger man’s shoes”). The problem, however, is not just tools. The network is increasingly seen as a commodity, “pure bandwidth that should be replaceable like memory,” made up of entirely interchangeable parts and pieces, primarily driven by the cost to move a bit across a given distance.

This situation is driving several different reactions in the network engineering world, none of which are really healthy. There is a sense of resignation among people who work on networks. If commodities are driven by price, then the entire life of a network operator or engineer is driven by speed, and speed alone. All that matters is how you can build ever larger networks with ever fewer people–so long as you get the bandwidth you need, nothing else matters.

This is compounded by a simple reality–the network world has driven itself into the corner of focusing on the appliance–the entire network is built from appliances running customized software, with little thought given to the system as a whole. Regardless of whether this is because of the way we educate engineers through our college programs and our certifications, this is the reality on the ground level of network engineering. When your skill set is primarily built around configuring and managing appliances, and the world is increasingly making those appliances into commodities, you find yourself in a rather depressing place.

Further, there is a belief that there is no more real innovation to be had–the end of the road is nigh, and things are going to look pretty much like they look right now for the rest of … well, forever.

I want you, as a network engineer, operator, or whatever you call yourself, to look these beliefs in the eye and call them what they are: nonsense on stilts.

The real situation is this: the current “networking industry,” such as it is, has backed itself into a corner. The emphasis on planning Jonathan brings out is valid, but it is just the tip of the proverbial iceberg. There is a hint in this direction in Jonathan’s article, in the list of suggestions (or requirements). Thinking across layers, thinking about failure, continuous optimization… these are all forms of system-level thinking. To put this another way, a railway boxcar might be a commodity, but the railroad system is not. The individual over-the-road truck might be a commodity, and the individual road might not be all that remarkable, but the road system is definitely not a commodity.

The sooner we start thinking outside the appliance as network engineers or operators (or whatever you call yourself), the sooner we will start adding value to the business. This means thinking about algorithms, protocols, and systems–all that “theory stuff” we typically decry as being less than useful–rather than how to configure x on device y. This means thinking about security across the network, rather than as how you configure a firewall. This means thinking about the tradeoffs with implementing security, including what systemic risk looks like, and when the risks are acceptable when trying to accomplish a specific goal, rather than thinking about how to route traffic through a firewall.

If demand is growing, why is the networking world such a depressing place right now? Why do I see lots of people saying things like “there will be no network engineers in enterprises in five years?” Rather than blaming the world, maybe we should start looking at how we are trying to solve the problems in front of us.

Lessons Learned from the Robustness Principle

The Internet, and networking protocols more broadly, were grounded in a few simple principles. For instance, there is the end-to-end principle, which argues the network should be a simple fat pipe that does not modify data in transit. Many of these principles have tradeoffs—if you haven’t found the tradeoffs, you haven’t looked hard enough—and not looking for them can result in massive failures at the network and protocol level.

Another principle networking is grounded in is the Robustness Principle, which states: “Be liberal in what you accept, and conservative in what you send.” In protocol design and implementation, this means you should accept the widest range of inputs possible without negative consequences. A recent draft, however, challenges the robustness principle—draft-iab-protocol-maintenance.

According to the authors, the basic premise of the robustness principle lies in the problem of updating older software for new features or fixes at the scale of an Internet-sized network. The general idea is a protocol designer can set aside some “reserved bits,” using them in a later version of the protocol, and not worry about older implementations misinterpreting them—new meanings of old reserved bits will be silently ignored. In a world where even a very old operating system, such as Windows XP, is still widely used, and people complain endlessly about forced updates, it seems like the robustness principle is on solid ground in this regard.
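A minimal sketch of how this “silently ignore what you don’t understand” behavior looks in an implementation (the TLV format and type registry here are hypothetical, not taken from any specific protocol):

```python
import struct

# Hypothetical type registry -- not any real protocol's assignments.
KNOWN_TYPES = {1: "hostname", 2: "address"}

def parse_tlvs(data: bytes) -> dict:
    """Parse a simple type-length-value stream, silently skipping any
    unknown type -- the 'liberal in what you accept' behavior."""
    fields = {}
    offset = 0
    while offset + 2 <= len(data):
        ftype, length = struct.unpack_from("!BB", data, offset)
        offset += 2
        value = data[offset:offset + length]
        offset += length
        if ftype in KNOWN_TYPES:
            fields[KNOWN_TYPES[ftype]] = value
        # An unknown type (say, a field a newer version added) falls
        # through here and is skipped without raising an error.
    return fields

# One known field (type 1) followed by one unknown field (type 9).
msg = bytes([1, 3]) + b"rtr" + bytes([9, 2]) + b"\x00\x01"
parsed = parse_tlvs(msg)
print(parsed)  # only the known field survives
```

An old parser written this way keeps working when a newer sender adds field types it has never heard of, which is exactly the property the robustness principle is after.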

The argument against this in the draft is that implementing the robustness principle allows a protocol to degenerate over time. Older implementations are not removed from service because they still work, implementations are not updated in a timely manner, and the protocol tends to carry an ever-increasing amount of “dead code” in the form of older expressions of data formats. Given an infinite amount of time, an infinite number of versions of any given protocol will be deployed. As a result, the protocol can and will break in an infinite number of ways.

The logic of the draft is something along the lines of: old ways of doing things should be removed from protocols which are actively maintained in order to unify and simplify the protocol. At least for actively maintained protocols, reliance on the robustness principle should be toned down a little.

Given the long list of examples in the draft, the authors make a good case.

There is another side to the argument, however. The robustness principle is not “just” about keeping older versions of software working “at scale.” All implementations, no matter how good their quality, have defects (or rather, unintended features). Many of these defects involve failing to release or initialize memory, failing to bounds check inputs, and other similar oversights. A common way to find these errors is to fuzz test code—throw lots of different inputs at it to see if it spits up an error or crash.

The robustness principle runs deeper than infinite versions—it also helps implementations deal with defects in “the other end” that generate bad data. The robustness principle, then, can help keep a network running even in the face of an implementation defect.

Where does this leave us? Abandoning the robustness principle is clearly not a good thing—while the network might end up being more correct, it might also end up simply not running. Ever. The Internet is an interlocking system of protocols, hardware, and software; the robustness principle is the lubricant that makes it all work at all.

Clearly, then, there must be some sort of compromise position that will work. Perhaps a two-pronged attack might work. First, don’t discard errors silently. Instead, build logging into software that catches all errors, regardless of how trivial they might seem. This will generate a lot of data, but we need to be clear on the difference between instrumenting something and actually paying attention to what is instrumented. Instrumenting code so that “unknown input” can be caught and logged periodically is not a bad thing.
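A sketch of this first prong, tolerating unknown input while still recording it (the type registry and field names are hypothetical, carried over from nothing in particular):

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("parser")

KNOWN_TYPES = {1: "hostname", 2: "address"}  # hypothetical registry
unknown_counts = {}  # per-type counters, so logs can be summarized

def handle_field(ftype: int, value: bytes, fields: dict) -> None:
    """Accept known fields, but count and log unknown ones instead of
    discarding them silently."""
    if ftype in KNOWN_TYPES:
        fields[KNOWN_TYPES[ftype]] = value
    else:
        unknown_counts[ftype] = unknown_counts.get(ftype, 0) + 1
        log.info("unknown field type %d (seen %d times so far)",
                 ftype, unknown_counts[ftype])

fields = {}
handle_field(1, b"rtr", fields)        # known: stored
handle_field(9, b"\x00\x01", fields)   # unknown: tolerated, but recorded
```

The network keeps running either way; the difference is that an operator can now see that something on the wire is sending data nobody understands, instead of discovering it years later.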

Second, perhaps protocols need some way to end-of-life older versions. Part of the problem with the robustness principle is it allows an infinite number of versions of a single protocol to exist in the same network. Perhaps the IETF and other standards organizations should rethink this, explicitly taking older ways of doing things out of specs on a periodic basis. A draft that says “you shouldn’t do this any longer,” or “this is no longer in accordance with the specification,” would not be a bad thing.

For the more “average” network engineer, this discussion around the robustness principle should lead to some important lessons, particularly as we move ever more deeply into an automated world. Be clear about versioning of APIs and components. Deprecate older processes when they should no longer be used.
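One way this looks in automation code, sketched with hypothetical function names: keep the old entry point working for existing callers, but make the deprecation visible rather than silent, and version the returned schema explicitly.

```python
import warnings

def get_interfaces_v2():
    # v2 wraps results with an explicit schema version, so consumers
    # can check compatibility instead of guessing.
    return {"version": 2, "interfaces": ["eth0", "eth1"]}

def get_interfaces_v1():
    """Deprecated: kept only so older automation keeps running."""
    warnings.warn(
        "get_interfaces_v1 is deprecated; use get_interfaces_v2",
        DeprecationWarning, stacklevel=2)
    return get_interfaces_v2()["interfaces"]

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = get_interfaces_v1()
print(result)            # old callers still work...
print(len(caught) == 1)  # ...but the deprecation is visible
```

The warning gives you a window in which both versions work, and a way to find the callers that still need migrating before the old path is removed.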

Control your technical debt, or it will control you.

DoS’ing your mind: Controlling information inflow

Everyone wants your attention. No, seriously, they do. We’ve gone from a world where there were lots of readers and not much content, to a world where there is lots of content, and not many readers. There’s the latest game over here, the latest way to “get 20,000 readers,” over there, the way to “retire by the time you’re 32” over yonder, and “how to cure every known disease with this simple group of weird fruit from someplace you’ve never heard of (but you’ll certainly go find, and revel in the pictures of perfectly healthy inhabitants now),” nagging someplace at the back of your mind.

The insidious, distracting suck of the Internet has become seemingly inescapable. Calling us from our pockets, lurking behind work documents, it’s merely a click away. Studies have shown that each day we spend, on average, five and a half hours on digital media, and glance at our phones 221 times. -via connecting

Living this way isn’t healthy. It reduces your attention span, which in turn destroys your ability to get anything done, as well as destroying your mind. So we need to stop. “Squirrel” is funny, but you crash planes. “Shiny thing” is funny, but you end up mired in jellyfish (and, contrary to the cartoons, you don’t survive that one, either).

So how do we stop? Three hints.

Slack off. Tom, over at Networking Nerd, points out the importance of asynchronous communication (once again) in an excellent post about Slack. Sometimes you need to just sit and talk to someone. But sometimes you can’t. As Tom says:

Think back to the last time you responded to an email. How often did you start your response with “Sorry for the delay” or some version of that phrase? In today’s society, we’ve become accustomed to instant responses to things.

This means shutting down the IM, shutting down Slack for a bit, shutting down email, turning off your ringer, and just actually getting things done. This is also part of my theory of macro tasking. Controlling information inflow is as much about moving to asynchronous communications as it is anything else. And asynchronous means “I don’t have to answer right now,” as well as “I don’t expect them to answer right now, because I have other stuff I can work on while I wait.”

Yes, this is hard, which leads to the second suggestion.

Precommit. Set aside some amount of time that you’ll do nothing but work on a single project, or do a specific thing. You can’t set aside time for everything, and sometimes specific goals are more important than specific timeframes. For instance, I have two personal goals every day in the way of managing information. I write at least 2,000 words a day, and I read at least 75 pages a day. These aren’t static goals, of course; I’ve ramped them up over time, and I don’t meet them every day. I have larger goals, as well—for instance, I try to read 100 books a year (this is a new goal; last year it was 70 or so, but this year I’m trying to ramp up).

Odysseus had his men tie him to the mast of their ship until they were out of the sirens’ range. This is an example of “precommitment,” a self-control strategy that involves imposing a condition on some aspect of your behavior in advance. For example, an MIT study showed that paid proofreaders made fewer errors and turned in their work earlier when they chose to space out their deadlines (e.g., complete one assignment per week for a month), compared to when they had the same amount of time to work, but had only one deadline at the end of a month. -via connecting

Self-discipline, as Connecting points out, isn’t so much about resisting temptation as it is about avoiding it.

Vary your sources. This one might not seem so obvious, but you really do need to do more than rely on Google for your entire view of the world.

Think of the WorldWideWeb increasingly as the public and open façade of the Web, and Google’s Inner-net as Google’s private, and more closed regime of mostly-dominant, Google-controlled operating systems, platforms, apps, protocols, and APIs that collectively govern how most of the Web operates, largely out of public view. -via Somewhat Reasonable

You build a box for your brain by only consuming movies, or fiction books, or technical books, or Google searches, or a particular game. As I’ve said elsewhere:

You can’t really think outside the box in the real world. You can, however, get a bigger box in which to do your thinking.

Do these actually work? Yes. Do they work all the time, and immediately? No. They take time. Which is why you also must learn to be patient, to give yourself some slack, and to build up your virtue.

But if you don’t start someplace, it’s certainly never going to work. Ultimately, you are responsible for what enters your brain; controlling information inflow is just part of your job as a human being. Or, as I always tell my daughters: garbage in, garbage out. If you don’t learn to control the garbage in part, no-one else is going to be able to help you control the garbage out part.