Is it Money, Flexibility, or… ??

Raise your hand if you think moving to platform as a service or infrastructure as a service is all about saving money. Raise it if you think moving to “the cloud” is all about increasing business agility and flexibility.

Put your hand down. You’re wrong.

Before going any further, let me clarify things a bit. You’ll notice I did not say software as a service above—for good reason. Move email to the cloud? Why not? Word processing? Sure, word processing is a (relatively) commodity service (though I’m always amazed at the number of people who say “word processor x stinks,” opting to learn complex command sets to “solve the problem,” without first consulting a user manual to see whether they can customize “word processor x” to meet their needs).

What about supporting business-specific, or business-critical, applications? You know, the ones you’ve hired in-house developers to create and curate?

Will you save money by moving these applications to a platform as a service? There is, of course, some efficiency to be gained. It is cheaper for a large-scale manufacturer of potato chips to make a bag of chips than for you to cook them in your own home. They have access to specialized slicers, fryers, chemists, and even special potatoes (with more starch than the ones you can buy in a grocery store). Does this necessarily mean that buying potato chips in a bag is always cheaper? In other words, does the manufacturer pass all these savings on to you, the consumer? To ask the question is to know the answer.

And once you’ve turned making all your potato chips over to the professionals, getting rid of the equipment needed to make them, and letting the skill of making good potato chips atrophy, what is going to happen to the price? Yep, thought so.

This is not to say cost is not a factor. Rather, the cost of supporting customized applications on the cloud or local infrastructure needs to be evaluated on a case-by-case basis—either might be cheaper than the other, and the cost of both will change over time.
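To make the case-by-case point concrete, here is a toy comparison in Python; every number below is invented purely for illustration, not drawn from any real pricing.

```python
# Toy cost comparison: cloud (pay per use) versus on-premises (fixed
# capital plus operations). All numbers are invented for illustration;
# substitute your own to see which side wins in your situation.

def cloud_monthly_cost(instance_hours, rate_per_hour=0.20):
    """Cloud cost scales with how much you actually use."""
    return instance_hours * rate_per_hour

def onprem_monthly_cost(servers, capex_per_server=6000.0,
                        amortization_months=36, opex_per_server=75.0):
    """On-prem cost is mostly fixed: amortized hardware plus power,
    space, and the staff time to care for it."""
    return servers * (capex_per_server / amortization_months + opex_per_server)

for hours in (200, 2000, 20000):  # light, moderate, heavy usage
    print(f"{hours:>6} instance-hours: cloud ${cloud_monthly_cost(hours):,.0f}"
          f" vs. on-prem ${onprem_monthly_cost(servers=3):,.0f}")
```

At light usage the cloud comes out ahead; keep the hardware busy and the fixed costs amortize out in favor of running it yourself. Neither answer is universal, which is the point.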
Does using the cloud afford you more business flexibility? Sometimes, yes. And sometimes, no. Again, the flexibility benefit normally comes from “business agnostic” kinds of flexibility. The kind of flexibility you need to run your business efficiently may, or may not, be the same as what the majority of other businesses need. Moving your business to another cloud provider is not always as simple as it initially seems.

The cost and flexibility benefits come from relatively customer-agnostic parts of the business models. To that extent, you rely more on them than they rely on you. Yes, you can vote with your feet if the mickey is taken, but if we’re honest, this kind of supply is almost as inelastic as your old IT service deal. There are few realistic options for supply at scale, and the act of reversing out of a big contract, selecting a new supplier, and making the operational switch can bleed any foreseeable benefits out of a change—something all parties in the procurement process know all too well.

So… saving money is sometimes a real reason to outsource things. In some situations, flexibility or agility is going to be a factor. But… there is a third factor I have not mentioned yet—probably the most important, but almost never discussed. Risk aversion.

Let’s be honest. For the last twenty years we network engineers have specialized in building extremely complex systems and formulating the excuses required when things don’t go right. We’ve specialized in saying “yes” to every requirement (or even wish) because we think that by saying “yes” we will become indispensable. Rather than building platforms on which the business can operate, we’ve built artisanal, complex pets that must be handled carefully lest they turn into beasts that devour time and money. You know, like the person who tries to replicate store-bought chips by purchasing expensive fryers and potatoes, and ends up just making a mess of the kitchen?

If you want to fully understand your infrastructure, and the real risk of complexity, you need to ask about risk, money, and flexibility—all three. When designing a network, or modifying things to deploy a new service onto an existing network, you need to think about risk as well as cost and flexibility.

How do you manage risk? Sarah Clarke, in the article I quoted above, gives us a few places to start (which I’ve modified to fit the network engineering world). First, ask the question about risk. Don’t just ask “how much money is this going to cost or save,” ask “what risk is being averted or managed here?” You can’t ever think the problem through if you don’t ask the question. Second, ask how you are going to assess the solution against risk, money, and flexibility. How will you know if moving in a particular direction worked? Third, build out clear demarcation points; this applies both to the modules within the system and to responsibilities.

Finally, have an escalation plan. Know what you are going to do when things go wrong, and when you are going to do it. Think about how you can back out of a situation entirely. What are the alternatives? What does it take to get there? You can’t really “unmake” decisions, but you can come to a point where you realize you need to make a different decision. Know what that point is, and at least have the information on hand to know what decision you should make when you get there.
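Purely as an illustrative sketch, those four steps can be captured as a small template filled out for each proposed change. The field names here are my own invention, not any standard framework.

```python
# A minimal, illustrative change-assessment template following the four
# steps above. Nothing here is a standard; rename the fields to fit
# your own process.
from dataclasses import dataclass, field

@dataclass
class ChangeAssessment:
    change: str
    risk_averted: str                # what risk is being averted or managed?
    cost_impact: str                 # money saved or spent, and over what period
    flexibility_impact: str          # what becomes easier or harder afterward
    success_measures: list = field(default_factory=list)    # how you will know it worked
    demarcation_points: list = field(default_factory=list)  # module and responsibility boundaries
    escalation_plan: str = ""        # what you do, and when, if things go wrong
    backout_trigger: str = ""        # the point at which you make a different decision

assessment = ChangeAssessment(
    change="Move application X to a managed platform",
    risk_averted="On-call burden and key-person risk for the current stack",
    cost_impact="Roughly neutral in year one; renegotiation risk at renewal",
    flexibility_impact="Faster provisioning; harder exit",
    success_measures=["change lead time", "incident count", "run cost per quarter"],
    demarcation_points=["provider owns platform uptime", "we own application performance"],
    escalation_plan="Provider ticket, then account team, then executive review at 30 days",
    backout_trigger="Two consecutive quarters missing the success measures",
)
print(assessment.change, "->", assessment.risk_averted)
```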

But first, ask the question. Risk aversion drives many more decisions than you might think.

The Hedge 17: Michael Natkin and Strong Opinions Loosely Held

According to Michael Natkin, “in the tech industry, with our motto of ‘strong opinions, loosely held’ (also known as ‘strong opinions, weakly held’), we’ve glorified overconfidence.” Michael joins Tom Ammon and Russ White to discuss the culture of overconfidence, and how it impacts the field of information technology.


Obfuscating Complexity Considered Harmful

If you are still looking for a good resolution for 2020 (I know, it’s a bit late), you can’t go wrong with this one: this year, I will focus on making the networks and products I work on truly simpler. Now, before you pull Tom’s take out on me—

There are those that would say that what we’re doing is just hiding the complexity behind another layer of abstraction, which is a favorite saying of Russ White. I’d argue that we’re not hiding the complexity as much as we’re putting it back where it belongs – out of sight. We don’t need the added complexity for most operations.

Three things: First, complex solutions are always required for hard problems. If you’ve ever listened to me talk about complexity, you’ve probably seen this quote on a slide someplace—

[C]omplexity is most succinctly discussed in terms of functionality and its robustness. Specifically, we argue that complexity in highly organized systems arises primarily from design strategies intended to create robustness to uncertainty in their environments and component parts.

You cannot solve hard problems—complex problems—without complex solutions. In fact, a lot of the complexity we run into in our everyday lives is a result of saying “this is too complex, I’m going to build something simpler.” (Here I’m thinking of a blog post I read last year that said “when we were building containers, we looked at routing and realized how complex it was… so we invented something simpler… which, of course, turned out to be more complex than dynamic routing!”)

Second, abstraction can be used the right way to manage complexity, and it can be used the wrong way to obfuscate or mask complexity. The second great source of complexity and system failure in our world is that we don’t abstract complexity so much as we obfuscate it.

Third, abstraction is not a zero-sum game. If you haven’t found the tradeoffs, you haven’t looked hard enough. This is something expressed through the state/optimization/surface triangle, which you should know at this point.
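Route summarization is a handy illustration of this tradeoff: the summary is an abstraction that reduces state, but it leaks, because it advertises reachability for address space that may not actually exist behind it. The prefixes in the sketch below are made up for the example.

```python
# Illustration of a leaky abstraction: a summary route covers more
# address space than the prefixes actually reachable behind it.
# The prefixes are invented for this example.
import ipaddress

actual = [ipaddress.ip_network(p) for p in
          ("10.1.0.0/24", "10.1.1.0/24", "10.1.2.0/24")]  # what really exists
summary = ipaddress.ip_network("10.1.0.0/22")             # what gets advertised

covered = list(summary.subnets(new_prefix=24))
leaked = [n for n in covered if n not in actual]

print(f"{summary} hides {len(actual)} real prefixes behind one route")
print("but it also attracts traffic for:", ", ".join(str(n) for n in leaked))
# Less state and a smaller interaction surface, paid for with lost
# optimization: the state/optimization/surface tradeoff in miniature.
```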

Returning to the top of this post, the point is this: Using abstraction to manage complexity is fine. Obfuscation of complexity is not. Papering over complexity “just because I can” never solves the problem, any more than sweeping dirt under the rug, or papering over the old paint without bothering to fix the wall first.

We need to go beyond just figuring out how to make the user interface simpler, more “intent-driven,” automated, or whatever it is. We need to think of the network as a system, rather than as a collection of bits and bobs that we’ve thrown together across the years. We need to think about the modules horizontally and vertically, think about how they interact, understand how each piece works, understand how each abstraction leaks, and be able to ask hard questions.

For each module, we need to understand how things work well enough to ask is this the right place to divide these two modules? We should be willing to rethink our abstraction lines, the placement of modules, and how things fit together. Sometimes moving an abstraction point around can greatly simplify a design while increasing optimal behavior. Other times it’s worth it to reduce optimization to build a simpler mousetrap. But you cannot know the answer to this question until you ask it. If you’re sweeping complexity under the rug because… well, that’s where it belongs… then you are doing yourself and the organization you work for a disservice, plain and simple. Whatever you sweep under the rug of obfuscation will grow and multiply. You don’t want to be around when it crawls back out from under that rug.

For each module, we need to learn how to ask is this the right level and kind of abstraction? We need to learn to ask does the set of functions this module is doing really “hang together,” or is this just a bunch of cruft no one could figure out what to do with, so they shoved it all in a black box and called it done?

Above all, we need to learn to look at the network as a system. I’ve been harping on this for so long, and yet a lot of the time I still don’t think people understand what I am saying. So I guess I’ll just have to keep saying it. 😊

The problems networks are designed to solve are hard—therefore, networks are going to be complex. You cannot eliminate complexity, but you can learn to minimize and control it. Abstraction within a well-thought-out system is a valid and useful way to control complexity, as is understanding how and where to create the modules at whose edges abstraction can take place.

Don’t obfuscate. Think systemically, think about the tradeoffs, and abstract wisely.

Learning to Trust

The state of automation among enterprise operators has been a matter of some interest this year, with several firms undertaking studies of the space. Juniper, for instance, recently released the first yearly edition of the SONAR report, which surveyed many network operators to set a baseline for a better future understanding of how automation is being used. Another recent report in this area is Enterprise Network Automation for 2020 and Beyond, conducted by Enterprise Management Associates.

While these reports are, themselves, interesting for understanding the state of automation in the networking world, one correlation noted on page 13 of the EMA report caught my attention: “Individuals who primarily engage with automation as users are less likely to fully trust automation.” This observation is set in parallel with two others on that same page: “Enterprises that consider network automation a high priority initiative trust automation more,” and “Individuals who fully trust automation report significant improvement in change management capacity.” It seems somewhat obvious these three are related in some way, but how? The answer to this, I think, lies in the relationship between the person and the tool.

We often think of tools as “abstract objects,” a “thing” that is “out there in the world.” It’s something we use to get a particular result, but which does not, in turn, impact “me as a person” in any way. This view of tools is not borne out in the real world. To illustrate, consider a simple situation: you are peacefully driving down the road when another driver runs a traffic signal, causing your two cars to collide. When you jump out of your car, do you say… “you caused your vehicle to hit mine?” Nope. You say: “you hit me.”

This kind of identification between the user and the tool is widely recognized and remarked on. Going back to 1904, Thorstein Veblen writes that the “machine throws out anthropomorphic habits of thought,” forcing the worker to adapt to the work, rather than the work to the worker. Marshall McLuhan says “students of computer programming have had to learn how to approach all knowledge structurally,” shaping the way they organize information so the computer can store and process it.

What does any of this have to do with network automation and trust? Two things. First, the more “involved” you are with a tool, the more you will trust it. People trust hammers more than they do cars (in general) because the use of hammers is narrow, and the design and operation of the tool is fairly obvious. Cars, on the other hand, are complex; many people simply drive them, rather than learning how they work. If you go to a high-speed or off-road driving course, the first thing you will be taught is how a car works. This is not an accident—in learning how a car works, you are learning to trust the tool. Second, the more you work with a tool, the more you will understand its limits, and hence the more you will know when you can, and cannot, trust it.

If you want to trust your network automation, don’t just be a user. Be an active participant in the tools you use. This explains the correlation between the level of trust, level of engagement, and level of improvement. The more you participate in the development of the tooling itself, and the more you work with the tools, the more you will be able to trust them. Increased trust will, in turn, result in increased productivity and effectiveness. To some degree, your way of thinking will be shaped to the tool—this is just a side effect of the way our minds work.

You can extend this lesson to all other areas of network engineering—for instance, if you want to trust your network, then you need to go beyond just configuring routing, and really learn how it works. This does not mean you need in-depth knowledge of that particular implementation, nor does it mean knowing how every possible configuration option works in detail, but it does mean knowing how the protocol converges, what the limits of the protocol are, and so on. Rinse and repeat for routers, storage systems, quality of service, etc.—and eventually you will not only be able to trust your tools, but also be very productive and effective with them.

The Hedge 14: Ron Bonica and SRm6

SRv6 uses IPv6 header fields to perform many of the same traffic engineering, fast reroute, and other functions available through MPLS. The size of the header with a large label stack, however, can be problematic from a performance perspective. Further, adding the concept of actions to SRv6 would bring a lot of new functionality into view. On this episode of the Hedge podcast, Ron Bonica joins Russ White to talk about SRm6, or Segment Routing Mapped to the v6 address space, which compacts the label stack and actions into a smaller space, resulting in an easier to deploy version of SRv6.
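For a rough sense of scale, the SRv6 Segment Routing Header adds a fixed eight bytes plus sixteen bytes for every 128-bit SID in the segment list. The sketch below compares that against a hypothetical four-byte per-hop identifier; the four-byte figure is chosen only to show the size of the gap, not to describe the actual SRm6 encoding.

```python
# Back-of-the-envelope header sizes for a source-routed IPv6 packet.
# The SRH math follows the 8-byte fixed header plus 16 bytes per SID;
# the "compact" column assumes hypothetical 4-byte per-hop identifiers,
# used here only to show the scale of the difference.
IPV6_HEADER = 40   # bytes, fixed IPv6 header
SRH_FIXED = 8      # bytes, fixed part of the Segment Routing Header
SID_128 = 16       # bytes per full 128-bit SRv6 SID
COMPACT_ID = 4     # bytes per hypothetical compressed identifier

for segments in (3, 6, 10):
    srv6 = IPV6_HEADER + SRH_FIXED + segments * SID_128
    compact = IPV6_HEADER + SRH_FIXED + segments * COMPACT_ID
    print(f"{segments:>2} segments: SRv6 headers ~{srv6} bytes, "
          f"compact form ~{compact} bytes")
```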
