Decision making, especially in large organizations, fails in many interesting ways. Understanding these failure modes can help us cope with seemingly difficult situations, and learn how to make decisions better. On this episode of the Hedge, Federico Lucifredi, Ethan Banks, and Russ White discuss Federico’s thoughts on developing a taxonomy of indecision. You can find his presentation on this topic here.

Crossing from the domain of test pilots to the domain of network engineering might seem like a large leap indeed—but user interfaces and their tradeoffs are common across physical and virtual spaces. Join Brian Keys, Eyvonne Sharp, Tom Ammon, and Russ White as they start with user interfaces and move into a wider discussion around attitudes and beliefs in the network engineering world.

A while back, I was sitting in a meeting where the presenter described switching from a “traditional, hierarchical data center fabric” to a spine-and-leaf (while drawing CLOS, in all capital letters, on the whiteboard). He pointed out that the spine-and-leaf design is simpler because it only has two tiers rather than three.

There is so much wrong with this that I almost winced in physical pain. Traditional hierarchical designs are not fabrics. Spine-and-leaf fabrics are not CLOS, but Clos, fabrics. Clos fabrics have three stages, not two—even if we draw them “folded” so you only see two apparent levels to the fabric. In fact, spine-and-leaf fabrics always have an odd number of stages—and they are stages, not tiers.

More recently, I heard someone talking about an operating system that was built using microservices. I thought—“that would be a neat trick.” To build something with microservices does not just mean a piece of software using modules—this would be modular application (or operating system) design. Microservices architectures break the application up into the most basic components possible and then scale each kind of component out (rather than up) by spinning up new copies of each service as needed. I cannot imagine scaling an operating system out by spinning up multiple copies of the same service, and then providing some sort of way to spread load across the various copies. Would you have some sort of anycast IPC? An internal DNS server or load balancer?

You can have an OS that natively participates in a larger microservices-based architecture, but what would microservices within the operating system look like, precisely?
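For what it’s worth, here is a minimal Python sketch of what “scaling out” actually means in a microservices architecture—spinning up more copies of a service and spreading load across them. The class and names are hypothetical, purely to make the contrast with a modular design concrete.

```python
import itertools

# A minimal sketch of "scale out, not up": each kind of service runs as
# several identical copies, and callers are spread across them. All names
# here are hypothetical, purely for illustration.
class ServicePool:
    def __init__(self, factory, copies=1):
        self.factory = factory
        self.instances = [factory() for _ in range(copies)]
        self._next = itertools.cycle(self.instances)

    def scale_out(self, extra=1):
        # Scaling out means spinning up more copies of the same service...
        self.instances.extend(self.factory() for _ in range(extra))
        self._next = itertools.cycle(self.instances)

    def handle(self, request):
        # ...and spreading load across the copies (round robin here; in a
        # real deployment this would be a load balancer, DNS, or anycast).
        return next(self._next).handle(request)
```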

Maybe my recent studies in philosophy make me much more attuned to the way we use language in the network engineering world—or maybe I’m just getting old. Whatever it is, our determination to make every word mean everything is driving me nuts.

What is the difference between a router and a switch? There used to be a simple definition—routers rewrite the L2 header and switches don’t. But now that routers switch packets, and switches route packets, the only difference seems to be … buffer depth? Feature set? The line between router and switch is fuzzy to the point of being meaningless, leaving us with no real term to describe a real switch any longer (a device that doesn’t do routing).

What about software defined networks? We’ve been treated to software defined everything now, of course. And intent? I get the point of intent, but we’re already moving down the path of making the meaning so broad that it can even contain configuring the CLI on an old AGS+. And don’t get me started on artificial intelligence, which is often used to describe something closer to machine learning. Of course machine learning is often used to describe things that are really nothing more than statistical inference.

Maybe it’s time for a general rebellion against the sloppy use of language in network engineering. Or maybe I’m just tilting at yet another windmill. Wake me up when we’ve gotten to the point that we can use any word interchangeably with any other word in the network engineering dictionary. I await the AI that routes packets by reading your mind (through intent) called a swouter… or something.

BGP is widely used as an IGP in the underlay of modern DC fabrics. This series argues that BGP is not the best long-term solution to the problem of routing in fabrics. This post will consider the potential harm we are doing to the larger Internet by pressing BGP into a role it was not originally designed to fulfill—that of an underlay protocol, or IGP.

My last post described the kinds of configuration required to make BGP work on a DC fabric—it turns out that the configuration of each BGP speaker on the fabric is close to unique. It is possible to automate configuring each speaker—but it would be better if we could get closer to autonomic operation.
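To make this concrete, here is a rough Python sketch of deriving per-speaker BGP parameters from a simple fabric description; the ASN scheme and device names are hypothetical, not taken from any particular vendor or design guide. Even in this toy version, every leaf ends up with its own local ASN and every speaker its own neighbor list—which is why each configuration is close to unique.

```python
# A rough sketch: derive per-speaker BGP parameters for a small spine-and-leaf
# fabric. The ASN numbering and names are hypothetical, for illustration only.
def fabric_bgp_params(num_spines, num_leaves, base_asn=65000):
    spines = [{"name": f"spine{i}", "asn": base_asn} for i in range(num_spines)]
    leaves = [{"name": f"leaf{i}", "asn": base_asn + 1 + i} for i in range(num_leaves)]
    params = {}
    for device in spines + leaves:
        # Spines peer with every leaf, leaves peer with every spine; each
        # neighbor statement needs the correct remote AS.
        peers = leaves if device in spines else spines
        params[device["name"]] = {
            "local_asn": device["asn"],
            "neighbors": [{"peer": p["name"], "remote_asn": p["asn"]} for p in peers],
        }
    return params

# Example: the parameters one leaf needs—its own ASN plus every spine as a neighbor.
print(fabric_bgp_params(2, 4)["leaf0"])
```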

To move BGP closer to autonomic operation in a DC fabric, there are several things we can do. First, we can allow a BGP speaker to peer with any other BGP speaker it receives an open message from—this is often called promiscuous mode. While each router in the fabric will still need to be configured with the right autonomous system, at least we won’t need to configure the correct peers on each router (including the remote AS).

Note, however, that using this kind of promiscuous peering does come with a set of tradeoffs (if you’re reading this blog, you know there will be tradeoffs). BGP speakers running in promiscuous mode open a large attack surface on the control plane of the network. We can close this attack surface by configuring authentication on all BGP speakers … but we are now adding complexity to reduce complexity. We could also reduce the scope of the attack surface by never permitting BGP to peer beyond a single hop, and then filtering all BGP packets at the fabric edge. Again, just a bit more complexity to manage—but remember that the road to highly fragile and complex systems is always paved with individual steps that never, on their own, seem to add “too much complexity.”
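As a sketch of what promiscuous peering plus these guardrails might look like, consider the following Python pseudologic—illustrative only, not the behavior of any real BGP implementation: accept an OPEN from any speaker, but only if it arrives from a directly connected address, passes a GTSM-style single-hop check, and authenticates.

```python
from ipaddress import ip_address, ip_network

# Illustrative pseudologic only—not any real BGP implementation. In
# "promiscuous" mode there is no configured neighbor list; any speaker that
# sends an OPEN may become a peer, subject to the guardrails described above.
def accept_open(source_ip, ttl, auth_ok, connected_subnets):
    on_link = any(ip_address(source_ip) in ip_network(s) for s in connected_subnets)
    if not on_link or ttl < 255:   # GTSM-style check: never peer beyond one hop
        return False
    if not auth_ok:                # require authentication on every session
        return False
    return True                    # otherwise, peer with whoever shows up

# A speaker on a directly connected fabric link is accepted...
print(accept_open("10.1.1.1", 255, True, ["10.1.1.0/31"]))   # True
# ...an OPEN sourced from somewhere off-fabric is not.
print(accept_open("192.0.2.7", 64, True, ["10.1.1.0/31"]))   # False
```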

The second thing we can do to move BGP closer to autonomic operation is to advertise routes to every connected peer without any policy configured. This does, again, introduce some tradeoffs, particularly in the realm of security, but let’s leave that aside for the moment.

Assume we can create a version of BGP that has these modifications—it always accepts any peer from any other AS, and it advertises all routes without any policy configured. Put these features behind a single knob that also sets the MRAI to 0 or 1, tightens up the dampening parameters, and adjusts a few other things to make BGP work better in a DC fabric.
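Here is a sketch of what such a knob might bundle, written as a Python settings object—every name and default is hypothetical, since no such single switch exists in any BGP implementation I know of:

```python
from dataclasses import dataclass, asdict

# Hypothetical only: a sketch of the settings one "DC fabric" knob would
# flip together. None of these names come from a real implementation.
@dataclass
class DcFabricProfile:
    promiscuous_peering: bool = True   # accept an OPEN from any peer in any AS
    advertise_all_routes: bool = True  # advertise everything, no policy configured
    mrai_seconds: int = 0              # MRAI set to 0 (or 1) for fast convergence
    tight_dampening: bool = True       # tightened dampening parameters
    single_hop_only: bool = True       # never peer beyond one hop

def apply_profile(speaker, profile):
    # One command flips every setting at once—convenient on a fabric, and
    # exactly why enabling it on the wrong session is so dangerous.
    for name, value in asdict(profile).items():
        setattr(speaker, name, value)
```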

As an experiment, let’s enable this DC fabric knob on a BGP speaker at the edge of a dual-homed “enterprise customer.” What will happen?

The enterprise network will automatically peer to any speaker that sends an open message—a huge security hole on the open Internet—and it will advertise every route it learns even though there is no policy configured. This second issue—advertising routes with no policy configured—can cause the enterprise network to become a transit between two much larger provider networks, crashing out some small corner of the Internet.

This might seem like a trivial issue. After all, just don’t ever enable the DC fabric knob on an eBGP peering session upstream into the DFZ, or any other “real” internetwork. Sure, and just don’t ever hit the brakes when you mean to hit the accelerator, or the accelerator when you mean to hit the brakes. If I had a dime for every time we “just don’t ever make that mistake …” Well, I wouldn’t be blogging, I’d be relaxing in the sun someplace (okay, I’m not likely to ever stop working to sit around and “relax” all the time, but you get the picture anyway).

Maybe—just maybe—it would really be better overall to use two different protocols for IGP and EGP work. Maybe—just maybe—it’s better not to mix these two different kinds of functions in a single protocol. Not only is the single resulting protocol bound to be really complex (most BGP implementations are now over 100,000 lines of code, after all), but it will end up being really easy to make really bad mistakes.

No tool is omnicompetent. If you found a tool that was, in fact, omnicompetent, it would also be the most dangerous tool in your toolbox.

Every software developer has run into “god objects”—some data structure or database that every process must access no matter what it is doing. Creating god objects in software is considered an anti-pattern—something you should not do. Perhaps the most apt description of the god object I’ve seen recently is you ask for a banana, and you get the gorilla as well.
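In code, the anti-pattern looks something like the sketch below—names invented for illustration: one structure holds everything, so every function ends up taking the whole thing even when it needs only one small piece.

```python
# A made-up illustration of the god-object anti-pattern.
class NetworkGod:
    """One structure holding topology, configs, telemetry, policy, inventory..."""
    def __init__(self):
        self.topology = {}
        self.device_configs = {}
        self.telemetry = {}
        self.routing_policy = {}
        self.inventory = {}

# Ask for a banana, get the gorilla: this only needs link telemetry, but it
# takes (and can touch) everything.
def link_is_up(god: NetworkGod, link_id: str) -> bool:
    return god.telemetry.get(link_id, {}).get("status") == "up"

# The narrower alternative passes only what the task actually needs.
def link_is_up_narrow(telemetry: dict, link_id: str) -> bool:
    return telemetry.get(link_id, {}).get("status") == "up"
```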

We seem to have a deep desire to solve all the complexity of modern networks through god objects. There was ATM, which was going to solve all our networking problems by allowing the edge device (or a centralized controller) to control the path its traffic takes through the network. There is LISP, which is going to solve every mapping and tunneling/transport problem in the entire networking world (including mobility and security). There is SDN, which is going to solve everything by pushing it all into a controller.

And now there is BGP, which can be a link state protocol (LSVR), the ideal DC fabric control plane, the ideal interdomain protocol, the ideal IGP … a sort-of distributed god object that solves everything, everywhere, all the time (life in the fast lane…).

The problem is, a bunch of people are asking for different bananas, and what we keep getting is the gorilla.

A lot of this is our fault. We crave simplicity so much that we are willing to believe just about anyone, or anything, that promises to solve all the problems we face in building and operating networks in a simple and easy way. The truth is, however, there are no simple solutions to hard problems—solving hard problems requires complex solutions.

Okay, so we can intuitively know god objects are bad, but why? Because they are too complex to really understand, and therefore they cannot be truly operated, nor can you effectively troubleshoot them. There are too many unintended consequences, too many places where you cannot understand the relationship between this and that.

To put this in more human terms—there are many people in the modern world who think that if they just have enough data, they will understand people well enough to operate and troubleshoot them like some kind of machine. Sorry to tell you this—humans are just too complex, and human social institutions, being made up of people (well, duh!), are way more complex than even the most intelligent artificial intelligence can ever “understand.”

We need to fall out of love with the utopia of the god object for once and for all. We need to go back to building simpler systems that solve one or two problems well, and then combining these into intelligible solutions that can be understood, managed, and repaired.

But the move back towards real simplicity has to begin with you.

I tend to be a very private person; I rarely discuss my “real life” with anyone except a few close friends. I thought it appropriate, though, in this season—both the season of the year and this season in my life—to post something a little more personal.

One thing people often remark about my personality is that I seem to be disturbed by very little in life. No matter what curve ball life might throw my way, I take the hit and turn it around, regain my sense of humor, and press forward into the fray more quickly than many expect. This season, combined with a recent curve ball (one of many—few people would suspect the path my life has taken across these 50+ years), and talking to Brian Keys in a recent episode of the Hedge, have given me reason to examine foundational principles once again.

How do I stay “up” when life throws me a curve ball?

Pragmatically, the worst network outage in the world is not likely to equal the stresses I’ve faced in the military, whether on the flight line or in … “other situations.” Life and death were immediately and obviously present in those times. Coming face to face with death—having a friend who is there one moment, and not the next—changes your perspective. Knowing you hold the lives of hundreds of people in your hands—that if you make a mistake, real people will die (now!)—changes your perspective. In these times you realize there is more to life than work, or skill, or knowledge.

Spiritually, I am deeply Christian. I am close to God in a real way. I know him, and I trust his character and plans for the future. Job 13:15 and Romans 12:2 are present realities every day, from the moment I wake until the moment I fall asleep.

These two lead to a third observation.

Because of these things, I am grateful.

I am grateful for the people in my life—deeply held friendships, people who have spoken into my life, people who have helped carry me in times of crisis. I am grateful for the things in my life. Gratitude is, in essence, that which turns what you have into what you want (or perhaps enough to fulfill your wants). Gratitude often goes farther, though, teaching you that, all too often, you have more than you deserve.

So this season, whatever your religious beliefs, is a good time to reflect on the importance of all things spiritual, the value of life, the value of friendship, the value of truth, and to decide to have gratitude in the face of every storm. Gratitude causes us to look outside ourselves, and there is power (and healing) in self-forgetfulness. Self-love and self-hate are equal and opposite errors; it is in forgetting yourself, pressing forward, and giving to others, that you find out who you are.

Have a merry Christmas, a miraculous Hanukkah, or … just a joyful time being with family and friends at home. Watch a movie, eat ice cream and cookies, make a new friend, care for someone who has no family to be with.

To paraphrase Abraham Lincoln, most folks are just about as happy as they choose to be—so make the choice to be joyful and grateful.

These are the important things.

Innovation has gained a sort-of mystical aura in our world. Move fast and break stuff. We recognize and lionize innovators in just about every way possible. The result is a general attitude of innovate or die—if you cannot innovate, then you will not progress in your career or life. Maybe it’s time to take a step back and bust some of the innovation myths created by this near idolization of innovation.

You can’t innovate where you are. Reality: innovation is not tied to a particular place and time. “But I work for an enterprise that only uses vendor gear… Maybe if I worked for a vendor, or was deeply involved in open source…” Innovation isn’t just about building new products! You can innovate by designing a simpler network that meets business needs, or by working with your vendor on testing a potential new product. Ninety percent of innovation is just paying attention to problems, along with a sense of what is “too complex,” or where things might be easier.

You don’t work in open source or open standards? That’s not your company’s problem, that’s your problem. Get involved. It’s not just about protocols, anyway. What about certifications, training, and the many other areas of life in information technology? Just because you’re in IT doesn’t mean you have to only invent new technologies.

Innovation must be pursued—it doesn’t “just happen.” We often tell ourselves stories about innovation that imply it “is the kind of thing we can accomplish with a structured, linear process.” The truth is the process of innovation is unpredictable and messy. Why, then, do we tell innovation stories that sound so purposeful and linear?

That’s the nature of good storytelling. It takes a naturally scattered collection of moments and puts them neatly into a beginning, middle, and end. It smoothes out all the rough patches and makes a result seem inevitable from the start, despite whatever moments of uncertainty, panic, even despair we experienced along the way.

Innovation just happens. Either the inspiration just strikes, or it doesn’t, right? You’re just walking along one day and some really innovative idea just jumps out at you. You’re struck by lightning, as it were. This is the opposite of the previous myth, and just as wrong in the other direction.

Innovation requires patience. According to Keith’s Law, any externally obvious improvement in a product is really the result of a large number of smaller changes hidden within the abstraction of the system itself. Innovation is a series of discoveries over months and even years. Innovations are gradual, incremental, and collective—over time.

Innovation often involves combining existing components. If you don’t know what’s already in the field (and usefully adjacent fields), you won’t be able to innovate. Innovation, then, requires a lot of knowledge across a number of subject areas. You have to work to learn to innovate—you can’t fake this.

Innovation often involves a group of people, rather than lone actors. We often emphasize lone actors, but they rarely work alone. To innovate, you have to intentionally embed yourself in a community with a history of innovation, or build such a community yourself.

Innovation must take place in an environment where failure is seen as a good thing (at least you were trying) rather than a bad one.

Innovative ideas don’t need to be sold. Really? Then let’s look at Quibi, which “failed after only 7 months of operation and after having received $2 billion in backing from big industry players.” The idea might have been good, but it didn’t catch on. The idea that you can “build a better mousetrap” and “the world will beat a path to your door” just isn’t true, and it never has been.

The bottom line is… innovation does require a lot of hard work. You have to prepare your mind, learn to look for problems that can be solved in novel ways, be inquisitive enough to ask why and whether there is a better way, stubborn enough to keep trying, and confident enough to sell your innovation to others. But you can innovate where you are—to believe otherwise is a myth.