Complexity Bites Back

What percentage of business-impacting application outages are caused by networks? According to a recent survey by the Uptime Institute, about 30% of the 300 operators they surveyed, 29% have experienced network related outages in the last three years—the highest percentage of causes for IT failures across the period.

A secondary question on the survey attempted to “dig a little deeper” to understand the reasons for network failure; the chart below shows the result.

The Hedge 74: Brian Keys and the Complexity of User Interfaces

Crossing from the domain of test pilots to the domain of network engineering might seem like a large leap indeed—but user interfaces and their tradeoffs are common across physical and virtual spaces. Brian Keys, Eyvonne Sharp, Tom Ammon, and Russ White as we start with user interfaces and move into a wider discussion around attitudes and beliefs in the network engineering world.

You Can Always Add Another Layer of Indirection (RFC1925, Rule 6a)

Many within the network engineering community have heard of the OSI seven-layer model, and some may have heard of the Recursive Internet Architecture (RINA) model. The truth is, however, that while protocol designers may talk about these things and network designers study them, very few networks today are built using any of these models. What is often used instead is what might be called the Infinitely Layered Functional Indirection (ILFI) model of network engineering. In this model, nothing is solved at a particular layer of the network if it can be moved to another layer, whether successfully or not.

On Using the Right Word

A while back, I was sitting in a meeting where the presenter described switching from a “traditional, hierarchical data center fabric” to a spine-and-leaf (while drawing CLOS, in all capital letters, on the whiteboard). He pointed out that the spine-and-leaf design is simpler because it only has two tiers rather than three.

There is so much wrong with this I almost winced in physical pain. Traditional hierarchical designs are not fabrics. Spine-and-leaf fabrics are not CLOS, but Clos, fabrics. Clos fabrics have three stages, not two—even if we draw them “folded” so you only see two apparent levels to the fabric. In fact, all spine-and-leaf fabrics always have an odd number of stages, and they are stages, not tiers.

The Hedge 73: Daniel Teycheney and Open Source in Networking

Combining, or stitching together, open source projects to build something unique for your network is becoming more common. What does this look like in the real world? What are some of the positive and negative aspects of building things this way? How do open source projects interact with the commercial world? Daniel Teycheney joins Tom Ammon, Jett Tantsura, and Russ White to discuss open source software in networking, particularly around network monitoring and management.

Rethinking BGP on the DC Fabric (part 5)

BGP is widely used as an IGP in the underlay of modern DC fabrics. This series argues this is not the best long-term solution to the problem of routing in fabrics because BGP is not ideal for this use case. This post will consider the potential harm we are doing to the larger Internet by pressing BGP into a role it was not originally designed to fulfill—an underlay protocol or an IGP.

My last post described the kinds of configuration required to make BGP work on a DC fabric—it turns out that the configuration of each BGP speaker on the fabric is close to unique. It is possible to automate configuring each speaker—but it would be better if we could get closer to autonomic operation.

The Hedge 72: Lisa Caywood and Marketectures

The open source world is not much different than the commercial world in terms of building marketectures rather than useable software—largely because open source projects still rely on sources of funding and material support to build and maintain a product. Many times, however, the focus on these marketectures get in the way of real work. Join Tom Ammon, Russ White, and Lisa Caywood as we discuss the problem of marketectures and the broader world of open source software.

Technologies that Didn’t: Directory Services

One of the most important features of the Network Operating Systems, like Banyan Vines and Novell Netware, available in the middle of the 1980’s was their integrated directory system. These directory systems allowed for the automatic discovery of many different kinds of devices attached to a network, such as printers, servers, and computers. Printers, of course, were the important item in this list, because printers have always been the bane of the network administrator’s existence. An example of one such system, an early version of Active Directory, is shown in the illustration below.

Rethinking BGP on the DC Fabric (part 4)

Before I continue, I want to remind you what the purpose of this little series of posts is. The point is not to convince you to never use BGP in the DC underlay ever again. There’s a lot of BGP deployed out there, and there are lot of tools that assume BGP in the underlay. I doubt any of that is going to change. The point is to make you stop and think!

Why are we deploying BGP in this way? Is this the right long-term solution? Should we, as a community, be rethinking our desire to use BGP for everything? Are we just “following the crowd” because … well … we think it’s what the “cool kids” are doing, or because “following the crowd” is what we always seem to do?

In my last post, I argued that BGP converges much more slowly than the other options available for the DC fabric underlay control plane. The pushback I received was two-fold. First, the overlay converges fast enough; the underlay convergence time does not really factor into overall convergence time. Second, there are ways to fix things.

The Hedge 71: Nick Russo and Automating Productivity

When we think of automation—and more broadly tooling—we tend to think of automating the configuration, monitoring, and (possibly) the monitoring of a network. On the other hand, a friend once observed that when interviewing coders, the first thing he asked was about the tools they had developed and used for making themselves more efficient. This “self-tooling” process turns out to be important not just to be more efficient at work, but to use time more effectively in general. Join Nick Russo, Eyvonne Sharp, Tom Ammon, and Russ White as we discuss self-tooling.