Why are networks so insecure?
One reason is we don’t take network security seriously. We just don’t think of the network as a serious target of attack. Or we think of security as a problem “over there,” something that exists in the application realm, that needs to be solved by application developers. Or we think the consequences of a network security breach as “well, they can DDoS us, and then we can figure out how to move load around, so if we build with resilience (enough redundancy) we’re already taking care of our security issues.” Or we put our trust in the firewall, which sits there like some magic box solving all our problems.
While those working in the network engineering world are quite familiar with the expression “it is always something!,” defining this (often exasperated) declaration is a little trickier. The wise folks in the IETF, however, have provided a definition in RFC1925. Rule 7, “it is always something,” is quickly followed with a corollary, rule 7a, which says: “Good, Fast, Cheap: Pick any two (you can’t have all three).”
Jack of all trades, master of none.
This singular saying—a misquote of Benjamin Franklin (more on this in a moment)—is the defining statement of our time. An alternative form might be the fox knows many small things, but the hedgehog knows one big thing.
What percentage of business-impacting application outages are caused by networks? According to a recent survey by the Uptime Institute, about 30% of the 300 operators they surveyed, 29% have experienced network related outages in the last three years—the highest percentage of causes for IT failures across the period.
A secondary question on the survey attempted to “dig a little deeper” to understand the reasons for network failure; the chart below shows the result.
Many within the network engineering community have heard of the OSI seven-layer model, and some may have heard of the Recursive Internet Architecture (RINA) model. The truth is, however, that while protocol designers may talk about these things and network designers study them, very few networks today are built using any of these models. What is often used instead is what might be called the Infinitely Layered Functional Indirection (ILFI) model of network engineering. In this model, nothing is solved at a particular layer of the network if it can be moved to another layer, whether successfully or not.
A while back, I was sitting in a meeting where the presenter described switching from a “traditional, hierarchical data center fabric” to a spine-and-leaf (while drawing CLOS, in all capital letters, on the whiteboard). He pointed out that the spine-and-leaf design is simpler because it only has two tiers rather than three.
There is so much wrong with this I almost winced in physical pain. Traditional hierarchical designs are not fabrics. Spine-and-leaf fabrics are not CLOS, but Clos, fabrics. Clos fabrics have three stages, not two—even if we draw them “folded” so you only see two apparent levels to the fabric. In fact, all spine-and-leaf fabrics always have an odd number of stages, and they are stages, not tiers.
BGP is widely used as an IGP in the underlay of modern DC fabrics. This series argues this is not the best long-term solution to the problem of routing in fabrics because BGP is not ideal for this use case. This post will consider the potential harm we are doing to the larger Internet by pressing BGP into a role it was not originally designed to fulfill—an underlay protocol or an IGP.
My last post described the kinds of configuration required to make BGP work on a DC fabric—it turns out that the configuration of each BGP speaker on the fabric is close to unique. It is possible to automate configuring each speaker—but it would be better if we could get closer to autonomic operation.
One of the most important features of the Network Operating Systems, like Banyan Vines and Novell Netware, available in the middle of the 1980’s was their integrated directory system. These directory systems allowed for the automatic discovery of many different kinds of devices attached to a network, such as printers, servers, and computers. Printers, of course, were the important item in this list, because printers have always been the bane of the network administrator’s existence. An example of one such system, an early version of Active Directory, is shown in the illustration below.
Before I continue, I want to remind you what the purpose of this little series of posts is. The point is not to convince you to never use BGP in the DC underlay ever again. There’s a lot of BGP deployed out there, and there are lot of tools that assume BGP in the underlay. I doubt any of that is going to change. The point is to make you stop and think!
Why are we deploying BGP in this way? Is this the right long-term solution? Should we, as a community, be rethinking our desire to use BGP for everything? Are we just “following the crowd” because … well … we think it’s what the “cool kids” are doing, or because “following the crowd” is what we always seem to do?
In my last post, I argued that BGP converges much more slowly than the other options available for the DC fabric underlay control plane. The pushback I received was two-fold. First, the overlay converges fast enough; the underlay convergence time does not really factor into overall convergence time. Second, there are ways to fix things.
Someone recently asked me to suggest a list of books on thinking skills; I figured others might be interested in the list, as well, so … I decided to post it here. Further, I’ve added a few books to my “recommended book list” here on rule11; I thought I’d point those out, as well. My first suggestion, of course, is that if you want to improve your thinking skills, read. I don’t just mean technical stuff, I mean all over the place, in the form of books, and a lot.
So, forthwith, some more things to read.