BGP is widely used as an IGP in the underlay of modern DC fabrics. This series argues this is not the best long-term solution to the problem of routing in fabrics because BGP is not ideal for this use case. This post will consider the potential harm we are doing to the larger Internet by pressing BGP into a role it was not originally designed to fulfill—an underlay protocol or an IGP.
My last post described the kinds of configuration required to make BGP work on a DC fabric—it turns out that the configuration of each BGP speaker on the fabric is close to unique. It is possible to automate configuring each speaker—but it would be better if we could get closer to autonomic operation.
To move BGP closer to autonomic operation in a DC fabric, there are several things we can do. First, we can allow a BGP speaker to peer with any other BGP speaker it receives an open message from—this is often called promiscuous mode. While each router in the fabric will still need to be configured with the right autonomous system, at least we won’t need to configure the correct peers on each router (including the remote AS).
Note, however, that using this kind of promiscuous peering does come with a set of tradeoffs (if you’re reading this blog, you know there will be tradeoffs). BGP speakers running in promiscuous mode open a large attack surface on the control plane of the network. We can close this attack surface by configuring authentication on all BGP speakers … but we are now adding complexity to reduce complexity. We could also reduce the scope of the attack surface by never permitting BGP to peer beyond a single hop, and then filtering all BGP packets at the fabric edge. Again, just a bit more complexity to manage—but remember that the road to highly fragile and complex systems is always paved with individual steps that never, on their own, seem to add “too much complexity.”
The second thing we can do to move BGP closer to autonomic operation is to advertise routes to every connected peer without any policy configured. This does, again, introduce some tradeoffs, particularly in the realm of security, but let’s leave that aside for the moment.
Assume we can create a version of BGP that has these modifications—it always accepts any peer from any other AS, and it advertises all routes without any policy configured. Put these features behind a single knob which also includes setting the MRAI to 0 or 1, tightens up the dampening parameters, and adjusts a few other things to make BGP work better in a DC fabric.
As an experiment, let’s enable this DC fabric knob on a BGP speaker at the edge of a dual-homed “enterprise customer.” What will happen?
The enterprise network will automatically peer to any speaker that sends an open message—a huge security hole on the open Internet—and it will advertise every route it learns even though there is no policy configured. This second issue—advertising routes with no policy configured—can cause the enterprise network to become a transit between two much larger provider networks, crashing out some small corner of the Internet.
This might seem like a trivial issue. After all, just don’t ever enable the DC fabric knob on an eBGP peering session upstream into the DFZ, or any other “real” internetwork. Sure, and just don’t ever hit the brakes when you mean to hit the accelerator, or the accelerator when you mean to hit the brakes. If I had a dime for every time we “just don’t ever make that mistake …” Well, I wouldn’t be blogging, I’d be relaxing in the sun someplace (okay, I’m not likely to ever stop working to sit around and “relax” all the time, but you get the picture anyway).
Maybe—just maybe—it would really be better overall to use two different protocols for IGP and EGP work. Maybe—just maybe—it’s better not to mix these two different kinds of functions in a single protocol. Not only is the single resulting protocol bound to be really complex (most BGP implementations are now over 100,000 lines of code, after all), but it will end up being really easy to make really bad mistakes.
No tool is omnicompetent. If you found a tool that was, in fact, omnicompetent, it would also be the most dangerous tool in your toolbox.
If you go back and examine all the steps taken to implement the shopping API in this series, you’ll see a purposeful decision to always stick to the simplest possible scenarios. In the process, you end up with the simplest possible solutions.
The Australian Domain Name Administration, AUDA, has recently published its quarterly report for the last quarter of 2020. The report contained the interesting snippet: “The rapid digitisation of our lives and economy – necessitated by COVID-19 – continued to underpin strong growth in .au registrations.”
As a developer, your company hired you to build incredible products that focus on your users’ and customers’ needs. Yet, in the age of microservices, producing the best products relies heavily on efficient cloud service connectivity.
“Observability” expands on that monitoring and enables correlation and inspection of the raw data to provide much deeper insights. With the proper instrumentation, observability allows an enterprise, both inside and outside of the security organization, to solve an extensive number of use cases.
As the world becomes more and more reliant on electronics, it’s worth a periodic reminder that a large solar flare could knock out much of the electronics on earth. Such an event would be devastating to the Internet, satellite broadband, and the many electronics we use in daily life.
The cellular companies have made an unprecedented push to get customers interested in 5G. Back in November, I recorded a college football game which enabled me to go back and count the twelve 5G commercials during the game.
In 2021, there are more reasons why people love Linux than ever before. In this series, I’ll share 21 different reasons to use Linux. This article discusses Linux’s influence on the security of open source software.
As most developers can attest, dealing with security protocols and identity tokens, as well as user and device authentication, can be challenging. Imagine having multiple protocols, multiple tokens, 200M+ users, and thousands of device types, and the problem can explode in scope.
In recent years, concerns about the erosion of trust and invasion of privacy have extended into nearly every interaction between customers, organizations, and devices. Lawmakers have continued to respond with new privacy and data protection laws that put pressure on the related industries.
When it comes to information security, most would agree that guessing is no substitute for knowing. While hope is not a strategy, many organizations employ this approach and don’t run a proof of concept when selecting a new security technology.
In December 2018, researchers at Google detected a group of hackers with their sights set on Microsoft’s Internet Explorer. Even though new development was shut down two years earlier, it’s such a common browser that if you can find a way to hack it, you’ve got a potential open door to billions of computers.
The open source world is not much different than the commercial world in terms of building marketectures rather than useable software—largely because open source projects still rely on sources of funding and material support to build and maintain a product. Many times, however, the focus on these marketectures get in the way of real work. Join Tom Ammon, Russ White, and Lisa Caywood as we discuss the problem of marketectures and the broader world of open source software.
One of the most important features of the Network Operating Systems, like Banyan Vines and Novell Netware, available in the middle of the 1980’s was their integrated directory system. These directory systems allowed for the automatic discovery of many different kinds of devices attached to a network, such as printers, servers, and computers. Printers, of course, were the important item in this list, because printers have always been the bane of the network administrator’s existence. An example of one such system, an early version of Active Directory, is shown in the illustration below.
Users, devices and resources, such as file mounts, were stored in a tree. The root of the tree was (generally) the organization. There were Organizational Units (OUs) under this root. Users and devices could belong to an OU, and be given access to devices and services in other OUs through a fairly simple drag and drop, or GUI based checkbox style interface. These systems were highly developed, making it fairly easy to find any sort of resource, including email addresses of other uses in the organization, services such as shared filers, and—yes—even printers.
The original system of this kind was Banyan’s Streetalk, which did not have the depth or expressiveness of later systems, like the one shown above from Windows NT, or Novell’s Directory Services. A similar system existed in another network operating system called LANtastic, which was never really widely deployed (although I worked on a LANtastic system in the late 1980’s).
The usual “pitch” for deploying these systems was the ease of access control they brought into the organization from the administration side, along with the ease of finding resources from the user’s perspective. Suppose you were sitting at your desk, and needed to know who over in some other department, say accounting, you could contact about some sort of problem, or idea. If you had one of these directory services up and running, the solution was simple: open the directory, look for the accounting OU within the tree, and look for a familiar name. Once you have found them, you could send them an email, find their phone number, or even—if you had permission—print a document at a printer near their desk for them to pick up. Better than a FAX machine, right?
What if you had multiple organizations who needed to work together? Or you really wanted a standard way to build these kinds of directories, rather than being required to run one of the network operating systems that could support such a system? There were two industry wide standards designed to address these kinds of problems: LDAP and X.500.
The OUs, CNs, and other elements shown in the illustration above are actually an expression of the X.500 directory system. As X.500 was standardized starting in the mid-1990’s, these network operating systems changed their native directory systems to match the X.500 schema. The ultimate goal was to make these various directory services interoperate through X.500 connectors.
Given all this background, what happened to these systems? Why are these kinds of directories widely available today? While there are many reasons, two of these stand out.
First, these systems are complex and heavy. Their complexity made them very hard to code and maintain; I can well remember working on a large Netware Directory Service deployment where objects fell into the wrong place on a regular basis, drive mapping did not work correctly, and objects had to be deleted and recreated to force their permissions to reset.
Large, complex systems tend to be unstable in unpredictable ways. One lesson the information technology world has not learned across the years is that abstraction is not enough; the underlying systems themselves must be simplified in a way that makes the abstraction more closely resemble the underlying reality. Abstraction can cover problems up as easily as it can solve problems.
Second, these systems fit better in a world of proprietary protocols and network operating systems than into a world of open protocols. The complexity driven into the network by trying to route IP, Novell’s IPX, Banyan’s VIP, DECnet, Microsoft’s protocols, Apple’s protocols, etc., made building and managing networks ever more complex. Again, while the interfaces were pretty abstractions, the underlying network was also reminiscent of a large bowl of spaghetti. There were even attempts to build IPX/VIP/IP packet translators so a host running Vines’ could communicate with devices on the then nascent global Internet.
Over time, the simplicity of IP, combined with the complexity and expense of these kinds of systems drove them from the scene. Some remnants live on in the directory structure contained in email and office software packages, but they are a shadow of Streettalk, NDS, and the Microsoft equivalent. The more direct descendants of these systems are single sign-on and OAUTH systems that allow you to use a single identity to log into multiple places.
But the primary function of finding things, rather than authenticating them, has long been left behind. Today, if you want to know someone’s email address, you look them up on your favorite social medial network. Or you don’t bother with email at all.
Before I continue, I want to remind you what the purpose of this little series of posts is. The point is not to convince you to never use BGP in the DC underlay ever again. There’s a lot of BGP deployed out there, and there are lot of tools that assume BGP in the underlay. I doubt any of that is going to change. The point is to make you stop and think!
Why are we deploying BGP in this way? Is this the right long-term solution? Should we, as a community, be rethinking our desire to use BGP for everything? Are we just “following the crowd” because … well … we think it’s what the “cool kids” are doing, or because “following the crowd” is what we always seem to do?
In my last post, I argued that BGP converges much more slowly than the other options available for the DC fabric underlay control plane. The pushback I received was two-fold. First, the overlay converges fast enough; the underlay convergence time does not really factor into overall convergence time. Second, there are ways to fix things.
If the first pushback is always true—the speed of the underlay control plane convergence does not matter—then why have an underlay control plane at all? Why not just use a single, merged, control plane for both underlay and overlay? Or … to be a little more shocking, if the speed at which the underlay control plane converges does not matter, why not just configure the entire underlay using … static routes?
The reason we use a dynamic underlay control plane is because we need this foundational connectivity for something. So long as we need this foundational connectivity for something, then that something is always going to be better if it is faster rather than slower.
The second pushback is more interesting. Essentially—because we work on virtual things rather than physical ones, just about anything can be adapted to serve any purpose. I can, for instance, replace BGP’s bestpath algorithm with Dijkstra’s SPF, and BGP’s packet format with a more straight-forward TLV format emulating a link-state protocol, and then say, “see, now BGP looks just like a link-state protocol … I made BGP work really well on a DC fabric.”
Yes, of course you can do these things. Somewhere along the way we became convinced that we are being really clever when we adapt a protocol to do something it wasn’t designed to do, but I’m not certain this is a good way of going about building reliable systems.
Okay, back to the point … the next reason we should rethink BGP on the DC fabric is because it is complex to configure when its being used as an IGP. In my last post, when discussing the configuration required to make BGP converge, I noted AS numbers and AS Path filters must be laid out in a very specific way, following where each device is located in the fabric. The MRAI must be taken down to some minimum on every device (either 0 or 1 second), and individual peers must be configured.
Further, if you are using a version of BGP that follows the IETF’s BCPs for the protocol, you must configure some sort of filter (generally a permit all) to get a BGP speaker to advertise anything to an eBGP peer. If you’re using iBGP, you need to configure route reflectors and tell BGP to advertise multiple paths.
There are two ways to solve this problem. First, you can automate all this configuration—of course! I am a huge fan of automation. It’s an important tool because it can make your network consistent and more secure.
But I’m also realistic enough to know that adding the complexity of an automation system on top of a too-complex system to make things simpler is probably not a really good idea. To give a visual example, consider the possibility of automatically wiping your mouth while eating soup.
Yes, automation can be taken too far. A good rule of thumb might be: automation works best on systems intentionally designed to be simple enough to automate. In this case, perhaps it would be simpler to just use a protocol more directly designed so solve the problem at hand, rather than trying to automate our way out of the problem.
Second, you can modify BGP to be a better fit for use as an IGP in various ways. This post has already run far too long, however, so … I’ll hold off on talking about this until the next post.
In the current quantum computing landscape, there are hardware and software providers with the most prominent players doing both. While the shakeout on which of these companies will become the prominent won’t even begin for a few years yet, there are some companies that have some significant advantages across the quantum spectrum.
NLNOG RING allows engineers to access other networks securely so they can collect troubleshooting data without having to wait for the other side. Ten years after its launch, Martin Pels takes a look back on how the project came to be and how it evolved over the past decade.
Even after 40 years of working to mitigate fileless attacks, the software industry is still struggling to eliminate them. By hijacking the control flow of a running application by exploiting a buffer-overflow vulnerability, fileless malware is responsible for numerous zero-day attacks.
A growing number of ransomware victims appear to be losing confidence that their attackers will delete any data they might have stolen during the attack for additional leverage, even after being paid the demanded ransom.
The security team suddenly hears buzzing overhead at a seemingly secure government site, or critical facility such as a chemical plant. It’s a small, unmanned aerial system (sUAS) – also known as a drone – that’s entered the airspace, presenting an immediate, yet unpredictable threat to the sensitive site.
A significant positive the DNS community could take from 2020 is the receptiveness and responsiveness of the Chromium team to address the large amount of DNS queries being sent to the root server system.
Security researchers at Google have claimed that a quarter of all zero-day software exploits could have been avoided if more effort had been made by vendors when creating patches for vulnerabilities in their software.
Large numbers of US Internet subscribers are using over 1TB a month for the first time as the pandemic continues to boost home-Internet usage, according to research released today by the vendor OpenVault.
In this blog, I discuss how undergraduate science and engineering students perceive their learning in the Corona semester in general and, specifically, what skills they practiced during this year. As it turns out, online learning, which was forced on higher education institutions at the beginning of 2020, has some advantages for science and engineering students in terms of learning habits.
It is the Tuesday morning after a long weekend. You come into work early to get caught up on emails only to find you are completely locked out. You have been hit by a ransomware attack. You ask yourself, “What happened? And how do I fix it?”
Many organizations are migrating their workloads to the cloud. But there are challenges along the way. Specifically, security leaders are concerned about their ability to protect their cloud-based data using secure configurations.
Someone recently asked me to suggest a list of books on thinking skills; I figured others might be interested in the list, as well, so … I decided to post it here. Further, I’ve added a few books to my “recommended book list” here on rule11; I thought I’d point those out, as well. My first suggestion, of course, is that if you want to improve your thinking skills, read. I don’t just mean technical stuff, I mean all over the place, in the form of books, and a lot.
So, forthwith, some more things to read.
- Algorithms in a Nutshell
- The Inquiring Mind
- What Tech Calls Thinking
- Unintended Features
- The Elements of Reasoning
- Deep Work
- Being Logical
Recently Added Books
- From Counterculture to Cyberculture
- Escape from Reason
- The Rise and Triumph of the Modern Self
- Death in the City
- Rational Cybersecurity
- The Age of Access
- Curing Mad Truths
- Called to Freedom