The Effectiveness of AS Path Prepending (2)

Last week I began discussing why AS Path Prepend doesn’t always affect traffic the way we think it will. Two other observations from the research paper I’m working off of are:

  • Adding two prepends will move more traffic than adding a single prepend
  • It’s not possible to move traffic incrementally by prepending; when it works, prepending will end up moving most of the traffic from one inbound path to another

A slightly more complex network will help explain these two observations.

Assume AS65000 would like to control the inbound path for 100::/64. I’ve added a link between AS65001 and 65002 here, but we will still find prepending a single AS to the path won’t make much difference in the path used to reach 100::/64. Why?

Because most providers will have a local policy configured—using local preference—that causes them to choose any available customer connection over other paths. AS65001, on receiving the route to 100::/64 from AS65000, will set the local preference so it will prefer this route over any other route, including the one learned from AS65002. So while the cause is a little different in this case than the situation covered in the first post, the result is the same.
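A minimal sketch of why local preference masks the prepend at AS65001. The preference values (200 for customer routes, 100 for everything else), the `Route` class, and the helper names are assumptions made for illustration. The comparison is also reduced to two steps, local preference and then AS Path length, where a real BGP implementation has many more.

```python
from dataclasses import dataclass

# Assumed values; real providers choose their own local preference numbers.
CUSTOMER_PREF = 200
PEER_PREF = 100


@dataclass(frozen=True)
class Route:
    learned_from: str
    as_path: tuple
    local_pref: int


def best_path(routes):
    """Higher local preference wins; AS Path length only breaks ties."""
    return min(routes, key=lambda r: (-r.local_pref, len(r.as_path)))


def routes_at_65001(prepends, customer_pref):
    # Route learned directly from AS65000, carrying any prepends.
    direct = Route("AS65000", (65000,) * (1 + prepends), customer_pref)
    # Route learned over the new link from AS65002 (no prepends on that side).
    via_65002 = Route("AS65002", (65002, 65000), PEER_PREF)
    return [direct, via_65002]


for prepends in range(4):
    with_policy = best_path(routes_at_65001(prepends, CUSTOMER_PREF))
    without_policy = best_path(routes_at_65001(prepends, PEER_PREF))
    print(prepends, with_policy.learned_from, without_policy.learned_from)
```

With the customer preference in place, the route learned directly from AS65000 is selected for every prepend count in the loop. Only when both routes carry the same local preference does the longer AS Path lose to the route through AS65002. An exact tie in this sketch falls to the first route in the list; a real implementation would move on to further tie-breakers.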

We can, of course, prepend twice onto the AS Path rather than once. What impact would that have here? It still won’t impact the traffic originating in AS65005 because AS65001 is the only path available towards 100::/64 from their perspective. Prepending cannot change anything if there’s only one path.

However, if most of the traffic destined to 100::/64 is coming from AS65006, 7, and 8 rather than from AS65005, prepending two times will allow AS65000 to shift the traffic from the path through AS65002 to the path through AS65001. This example might seem a little contrived. Still, it’s pretty similar to networks that have one connection to some local provider (a cable company or something similar) and one connection to a more prominent national or international provider. Any time you are connected to two different providers who have different ranges of connectivity, prepending twice onto the AS Path will probably be able to shift traffic from one inbound link to another.

What about prepending more than two hops to the AS Path? Each additional prepend is going to shift smaller amounts of traffic. It makes sense that increasing the number of prepends doesn’t shift much more because the further away you get from the edge of the Internet, the more fully connected the autonomous systems are, and the more likely you are to run into some other policy that will override the AS Path in determining the best path. The average length of the AS Path in the Internet is around four; prepending more than this normally won’t have much of an effect on traffic flow.

The second question above can also be answered by looking at this network. Why can’t you shift traffic incrementally by prepending onto the AS Path? Because the connectivity close to the edge is probably not meshy enough. You can’t shift over just the traffic from one AS or another; you can only shift traffic from the entire set of autonomous systems behind your upstream from one inbound link to another. You can adjust traffic on a per-prefix basis, however, which can be useful for balancing between two inbound links.

What can you do to control inbound traffic with more certainty? Take a look at this older post for thoughts on using communities and de-aggregation to steer traffic.

The Effectiveness of AS Path Prepending (1)

Just about everyone prepends AS’ to shift inbound traffic from one provider to another—but does this really work? First, a short review on prepending, and then a look at some recent research in this area.

What is prepending meant to do?

Looking at this network diagram, the idea is for AS65000 (each router is in its own AS) to steer traffic through AS65001, rather than AS65002, for 100::/64. The most common way to try to accomplish this is for AS65000 to prepend its own AS number onto the AS Path multiple times. Increasing the length of the AS Path will, in theory, cause a route to be less preferred.

In this case, suppose AS65000 prepends its own AS number on the AS Path once before advertising the route towards AS65001, and not towards AS65002. Assuming there is no link between AS65001 and AS65002, what would we expect to happen? What we would expect is AS65001 will receive one route towards 100::/64 with an AS Path of 2 and use this route. AS65002 will, likewise, receive one route towards 100::/64 with an AS Path of 1 and use this route.

AS65003, however, will receive two routes towards 100::/64, one with an AS Path of 3 through AS65001, and one with an AS Path of 2 through AS65002. All other things being equal (local preference, etc.), AS65003 will prefer the route with the shorter AS Path through AS65002, and select that path to reach 100::/64. AS65004 will only receive one path towards 100::/64, the one through AS65002, because AS65003 will only advertise its best path to AS65004.
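The path lengths in this example can be worked through in a few lines. This is a sketch only: it models an AS Path as a plain list, compares nothing but its length (everything else is assumed equal, as in the text), and the function names are made up for illustration.

```python
def prepend(as_path, asn, count):
    """Return a new AS Path with `asn` added `count` extra times at the front."""
    return [asn] * count + list(as_path)


def advertise(as_path, own_asn):
    """An AS adds its own number when passing a route to an eBGP neighbor."""
    return [own_asn] + list(as_path)


def shortest(paths):
    """Pick the route with the shortest AS Path (all else assumed equal)."""
    return min(paths, key=len)


origin = [65000]

# AS65000 prepends once towards AS65001 and not at all towards AS65002.
at_65001 = prepend(origin, 65000, 1)
at_65002 = prepend(origin, 65000, 0)

# What AS65003 hears from each of its two neighbors.
via_65001 = advertise(at_65001, 65001)
via_65002 = advertise(at_65002, 65002)

best_at_65003 = shortest([via_65001, via_65002])
print(len(at_65001), len(at_65002))    # 2 1
print(len(via_65001), len(via_65002))  # 3 2
print(best_at_65003)                   # [65002, 65000]
```

AS65001 and AS65002 each hold a single route, so there is nothing for them to compare; the first place the extra hop can matter is AS65003.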

The obvious question: how much good does this really do? The only impact on the best path is two hops away, at AS65003, and beyond. The route chosen by AS65001 and AS65002 will not be affected by the prepending.

A recent paper found—

We observe that the effectiveness of prepending can strongly depend on the location (for around 20% of cases, ASPP has moved no targets, while for another 20% , it moved almost all targets).

You might expect AS Path prepending to have a much more consistent effect on inbound traffic. Why doesn’t it?

What might not be obvious (the danger of simplified diagrams): if autonomous systems directly attached to AS65001 originate most of the traffic destined to 100::/64, no amount of prepending is going to make any difference in the inbound traffic flow. Assume AS65001 has a connection to some cloud service, AS65002 does not have a connection to the same cloud service, and 100::/64 is a local server that communicates with this cloud service on a regular basis. Since AS65001 is the only AS transiting traffic from the cloud service to the server located on the 100::/64 subnet, and AS65001 only has one route to 100::/64, you are not going to be able to shift traffic off that single path no matter how many times you prepend.

The first rule of prepending is location matters. You have to know where the traffic you want to shift is originating, and whether or not it can be shifted.

In my next post on this topic, I’ll continue exploring AS path prepending more in light of the results of the research paper above.

The Hedge 82: Jared Smith and Route Poisoning

Intentionally poisoning BGP routes in the Default-Free Zone (DFZ) would always be a bad thing, right? Actually, this is a fairly common method to steer traffic flows away from and through specific autonomous systems. How does this work, how common is it, and who does this? Jared Smith joins us on this episode of the Hedge to discuss the technique, and his research into how frequently it is used.


Ambiguity and complexity: once more into the breach

Recent research into the text of RFCs versus the security of the protocols described came to this conclusion—

While not conclusive, this suggests that there may be some correlation between the level of ambiguity in RFCs and subsequent implementation security flaws.

This should come as no surprise to network engineers—after all, complexity is the enemy of security. Beyond the novel ways the authors use to understand the shape of the world of RFCs (you should really read the paper; it’s really interesting), this desire to increase security by decreasing the ambiguity of specifications is fascinating. We often think that writing better specifications requires having better requirements, but down this path only lies despair.

Better requirements are the one thing a network engineer can never really hope for.

It’s not just that networks are often used as a sort of “complexity sink,” the place where every hard problem goes to be solved. It’s also the uncertainty of the environment in which the network must operate. What new application will be stuffed on top of the network this week? Will anyone tell the network folks about this new application, or just open a ticket when it doesn’t work right? What about all the changes developers are making to applications right now, and their impact on the network? There are link failures, software failures, hardware failures, and the mean time between mistakes. There is the pace of innovation (which I tend to think is a bit overblown—rule11, after all—we are often talking about new products rather than new ideas).

What the network is supposed to do—just provide IP transport between two devices—turns out to be hard. It’s hard because “just transporting packets” isn’t ever enough. These packets must be delivered consistently (jitter and drops) across an ever-changing landscape.

To this end—

[C]omplexity is most succinctly discussed in terms of functionality and its robustness. Specifically, we argue that complexity in highly organized systems arises primarily from design strategies intended to create robustness to uncertainty in their environments and component parts.

Uncertainty is the key word here. What can we do about all of this?

We can reduce uncertainty. There are three ways to reduce uncertainty. First, you can obfuscate it, which is harmful. Second, you can reduce the scope of the job at hand, throwing some of the uncertainty (and therefore complexity) over the cubicle wall. This can be useful in some situations, but remember that the less work you’re doing, the less value you add. Beware of self-commodifying.

Finally, you can manage the uncertainty. This generally means using modularization intelligently to partition off problems into smaller sets. It’s easier to solve a set of well-scoped problems with little uncertainty than to solve one big problem with unknowable uncertainty.

This might all sound great in theory, but how do we do this in real life? Where does the rubber hit the road? This is what Ethan and I tried to show in Problems and Solutions—how to understand the problems that need to be solved, and then how to solve each of those problems within a larger system. This is also what many parts of The Art of Network Architecture are about, and then again what Jeff and I wrote about in Navigating Network Complexity.

I know it often seems like it’s not worth learning the theory; it’s so much easier to focus on the day-to-day, the configuration of this device, or the shiny thing that vendor just created. It’s easier to assume that if I can just hide all the complexity behind intent or automation, I can get my weekends back.

The truth is that we’re paid to solve hard problems, and solving hard problems involves complexity. We can either try to cover that up, or we can learn to manage it.

Rethinking BGP on the DC Fabric (part 5)

BGP is widely used as an IGP in the underlay of modern DC fabrics. This series argues this is not the best long-term solution to the problem of routing in fabrics because BGP is not ideal for this use case. This post will consider the potential harm we are doing to the larger Internet by pressing BGP into a role it was not originally designed to fulfill—an underlay protocol or an IGP.

My last post described the kinds of configuration required to make BGP work on a DC fabric—it turns out that the configuration of each BGP speaker on the fabric is close to unique. It is possible to automate configuring each speaker—but it would be better if we could get closer to autonomic operation.

To move BGP closer to autonomic operation in a DC fabric, there are several things we can do. First, we can allow a BGP speaker to peer with any other BGP speaker it receives an open message from—this is often called promiscuous mode. While each router in the fabric will still need to be configured with the right autonomous system, at least we won’t need to configure the correct peers on each router (including the remote AS).

Note, however, that using this kind of promiscuous peering does come with a set of tradeoffs (if you’re reading this blog, you know there will be tradeoffs). BGP speakers running in promiscuous mode open a large attack surface on the control plane of the network. We can close this attack surface by configuring authentication on all BGP speakers … but we are now adding complexity to reduce complexity. We could also reduce the scope of the attack surface by never permitting BGP to peer beyond a single hop, and then filtering all BGP packets at the fabric edge. Again, just a bit more complexity to manage—but remember that the road to highly fragile and complex systems is always paved with individual steps that never, on their own, seem to add “too much complexity.”

The second thing we can do to move BGP closer to autonomic operation is to advertise routes to every connected peer without any policy configured. This does, again, introduce some tradeoffs, particularly in the realm of security, but let’s leave that aside for the moment.

Assume we can create a version of BGP that has these modifications—it always accepts any peer from any other AS, and it advertises all routes without any policy configured. Put these features behind a single knob which also includes setting the MRAI to 0 or 1, tightens up the dampening parameters, and adjusts a few other things to make BGP work better in a DC fabric.

As an experiment, let’s enable this DC fabric knob on a BGP speaker at the edge of a dual-homed “enterprise customer.” What will happen?

The enterprise network will automatically peer to any speaker that sends an open message—a huge security hole on the open Internet—and it will advertise every route it learns even though there is no policy configured. This second issue—advertising routes with no policy configured—can cause the enterprise network to become a transit between two much larger provider networks, crashing out some small corner of the Internet.
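A rough sketch of the second issue, under heavy simplification. The prefixes (taken from the IPv6 documentation range), the neighbor names, and the two export functions are all hypothetical, and a real BGP speaker tracks far more state than a `learned_from` label.

```python
def export_without_policy(routes, neighbor):
    """Advertise every known route, skipping only those learned from `neighbor`."""
    return [r["prefix"] for r in routes if r["learned_from"] != neighbor]


def export_own_routes_only(routes, neighbor):
    """Typical dual-homed policy: advertise only locally originated routes."""
    return [r["prefix"] for r in routes if r["learned_from"] == "local"]


# Routes known at the enterprise edge (all values are made up).
routes = [
    {"prefix": "2001:db8:a::/48", "learned_from": "local"},
    {"prefix": "2001:db8:1::/48", "learned_from": "provider_a"},
    {"prefix": "2001:db8:2::/48", "learned_from": "provider_b"},
]

leaked = export_without_policy(routes, "provider_b")
safe = export_own_routes_only(routes, "provider_b")

print(leaked)  # ['2001:db8:a::/48', '2001:db8:1::/48']
print(safe)    # ['2001:db8:a::/48']
```

With no policy, provider B hears provider A's prefix from the enterprise, which offers the enterprise as a transit path between the two providers. With the export filter, only the enterprise's own prefix goes out.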

This might seem like a trivial issue. After all, just don’t ever enable the DC fabric knob on an eBGP peering session upstream into the DFZ, or any other “real” internetwork. Sure, and just don’t ever hit the brakes when you mean to hit the accelerator, or the accelerator when you mean to hit the brakes. If I had a dime for every time we “just don’t ever make that mistake …” Well, I wouldn’t be blogging, I’d be relaxing in the sun someplace (okay, I’m not likely to ever stop working to sit around and “relax” all the time, but you get the picture anyway).

Maybe—just maybe—it would really be better overall to use two different protocols for IGP and EGP work. Maybe—just maybe—it’s better not to mix these two different kinds of functions in a single protocol. Not only is the single resulting protocol bound to be really complex (most BGP implementations are now over 100,000 lines of code, after all), but it will end up being really easy to make really bad mistakes.

No tool is omnicompetent. If you found a tool that was, in fact, omnicompetent, it would also be the most dangerous tool in your toolbox.

Technologies that Didn’t: Directory Services

One of the most important features of the Network Operating Systems, like Banyan Vines and Novell Netware, available in the middle of the 1980’s was their integrated directory system. These directory systems allowed for the automatic discovery of many different kinds of devices attached to a network, such as printers, servers, and computers. Printers, of course, were the important item in this list, because printers have always been the bane of the network administrator’s existence. An example of one such system, an early version of Active Directory, is shown in the illustration below.

Users, devices and resources, such as file mounts, were stored in a tree. The root of the tree was (generally) the organization. There were Organizational Units (OUs) under this root. Users and devices could belong to an OU, and be given access to devices and services in other OUs through a fairly simple drag and drop, or GUI based checkbox style interface. These systems were highly developed, making it fairly easy to find any sort of resource, including email addresses of other users in the organization, services such as shared filers, and, yes, even printers.

The original system of this kind was Banyan’s StreetTalk, which did not have the depth or expressiveness of later systems, like the one shown above from Windows NT, or Novell’s Directory Services. A similar system existed in another network operating system called LANtastic, which was never really widely deployed (although I worked on a LANtastic system in the late 1980’s).

The usual “pitch” for deploying these systems was the ease of access control they brought into the organization from the administration side, along with the ease of finding resources from the user’s perspective. Suppose you were sitting at your desk, and needed to know who over in some other department, say accounting, you could contact about some sort of problem, or idea. If you had one of these directory services up and running, the solution was simple: open the directory, look for the accounting OU within the tree, and look for a familiar name. Once you have found them, you could send them an email, find their phone number, or even—if you had permission—print a document at a printer near their desk for them to pick up. Better than a FAX machine, right?

What if you had multiple organizations who needed to work together? Or you really wanted a standard way to build these kinds of directories, rather than being required to run one of the network operating systems that could support such a system? There were two industry wide standards designed to address these kinds of problems: LDAP and X.500.

The OUs, CNs, and other elements shown in the illustration above are actually an expression of the X.500 directory system. As X.500 was standardized starting in the mid-1990’s, these network operating systems changed their native directory systems to match the X.500 schema. The ultimate goal was to make these various directory services interoperate through X.500 connectors.

Given all this background, what happened to these systems? Why aren’t these kinds of directories widely available today? While there are many reasons, two of these stand out.

First, these systems are complex and heavy. Their complexity made them very hard to code and maintain; I can well remember working on a large Netware Directory Service deployment where objects fell into the wrong place on a regular basis, drive mapping did not work correctly, and objects had to be deleted and recreated to force their permissions to reset.

Large, complex systems tend to be unstable in unpredictable ways. One lesson the information technology world has not learned across the years is that abstraction is not enough; the underlying systems themselves must be simplified in a way that makes the abstraction more closely resemble the underlying reality. Abstraction can cover problems up as easily as it can solve problems.

Second, these systems fit better in a world of proprietary protocols and network operating systems than into a world of open protocols. The complexity driven into the network by trying to route IP, Novell’s IPX, Banyan’s VIP, DECnet, Microsoft’s protocols, Apple’s protocols, etc., made building and managing networks ever more complex. Again, while the interfaces were pretty abstractions, the underlying network was also reminiscent of a large bowl of spaghetti. There were even attempts to build IPX/VIP/IP packet translators so a host running Vines could communicate with devices on the then nascent global Internet.

Over time, the simplicity of IP, combined with the complexity and expense of these kinds of systems, drove them from the scene. Some remnants live on in the directory structure contained in email and office software packages, but they are a shadow of StreetTalk, NDS, and the Microsoft equivalent. The more direct descendants of these systems are single sign-on and OAuth systems that allow you to use a single identity to log into multiple places.

But the primary function of finding things, rather than authenticating them, has long been left behind. Today, if you want to know someone’s email address, you look them up on your favorite social media network. Or you don’t bother with email at all.

Rethinking BGP on the DC Fabric (part 4)

Before I continue, I want to remind you what the purpose of this little series of posts is. The point is not to convince you to never use BGP in the DC underlay ever again. There’s a lot of BGP deployed out there, and there are a lot of tools that assume BGP in the underlay. I doubt any of that is going to change. The point is to make you stop and think!

Why are we deploying BGP in this way? Is this the right long-term solution? Should we, as a community, be rethinking our desire to use BGP for everything? Are we just “following the crowd” because … well … we think it’s what the “cool kids” are doing, or because “following the crowd” is what we always seem to do?

In my last post, I argued that BGP converges much more slowly than the other options available for the DC fabric underlay control plane. The pushback I received was two-fold. First, the overlay converges fast enough; the underlay convergence time does not really factor into overall convergence time. Second, there are ways to fix things.

If the first pushback is always true—the speed of the underlay control plane convergence does not matter—then why have an underlay control plane at all? Why not just use a single, merged, control plane for both underlay and overlay? Or … to be a little more shocking, if the speed at which the underlay control plane converges does not matter, why not just configure the entire underlay using … static routes?

The reason we use a dynamic underlay control plane is because we need this foundational connectivity for something. So long as we need this foundational connectivity for something, then that something is always going to be better if it is faster rather than slower.

The second pushback is more interesting. Essentially—because we work on virtual things rather than physical ones, just about anything can be adapted to serve any purpose. I can, for instance, replace BGP’s bestpath algorithm with Dijkstra’s SPF, and BGP’s packet format with a more straight-forward TLV format emulating a link-state protocol, and then say, “see, now BGP looks just like a link-state protocol … I made BGP work really well on a DC fabric.”

Yes, of course you can do these things. Somewhere along the way we became convinced that we are being really clever when we adapt a protocol to do something it wasn’t designed to do, but I’m not certain this is a good way of going about building reliable systems. 

Okay, back to the point … the next reason we should rethink BGP on the DC fabric is because it is complex to configure when it’s being used as an IGP. In my last post, when discussing the configuration required to make BGP converge, I noted AS numbers and AS Path filters must be laid out in a very specific way, following where each device is located in the fabric. The MRAI must be taken down to some minimum on every device (either 0 or 1 second), and individual peers must be configured.

Further, if you are using a version of BGP that follows the IETF’s BCPs for the protocol, you must configure some sort of filter (generally a permit all) to get a BGP speaker to advertise anything to an eBGP peer. If you’re using iBGP, you need to configure route reflectors and tell BGP to advertise multiple paths.
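To give a feel for how these per-device settings pile up, here is a small sketch that derives them for a two-tier fabric. Everything in it is an assumption made for illustration: the device names, the numbering scheme (spines share one AS, each leaf gets its own), and the `fabric_settings` function itself. It leaves out AS Path filters, export policy, router IDs, and neighbor addresses, all of which add further per-device differences.

```python
def fabric_settings(spine_count, leaf_count, spine_asn=65100,
                    first_leaf_asn=65001, mrai_seconds=0):
    """Derive per-speaker BGP settings for a hypothetical two-tier fabric."""
    # Assumed numbering: every spine shares one AS, every leaf has its own.
    spines = {f"spine{i}": spine_asn for i in range(1, spine_count + 1)}
    leaves = {f"leaf{i}": first_leaf_asn + i - 1
              for i in range(1, leaf_count + 1)}

    settings = {}
    for name, asn in spines.items():
        # Each spine peers with every leaf, and must know each remote AS.
        settings[name] = {"asn": asn, "mrai_seconds": mrai_seconds,
                          "neighbors": sorted(leaves.items())}
    for name, asn in leaves.items():
        # Each leaf peers with every spine.
        settings[name] = {"asn": asn, "mrai_seconds": mrai_seconds,
                          "neighbors": sorted(spines.items())}
    return settings


settings = fabric_settings(spine_count=2, leaf_count=4)
distinct = {repr(s) for s in settings.values()}

print(len(settings), len(distinct))  # 6 5
print(settings["leaf1"])
```

Even in this stripped-down model, five of the six devices end up with different settings; only the two spines match, and they would diverge too once addresses and router IDs were included.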

There are two ways to solve this problem. First, you can automate all this configuration—of course! I am a huge fan of automation. It’s an important tool because it can make your network consistent and more secure.

But I’m also realistic enough to know that adding the complexity of an automation system on top of a too-complex system to make things simpler is probably not a really good idea. To give a visual example, consider the possibility of automatically wiping your mouth while eating soup.

Yes, automation can be taken too far. A good rule of thumb might be: automation works best on systems intentionally designed to be simple enough to automate. In this case, perhaps it would be simpler to just use a protocol more directly designed to solve the problem at hand, rather than trying to automate our way out of the problem.

Second, you can modify BGP to be a better fit for use as an IGP in various ways. This post has already run far too long, however, so … I’ll hold off on talking about this until the next post.