Securing BGP: A Case Study (7)

In the last post in this series on securing BGP, I considered a couple of additional questions around business problems that relate to BGP. This time, I want to consider the problem of convergence speed in light of any sort of BGP security system. The next post (to provide something of a road map) should pull the requirements together into a single place, so we can begin working through the available solutions. Ultimately, as this is a case study, we’re after a set of tradeoffs for each solution, rather than a final decision about which solution to use.

The question we need to consider here is: should the information used to validate BGP be somewhat centralized, or fully distributed? The CAP theorem tells us there is a range of choices here, with the two extreme cases being:

  • A single copy of the database used to provide validation information, which is always consistent
  • Multiple disconnected copies of the database used to provide validation information, which are only intermittently consistent

Between these two extremes lies a range of choices (reducing all the possibilities to just these two extremes is, in fact, a misuse of the CAP theorem). To understand this as a range of tradeoffs, take a look at the chart below:


The further we go to the right along this chart:

    • the more copies of the database there are in existence; more copies means more devices that must receive and update a local copy, which means slower convergence.
    • the slower the connectivity between the copies of the database.
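As a rough illustration of the first point, here is a toy model (my own, not from the chart) of how replica count and inter-replica link delay stretch convergence time, assuming updates propagate along a simple chain of replicas:

```python
# Toy model: time for a change made at one replica of the validation
# database to reach every other replica. The chain topology and the
# delay figures are illustrative assumptions, not measurements.

def convergence_time(copies: int, link_delay: float, process_delay: float = 0.01) -> float:
    """Worst-case propagation time (seconds) along a chain of replicas."""
    hops = copies - 1  # each additional copy adds a hop the update must cross
    return hops * (link_delay + process_delay)

# Two copies over a fast link converge almost immediately...
fast = convergence_time(copies=2, link_delay=0.005)
# ...while many copies over slow links converge orders of magnitude later.
slow = convergence_time(copies=50, link_delay=2.0)
```

The point of the sketch is simply that both axes of the chart (copy count and link speed) multiply together into convergence delay.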

In complexity model terms, both of these relate to the interaction surface: a slower, larger interaction surface trades against the amount of state that can be successfully (and quickly) managed in a control plane, so the tradeoffs we see in the CAP theorem are directly mirrored in the complexity model. Given this, what do we need out of a system used to provide validation for BGP? Let’s set up a specific situation that might help answer this question.

Assume, for a moment, that your network is under some sort of distributed denial of service (DDoS) attack. You call up some sort of DDoS mitigation provider, and they say something like “just transfer your origin validation to us, so we can advertise the route without it being black holed; we’ll scrub the traffic and transfer the clean flows back to you through a tunnel.” Now ask this: how long are you willing to wait before the DDoS protection takes place? Two or three days? A day? Hours? Minutes? If you can locate that amount of time along the chart above, then you can get a sense of the problem we’re trying to solve.

To put this in different terms: any system that provides BGP validation information must converge at roughly the speed of BGP itself.
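To make the scenario concrete, here is a minimal sketch, with invented prefixes and AS numbers, of the origin-validation hand-off described above; the mitigation provider’s announcement only validates once the database change has propagated, which is exactly why convergence speed matters:

```python
# ROA-like table: prefix -> set of origin ASes authorized to announce it.
# All prefixes and AS numbers are illustrative assumptions.
roas = {"192.0.2.0/24": {65000}}  # the victim's own origin authorization

def origin_valid(prefix: str, origin_as: int) -> bool:
    """Check whether origin_as is authorized to originate prefix."""
    return origin_as in roas.get(prefix, set())

before = origin_valid("192.0.2.0/24", 64500)  # scrubbing provider not yet authorized
roas["192.0.2.0/24"].add(64500)               # the "transfer your origin validation" step
after = origin_valid("192.0.2.0/24", 64500)   # the provider's route now validates
```

Until every router consulting this table sees the update, the scrubber’s announcement looks like a hijack; the gap between the two lookups is the convergence window we are worried about.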

So why not just put the information securing BGP into BGP itself, so that routing converges at the same speed as the validation information? The tradeoff is that every edge device in my network must then handle the cryptographic processing needed to verify the validation information. There are some definite tradeoffs to consider here, but we’ll leave those to the evaluation of the proposed solutions.
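To give a feel for that per-update work, here is a sketch in which an HMAC stands in for the (much costlier) asymmetric signature verification a real deployment would require; the key, message format, and function names are purely illustrative:

```python
# Hypothetical sketch of the per-update check an eBGP edge speaker would
# take on if validation data rode inside BGP itself. HMAC is a stand-in
# for real asymmetric signature verification; the key is illustrative.
import hashlib
import hmac

SHARED_KEY = b"illustrative-key"  # a real system would use per-origin public keys

def sign_update(prefix: str, origin_as: int) -> bytes:
    """Produce a tag over the update's prefix and origin AS."""
    msg = f"{prefix}|{origin_as}".encode()
    return hmac.new(SHARED_KEY, msg, hashlib.sha256).digest()

def verify_update(prefix: str, origin_as: int, tag: bytes) -> bool:
    """Recompute the tag and compare in constant time."""
    return hmac.compare_digest(sign_update(prefix, origin_as), tag)

tag = sign_update("192.0.2.0/24", 65000)
ok = verify_update("192.0.2.0/24", 65000, tag)      # unmodified update
forged = verify_update("192.0.2.0/24", 65001, tag)  # origin AS swapped
```

Even this cheap symmetric check runs once per received update; real signature verification is substantially more expensive, which is the edge-device cost being weighed.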

Before leaving this post and beginning on the process of wrapping up the requirements around securing BGP (to be summarized in the next post), one more point needs to be considered. I’ll just state the point here, because the reason for this requirement should be pretty obvious.

Injecting validation information into the routing system should expose no more information about the peering structure of my AS than can be inferred through data mining of publicly available information. For instance, today I can tell that AS65000 is connected to AS65001. I can probably infer something about their business relationship, as well. What I cannot tell, today, is how many actual eBGP speakers connect the two autonomous systems, nor can I infer anything about the location of those connection points. Revealing this information could lead to some serious security and policy problems for a network operator.
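The kind of inference that is possible today can be shown in a few lines over invented AS_PATH data: adjacency between autonomous systems is recoverable from public routing tables, but the number and location of the eBGP sessions connecting them simply are not present in the data:

```python
# Infer AS adjacencies from publicly visible AS_PATH attributes.
# The paths below are invented for the example.
paths = [
    [65000, 65001, 65010],
    [65000, 65001, 65020],
    [65002, 65001, 65010],
]

adjacencies = set()
for path in paths:
    # Each consecutive pair in an AS_PATH implies a peering relationship.
    for left, right in zip(path, path[1:]):
        adjacencies.add(frozenset((left, right)))

# We can tell AS65000 peers with AS65001...
connected = frozenset((65000, 65001)) in adjacencies
# ...but nothing here reveals how many sessions connect them, or where.
```

A validation system should leak no more than this: the adjacency graph is already public, while session counts and locations are not, and should stay that way.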