From time to time, someone publishes a new blog post lauding the wonderfulness of BGPsec, such as this one over at the Internet Society. In return, I sometimes feel like I am a broken record discussing the problems with the basic idea of BGPsec—while it can solve some problems, it creates a lot of new ones. Overall, BGPsec, as defined by the IETF Secure Interdomain (SIDR) working group is a “bad idea,” a classic study in the power of unintended consequences, and the fond hope that more processing power can solve everything. To begin, a quick review of the operation of BGPsec might be in order. Essentially, each AS in the AS Path signs the “BGP update” as it passes through the internetwork, as shown below.
In this diagram, assume AS65000 is originating some route at A, and advertising it to AS65001 and AS65002 at B and C. At B, the route is advertised with a cryptographic signature “covering” the first two hops in the AS Path, AS65000 and AS65001. At C, the route is advertised with a cryptogrphic signature “covering” the first two hops in the AS Path, AS65000 and AS65002. When F advertises this route to H, at the AS65001 to AS65003 border, it again signs the AS Path, including the AS F is advertising the route to, so the signed path includes AS65000, AS65001, and AS65003.
To validate the route, H can use AS65000’s public key to verify the signature over the first two hops in the AS Path. This shows that AS65000 not only did advertise the route to AS65001, but also that it intended to advertise this route to AS65001. In this way, according to the folks working on BGPsec, the intention of AS65000 is laid bare, and the “path of the update” is cryptographically verified through the network.
Except, of course, there is no such thing as an “update” in BGP that is carried from A to H. Instead, at each router along the way, the information stored in the update is broken up and stored in different memory structures, and then rebuilt to be transmitted to specific peers as needed. BGPsec, then, begins with a misunderstanding of how BGP actually works; it attempts to validate the path of an update through an internetwork—and this turns out to be the one piece of information that doesn’t matter all that much in security terms.
But set this problem aside for a moment, and consider how this actually works. First, B, before the signatures, could have sent a single update to multiple peers. After the signatures, each peer must receive its own update. One of the primary ways BGP uses to increase performance is in gathering updates up and sending one update whenever possible using either a peer group or an update group. Worse yet, every reachable destination—NLRI—now must be carried in its own update. So this means no packing, and no peer groups. The signatures themselves must be added to the update packets, as well, which means they must be stored, carried across the wire, etc. T
he general assumption in the BGPsec community is the resulting performance problems can be resolved by just upping the processor and bandwidth. In practice, this means replacing every eBGP speaker in the internetwork—perhaps hundreds of thousands of them in the ‘net—to support this functionality. “At what cost,” and “for what tradeoffs,” are questions that are almost never asked.
But let’s lay aside this problem for a moment, and just assumed every eBGP speaking router in the entire ‘net could be replaced tomorrow, at no cost to anyone. Okay, all the BGP AS Path problems are now solved right? Not so fast…
Assume, for a moment, that AS65000 and AS65001 break their peering relationship for some reason. At the moment the B to D peering relationship is shut down, D still has a copy of the signed updates it has been using. How long can AS65001 continue advertising connectivity to this route? Given the signatures are in band, carried in the BGP update as constructed at B, and transmitted to D. So long as AS65001 has a copy of a single update, it can appear to remain connected to AS65000, even though the connection has been shut down. The answer, then, is that AS65000 must somehow invalidate the updates it previously sent to AS65001. There are three ways to do this.
First, AS65000 could roll its public and private key pair. This might work, so long as peering and depeering events are relatively rare, but they seem to be all to common in the real world. Further, until the the new public and private key pairs are distributed, and until new routes can be sent through the internetwork using these new keys, the old keys must remain in place to prevent a routing disruption. How long is this? Even if it is 24 hours, probably a reasonable number, AS65001 has the means to grab traffic that is destined to AS65000 and do what it likes with that traffic. Are you comfortable with this?
Second, the community could build a certificate revocation list. This is a mess, so there’s no point in going there.
Third, you could put a timer in the BGP update, like a Link State Update. Once the timer runs down or our, the advertisement must be replaced. Given there are 800k routes in the default free zone, a timer of 24 hours (which would still make me uncomfortable in security terms), there would need to be 800k/24 hours updates per hour added to the load of every router in the Internet. On top of the reduced performance noted above.
Again, it is useful to set this problem aside, and assume it can be solved with the wave of a magic wand someplace. Perhaps someone comes up with a way to add a timer without causing any additional load, or a new form of revocation list is created that has none of the problems of any sort known about today. Given these, all the BGP AS Path problems in the Internet are solved, right?
Again, no. Consider, for a moment, the position of AS65001 and AS65002. These are transit providers, companies that rely on their customer’s trust, and their ability to out compete in the area of peering, to make money. First, signing updates means that you are declaring, to the entire world, in a legally provable way, who your customers are. This, from what I understand of the provider business model, is not only a no-no, but a huge legal issue. But this is actually, still, the simpler problem to solve.
Second, you cannot deploy this kind of system with a single, centrally stored private key. Assume, for a moment, that you do solve the problem this way. What happens if a single eBGP speaker is compromised? What if you need to replace a single eBGP speaker? You must roll your AS level private key. And replace all your advertisements in the entire Internet. This, from a security standpoint, is a bad idea.
Okay—the reasonable alternative is to create a private key per eBGP speaker. This private key would have its own public key, which would, in turn, be signed by the AS level private key. There are two problems with this scheme, however. The first is: when H validates the signature on some update it has received, it must now find not only the AS level private keys for AS65000 and AS65001, it must find the private key for B and F. This is going to be messy. The second is: By examining the private keys I receive in a collection of “every update on the Internet,” I can now map the actual peering points between every pair of autonomous systems in the world. All the secret sauce in peering relationships? Exposed. Which router (or set of routers) to attack to impact the business of a specific company? Exposed.
The bottom line is this: even given BGPsec’s flawed view of the way BGP works, even given BGPsec’s flawed view of what needs to be secured, even giving BGPsec implementations the benefit of doing the impossible (adding state and processing without impacting performance), even given some magical form of replay attack prevention that costs nothing, BGPsec still exposes information no-one really wants exposed. The tradeoffs are ultimately unacceptable.
Which all comes back to this: If you haven’t found the tradeoffs, you haven’t looked hard enough.