BGPsec and Reality

From time to time, someone publishes a new blog post lauding the wonderfulness of BGPsec, such as this one over at the Internet Society. In return, I sometimes feel like I am a broken record discussing the problems with the basic idea of BGPsec—while it can solve some problems, it creates a lot of new ones. Overall, BGPsec, as defined by the IETF Secure Interdomain (SIDR) working group is a “bad idea,” a classic study in the power of unintended consequences, and the fond hope that more processing power can solve everything. To begin, a quick review of the operation of BGPsec might be in order. Essentially, each AS in the AS Path signs the “BGP update” as it passes through the internetwork, as shown below.

In this diagram, assume AS65000 is originating some route at A, and advertising it to AS65001 and AS65002 at B and C. At B, the route is advertised with a cryptographic signature “covering” the first two hops in the AS Path, AS65000 and AS65001. At C, the route is advertised with a cryptogrphic signature “covering” the first two hops in the AS Path, AS65000 and AS65002. When F advertises this route to H, at the AS65001 to AS65003 border, it again signs the AS Path, including the AS F is advertising the route to, so the signed path includes AS65000, AS65001, and AS65003.

To validate the route, H can use AS65000’s public key to verify the signature over the first two hops in the AS Path. This shows that AS65000 not only did advertise the route to AS65001, but also that it intended to advertise this route to AS65001. In this way, according to the folks working on BGPsec, the intention of AS65000 is laid bare, and the “path of the update” is cryptographically verified through the network.

Except, of course, there is no such thing as an “update” in BGP that is carried from A to H. Instead, at each router along the way, the information stored in the update is broken up and stored in different memory structures, and then rebuilt to be transmitted to specific peers as needed. BGPsec, then, begins with a misunderstanding of how BGP actually works; it attempts to validate the path of an update through an internetwork—and this turns out to be the one piece of information that doesn’t matter all that much in security terms.

But set this problem aside for a moment, and consider how this actually works. First, B, before the signatures, could have sent a single update to multiple peers. After the signatures, each peer must receive its own update. One of the primary ways BGP uses to increase performance is in gathering updates up and sending one update whenever possible using either a peer group or an update group. Worse yet, every reachable destination—NLRI—now must be carried in its own update. So this means no packing, and no peer groups. The signatures themselves must be added to the update packets, as well, which means they must be stored, carried across the wire, etc.

The general assumption in the BGPsec community is the resulting performance problems can be resolved by just upping the processor and bandwidth. That BGPsec has been around for 20 years, and the performance problem still hasn’t been solved is not something anyone seems to consider. 🙂 In practice, this also means replacing every eBGP speaker in the internetwork—perhaps hundreds of thousands of them in the ‘net—to support this functionality. “At what cost,” and “for what tradeoffs,” are questions that are almost never asked.

But let’s lay aside this problem for a moment, and just assumed every eBGP speaking router in the entire ‘net could be replaced tomorrow, at no cost to anyone. Okay, all the BGP AS Path problems are now solved right? Not so fast…

Assume, for a moment, that AS65000 and AS65001 break their peering relationship for some reason. At the moment the B to D peering relationship is shut down, D still has a copy of the signed updates it has been using. How long can AS65001 continue advertising connectivity to this route? The signatures are in band, carried in the BGP update as constructed at B, and transmitted to D. So long as AS65001 has a copy of a single update, it can appear to remain connected to AS65000, even though the connection has been shut down. The answer, then, is that AS65000 must somehow invalidate the updates it previously sent to AS65001. There are three ways to do this.

First, AS65000 could roll its public and private key pair. This might work, so long as peering and depeering events are relatively rare, and the risk from such depeering situations is small. But are they? Further, until the new public and private key pairs are distributed, and until new routes can be sent through the internetwork using these new keys, the old keys must remain in place to prevent a routing disruption. How long is this? Even if it is 24 hours, probably a reasonable number, AS65001 has the means to grab traffic that is destined to AS65000 and do what it likes with that traffic. Are you comfortable with this?

Second, the community could build a certificate revocation list. This is a mess, so there’s no point in going there.

Third, you could put a timer in the BGP update, like a Link State Update. Once the timer runs down or our, the advertisement must be replaced. Given there are 800k routes in the default free zone, a timer of 24 hours (which would still make me uncomfortable in security terms), there would need to be 800k/24 hours updates per hour added to the load of every router in the Internet. On top of the reduced performance noted above.

Again, it is useful to set this problem aside, and assume it can be solved with the wave of a magic wand someplace. Perhaps someone comes up with a way to add a timer without causing any additional load, or a new form of revocation list is created that has none of the problems of any sort known about today. Given these, all the BGP AS Path problems in the Internet are solved, right?

Consider, for a moment, the position of AS65001 and AS65002. These are transit providers, companies that rely on their customer’s trust, and their ability to out compete in the area of peering, to make money. First, signing updates means that you are declaring, to the entire world, in a legally provable way, who your customers are. This, from what I understand of the provider business model, is not only a no-no, but a huge legal issue. But this is actually, still, the simpler problem to solve.

Second, you cannot deploy this kind of system with a single, centrally stored private key. Assume, for a moment, that you do solve the problem this way. What happens if a single eBGP speaker is compromised? What if you need to replace a single eBGP speaker? You must roll your AS level private key. And replace all your advertisements in the entire Internet. This, from a security standpoint, is a bad idea.

Okay—the reasonable alternative is to create a private key per eBGP speaker. This private key would have its own public key, which would, in turn, be signed by the AS level private key. There are two problems with this scheme, however. The first is: when H validates the signature on some update it has received, it must now find not only the AS level public keys for AS65000 and AS65001, it must find the public key for B and F. This is going to be messy. The second is: By examining the publickeys I receive in a collection of “every update on the Internet,” I can now map the actual peering points between every pair of autonomous systems in the world. All the secret sauce in peering relationships? Exposed. Which router (or set of routers) to attack to impact the business of a specific company? Exposed.

The bottom line is this: even setting aside BGPsec’s flawed view of the way BGP works, even setting aside BGPsec’s flawed view of what needs to be secured, even setting aside BGPsec implementations the benefit of doing the impossible (adding state and processing without impacting performance), even given some magical form of replay attack prevention that costs nothing, BGPsec still exposes information no-one really wants exposed. The tradeoffs are ultimately unacceptable.

Which all comes back to this: If you haven’t found the tradeoffs, you haven’t looked hard enough.

Securing BGP: A Case Study

What would it take to secure BGP? Let’s begin where any engineering problem should begin: what problem are we trying to solve? This series of posts walks through a wide range of technical and business problems to create a solid set of requirements against which to measure proposed solutions for securing BGP in the global Internet, and then works through several proposed solutions to see how they stack up.

Post 1: An introduction to the problem space
Post 2: What can I prove in a routing system?
Post 3: What can I prove in a routing system?
Post 4: Centralized or decentralized?
Post 5: Centralized or decentralized?
Post 6: Business issues with centralization
Post 7: Technical issues with centralization
Post 8: A full requirements list
Post 9: BGPSEC (S-BGP) compared to the requirements
Post 10: RPKI compared to the requirements

I will continue updating this post as I work through the remaining segments of this series.

Securing BGP: A Case Study (10)

The next proposed (and actually already partially operational) system on our list is the Router Public Key Infrastructure (RPKI) system, which is described in RFC7115 (and a host of additional drafts and RFCs). The RPKI systems is focused on solving a single solution: validating that the originating AS is authorized to originate a particular prefix. An example will be helpful; we’ll use the network below.


(this is a graphic pulled from a presentation, rather than one of my usual line drawings)

Assume, for a moment, that AS65002 and AS65003 both advertise the same route, 2001:db8:0:1::/64, towards AS65000. How can the receiver determine if both of these two advertisers can actually reach the destination, or only one can? And, if only one can, how can AS65000 determine which one is the “real thing?” This is where the RPKI system comes into play. A very simplified version of the process looks something like this (assuming AS650002 is the true owner of 2001:db8:0:1::/64):

  • AS65002 obtains, from the Regional Internet Registry (labeled the RIR in the diagram), a certificate showing AS65002 has been issued 2001:db8:0:1::/64.
  • AS65002 places this certificate into a local database that is synchronized with all the other operators participating in the routing system.
  • When AS65000 receives a route towards 2001:db8:0:1::/64, it checks this database to make certain the origin AS on the advertisement matches the owning AS.

If the owner and the origin AS match, AS65000 can increase the route’s preference. If it doesn’t AS65000 can reduce the route’s preference. It might be that AS65000 discards the route if the origin doesn’t match—or it may not. For instance, AS65003 may know, from historical data, or through a strong and long standing business relationship, or from some other means, that 2001:db8:0:1::/64 actually belongs to AS65004, even through the RPKI data claims it belongs to AS65002. Resolving such problems falls to the receiving operator—the RPKI simply provides more information on which to act, rather than dictating a particular action to take.

Let’s compare this to our requirements to see how this proposal stacks up, and where there might be objections or problems.

Centralized versus Decentralized: The distribution of the origin authentication information is currently undertaken with rsync, which means the certificate system is decentralized from a technical perspective.

However—there have been technical issues with the rsync solution in the past, such that it can take up to 24 hours to change the distributed database. This is a pretty extreme case of eventual consistency, and it’s a major problem in the global default free zone. BGP might converge very slowly, but it still converges more quickly than 24 hours.

Beyond the technical problems, there is a business side to the centralized/decentralized issue as well. Specifically, many business don’t want their operations impacted by contract issues, negotiation issues, and the like. Many large providers see the RPKI system as creating just such problems, as the “trust anchor” is located in the RIRs. There are ways to mitigate this—just use some other root, or even self sign your certificates—but the RPKI system faces an uphill battle in this are from large transit providers.

Cost: The actual cost of setting up and running a server doesn’t appear to be very high within the RPKI system. The only things you need to “get into the game” are a couple of VMs or physical servers to run rsync, and some way to inject the information gleaned from the RPKI system into the routing decisions along the network edge (which could even be just plugging the information into existing policy mechanisms).

The business issue described above can also be counted as a cost—how much would it cost a provider if their origin authentication were taken out of the database for a day or two, or even a week or two, while a contract dispute with the RIR was worked out?

Information Cost: There is virtually no additional information cost involved in deploying the RPKI.

Other thoughts: The RPKI system wasn’t designed to, and doesn’t, validate anything other than the origin in the AS Path. It doesn’t, therefore, allow an operator to detect AS65003, for instance, claiming to be connected to AS65002 even though it’s not (or it’s not supposed to transit traffic to AS65002). This isn’t really a “lack” on the part of the RPKI, it’s just not something it’s designed to do.

Overall, the RPKI is useful, and will probably be deployed by a number of providers, and shunned by others. It would be a good component of some larger system (again, this was the original intent, so this isn’t a lack), but it cannot stand alone as a complete BGP security system.

Securing BGP: A Case Study (9)

There are a number of systems that have been proposed to validate (or secure) the path in BGP. To finish off this series on BGP as a case study, I only want to look at three of them. At some point in the future, I will probably write a couple of posts on what actually seems to be making it to some sort of deployment stage, but for now I just want to compare various proposals against the requirements outlined in the last post on this topic (you can find that post here).

The first of these systems is BGPSEC—or as it was known before it was called BGPSEC, S-BGP. I’m not going to spend a lot of time explaining how S-BGP works, as I’ve written a series of posts over at Packet Pushers on this very topic:

Part 1: Basic Operation
Part 2: Protections Offered
Part 3: Replays, Timers, and Performance
Part 4: Signatures and Performance
Part 5: Leaks

Considering S-BGP against the requirements:

  • Centralized versus decentralized balance: S-BGP distributes path validation information throughout the internetwork, as this information is actually contained in a new attribute carried with route advertisements. Authorization and authentication are implicitly centralized, however, with the root certificates being held by address allocation authorities. It’s hard to say if this is the correct balance.
  • Cost: In terms of financial costs, S-BGP (or BGPSEC) requires every eBGP speaker to perform complex cryptographic operations in line with receiving updates and calculating the best path to each destination. This effectively means replacing every edge router in every AS in the entire world to deploy the solution—this is definitely not cost friendly. Adding to this cost is the simply increase in the table size required to carry all this information, and the loss of commonly used (and generally effective) optimizations.
  • Information cost: S-BGP leaks new information into the global table as a matter of course—not only can anyone see who is peered with whom by examining information gleaned from route view servers, they can even figure out how many actual pairs of routers connect each AS, and (potentially) what other peerings those same routers serve. This huge new chunk of information about provider topology being revealed simply isn’t acceptable.

Overall, then, BGP-SEC doesn’t meet the requirements as they’ve been outlined in this series of posts. Next week, I’ll spend some time explaining the operation of another potential system, a graph overlay, and then we’ll consider how well it meets the requirements as outlined in these posts.