On Securing BGP

The US Federal Communications Commission recently asked for comments on securing Internet routing. While I worked on the responses offered by various organizations, I also put in my own response as an individual, which I’ve included below.

I am not providing this answer as a representative of any organization, but rather as an individual with long experience in the global standards and operations communities surrounding the Internet, and with long experience in routing and routing security.

I completely agree with the Notice of Inquiry that “networks are essential to the daily functioning of critical infrastructure [yet they] can be vulnerable to attack” due to insecurities in the BGP protocol. While proposed solutions exist that would increase the security of the BGP routing system, only some of these mechanisms are being widely deployed. This response will consider some of the reasons existing proposals are not deployed and suggest some avenues the Commission might explore to aid the community in developing and deploying solutions.

9: Measuring BGP Security.
At this point, I only know of the systems mentioned in the query for measuring BGP routing security incidents. There have been attempts to build other systems, but none of these systems have been successfully built or deployed. Three problems seem to affect these kinds of systems.

First, there is a general lack of funding for building and maintaining such systems. These kinds of systems require a fair amount of research and creative energy to design, including making the networking community aware of these kinds of tools.

Second, building such a system is difficult because of the nature of inter-provider policy. It is often difficult to tell if some change in the Default Free Zone (DFZ) routing is valid or is somehow related to an attack. False positives can have a very negative impact and are hard to detect and guard against.

Third, these kinds of systems generally focus on a single system—routing—while excluding hints and information that can be gained from other systems (particularly the DNS). This is, at least in part, because of the complexity of each individual system, and the difficulty in understanding how to correlate and understand information from overlapping systems.

10: Deployment of BGP Security Measures.
BGP security is divided into at least four different domains right now.

First is the exposure of policies and information through registries and similar mechanisms (such as peeringdb and whois). These mechanisms are generally useful only at the initial stages of peering, and hence are not very helpful in resolving hijacks, mistakes, etc., in near-real-time within the DFZ.

Second is the set of best common practices, such as BCP38, and represented by the MANRS effort. These will be more fully discussed in answer to question 13.

Third is origin validation, currently represented by the RPKI, which will be considered more fully in answering question 11.

Fourth is a more complete security system, currently represented by BGPSEC, which will be considered more fully in answering question 12.

11: The Commission seeks comment on the extent to which RPKI, as implemented by other regional internet registries, effectively prevents BGP hijacking.
The RPKI can effectively block some hijacking events—so long as most providers implement and “pay attention” to the validation process. There are, however, problems with the RPKI system, including—

  • There is no “quality control” over the contents of the RPKI. Other systems, such as the Internet Routing Registries (IRRs), that store policy and origination information have, over time, deteriorated in terms of the quality of information housed there. There is very little research into the quality of information stored in the RPKI, nor do we have any sense about how the quality of this information will stand up over time.
  • There are some concerns about the centralization of control over resources the RPKI represents. For instance, if a content or transit provider becomes entangled in a contract dispute over some resource with a registry, the registry can use the RPKI system to remove the provider from the Internet, essentially putting the provider out of business. Governments can, in theory, also cause registries to remove a provider’s authorization to use Internet resources. These are areas that may need to be researched and addressed to gain the trust of a larger part of the community.
  • The RPKI system does not expose any information about a route other than the originator. This leaves the possibility of hijacking a route by an Autonomous System (AS) advertising a route even though they cannot reach the destination by simply claiming to be connected to the originating AS.
  • The RPKI system does little to prevent an AS that should not be transiting traffic—end customers such as content providers and “enterprises”—from advertising routes in a way that pulls them into a transit role.
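The origin-only nature of the check described in the third bullet can be illustrated with a minimal sketch of route origin validation. This is a simplification under stated assumptions: the `Roa` class, the function name, and the documentation prefixes and AS numbers are all hypothetical, and the three-state result only loosely follows the commonly described valid/invalid/not-found model.

```python
from dataclasses import dataclass
from ipaddress import ip_network


@dataclass(frozen=True)
class Roa:
    """Simplified ROA payload: prefix, maximum length, authorized origin AS."""
    prefix: str
    max_length: int
    origin_as: int


def origin_validation(prefix: str, as_path: list[int], roas: list[Roa]) -> str:
    """Return 'valid', 'invalid', or 'not-found' for a received route.

    Only the last AS in the path (the origin) is compared to the ROAs;
    the rest of the AS path is never examined.
    """
    route = ip_network(prefix)
    origin = as_path[-1]
    covered = False
    for roa in roas:
        roa_net = ip_network(roa.prefix)
        # subnet_of raises TypeError across IP versions, so check first.
        if route.version == roa_net.version and route.subnet_of(roa_net):
            covered = True
            if route.prefixlen <= roa.max_length and origin == roa.origin_as:
                return "valid"
    return "invalid" if covered else "not-found"


roas = [Roa("192.0.2.0/24", 24, 64500)]

# Legitimate route: AS 64500 originates, AS 64501 is a real neighbor.
legitimate = origin_validation("192.0.2.0/24", [64501, 64500], roas)

# Forged route: AS 64666 falsely claims to be adjacent to AS 64500.
forged_adjacency = origin_validation("192.0.2.0/24", [64666, 64500], roas)

# Plain origin hijack: AS 64666 claims to originate the prefix itself.
origin_hijack = origin_validation("192.0.2.0/24", [64666], roas)

print(legitimate, forged_adjacency, origin_hijack)
```

In this sketch the forged adjacency passes exactly as the legitimate route does, because nothing beyond the origin AS is checked.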

The RPKI system does appear to be gaining widespread acceptance, and its deployment is increasing in scope.

12: The Commission seeks comment on whether and to what extent network operators anticipate integrating BGPsec-capable routers into their networks.
BGPsec has not been deployed by a single provider on other than an experimental basis, as far as I know, and there are no active plans to implement BGPsec by any provider. BGPsec, in general, fails to provide enough additional security to justify the additional costs associated with its deployment. Specifically—

  • Deploying BGPsec on individual routers requires the BGP speaker to perform complex cryptographic operations. No production router in existence today has the processing power to perform these operations quickly enough to be useful. The only apparent solution to this problem is to build specifically designed hardware to perform these operations—no router includes this hardware today, and no plans are in place to include them. The additional costs incurred to allow individual routers to perform these complex cryptographic operations would be prohibitive.
  • If it is run “on the side” by moving the complex cryptographic operations onto a separate device, the cost and complexity of running a network are dramatically increased.
  • BGPsec only signs the reachable destination (NLRI) and AS Path, which are only two components of a route. Many other components of a route, such as the next hop and communities, are just as important to the validity of an individual advertisement but are not covered by BGPsec. The signing of a “route” in BGPsec is a term of convenience, rather than a description of what is really signed.
  • BGPsec will only provide some additional security (BGPsec is not “perfect” from a security perspective) if most providers deploy the technology. This leads to a “chicken and egg” problem.
  • BGPsec reduces performance by eliminating specific optimizations, such as update packing, which have an important impact on BGP performance and BGP’s consumption of resources.
  • The additional resources required by BGPsec represent a surface of attack for DDoS attacks against individual routers and, with coordination, against entire networks.
  • BGPsec “freezes BGP in place” by assuming the best way to secure BGP is to “secure the way BGP works.” Deploying BGPsec would restrict future innovation in routing systems, particularly in the global Internet.
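The point about signature coverage in the list above can be made concrete with a small sketch. It is not BGPsec itself: a SHA-256 digest over the covered fields stands in for the real per-hop signatures, and the `Route` class, field names, and addresses are hypothetical. It only shows that changing attributes outside the covered set leaves the protected data untouched.

```python
import hashlib
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Route:
    nlri: str
    as_path: tuple[int, ...]
    next_hop: str
    communities: tuple[str, ...]


def covered_digest(route: Route) -> str:
    """Digest over the covered fields only: NLRI and AS path.

    Illustrative stand-in for the data a signature would protect.
    """
    covered = f"{route.nlri}|{','.join(map(str, route.as_path))}"
    return hashlib.sha256(covered.encode()).hexdigest()


original = Route("192.0.2.0/24", (64501, 64500), "198.51.100.1", ("64500:100",))

# Change attributes outside the covered set.
altered = replace(original, next_hop="203.0.113.9", communities=("64500:666",))

# Change a covered attribute.
path_changed = replace(original, as_path=(64666, 64500))

print(covered_digest(original) == covered_digest(altered))
print(covered_digest(original) == covered_digest(path_changed))
```

Under these assumptions the first comparison matches and the second does not: a modified next hop or community set would still carry an apparently intact "signed route."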

To these general problems, there is one further problem—BGPsec does not secure the withdrawal of reachability, only its advertisement. Because of this, BGPsec can only be considered a somewhat partial solution to the problems any BGP security system needs to solve.

Consider a BGP speaker that has received a signed NLRI/AS Path pair (a signed “route”). This BGP speaker can continue advertising this route so long as it appears to be valid—breaking the peering session does not invalidate the route.

Hence, the BGP speaker may mistakenly or intentionally replay this signed reachability information until something within the signed pair invalidates the information. There are four ways the signed route may be invalidated:

  • A “better” route is propagated through the system
  • Some form of “revocation list” is maintained and distributed
  • Each signed route is given a defined “time-to-live,” after which it is invalidated
  • The signing key is revoked and/or replaced

The first is impractical to guarantee in all situations. The second would involve maintaining a “negative routing table,” which is nearly impossible in practice.

The third—adding a time-to-live to BGP reachability information—imposes high operational costs. BGP assumes that so long as a peer advertising a reachable destination maintains the peering session, the destination remains reachable (the route is valid). This assumption replaces the workload of constantly advertising already existing routing information with a single “hello” process to ensure the connection is still valid. A single “hello,” then, is a proxy validating the routing information for hundreds of thousands (potentially millions) of reachable destinations. Routes, in other words, have an implied infinite time-to-live.

Adding a time-to-live to individual routes would mean a BGP speaker must readvertise a given reachable destination periodically for the routing information to continue to be considered valid. According to this site, there are currently 916,000 IPv4 routes carried by a BGP speaker connected to the Internet (the number varies by location, policies implemented, etc.). Note the analysis below does not consider IPv6 routes, which will probably be more numerous.

The time-to-live attached to any route determines how long the information can be replayed. If the originator sets the timer to 168 hours, the route can be replayed for a week before it is invalidated. It is difficult to say how long any given route should be valid, or what level of replay protection any given route requires. This illustration will assume 24 hours would be an average across many routes—but there are strong incentives to set the time-to-live much shorter, and there is little cost to the originator for doing so.

If each of these routes were given a time-to-live of 24 hours, the typical Internet BGP speaker would need to process about 10 updates/second (with the additional cryptographic processing requirements described above) just to process time-to-live expirations.
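The arithmetic behind that figure, and the tradeoff between replay window and refresh load, can be sketched as follows. The route count is the one quoted above; the function name and the assumption that refreshes are spread evenly over the time-to-live are mine.

```python
SECONDS_PER_HOUR = 3600
IPV4_ROUTES = 916_000  # figure quoted in the text


def refresh_rate(routes: int, ttl_hours: float) -> float:
    """Average updates/second needed to refresh every route once per TTL.

    Assumes refreshes are spread evenly across the TTL interval.
    """
    return routes / (ttl_hours * SECONDS_PER_HOUR)


# A longer TTL lowers the load but widens the replay window, and vice versa.
for ttl_hours in (168, 24, 1):
    rate = refresh_rate(IPV4_ROUTES, ttl_hours)
    print(f"TTL {ttl_hours:>3} h: replay window {ttl_hours} h, about {rate:.1f} updates/s")
```

With a 24-hour time-to-live this works out to roughly 10.6 updates per second; shortening the time-to-live to one hour multiplies the load by 24.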

The impacts of this level of activity in the DFZ—beyond the sheer processing and bandwidth requirements—are wide-ranging. For instance, logging, telemetry, false route detection systems, and the way timers are deployed to dampen and manage high speed flapping events, would all need to be reconsidered and adjusted.

The fourth alternative is for the signing key to be revoked when a route is withdrawn.

If the operator uses a single key to sign all routes being advertised by the AS, then replacing the key on a single route requires re-advertising every route. Readvertising every route is a difficult process, fraught with potential failure modes.

If the operator assigns each BGP speaker a key, then only the BGP speakers impacted by withdrawing the route must have their keys changed. Hence, only the routes advertised by or through these individual speakers need to be re-advertised into the routing system. However, assigning each BGP speaker an individual key for signing routing information exposes another set of problems.

Key management is an obvious problem with this solution; the exposure of peering information, and the security implications of that exposure, are non-obvious problems. If each BGP speaker on the edge of a network has its own signing key, then outside observers can determine the actual pair of routers used to connect any two autonomous systems. This creates a “map” of points at which the network can be attacked, and is generally an unacceptable exposure of information for most providers.
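A rough sketch of how an outside observer might build such a map, assuming each hop's signature carries an identifier for the router key that produced it. The observations, key labels, and function name below are all made up for illustration.

```python
from collections import defaultdict

# Hypothetical observations: each route is a list of (ASN, router key ID)
# hops, ordered from the origin outward. Key IDs are made-up labels.
observed_routes = [
    [(64500, "key-a1"), (64501, "key-b1"), (64502, "key-c1")],
    [(64500, "key-a2"), (64501, "key-b1"), (64502, "key-c2")],
    [(64500, "key-a1"), (64501, "key-b1")],
]


def peering_map(routes):
    """Map each (signing AS, next AS) adjacency to the key IDs seen signing on it."""
    links = defaultdict(set)
    for hops in routes:
        # Pair every hop with the hop that follows it in the path.
        for (asn, key_id), (next_asn, _next_key) in zip(hops, hops[1:]):
            links[(asn, next_asn)].add(key_id)
    return dict(links)


for (asn, next_asn), keys in sorted(peering_map(observed_routes).items()):
    print(f"AS{asn} -> AS{next_asn}: signed by {sorted(keys)}")
```

Even this crude grouping suggests how many distinct edge routers an AS uses toward each neighbor, which is the kind of information most providers would rather not expose.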

These issues have, to this point, prevented any serious plans for deploying BGPsec—and will probably continue to do so for the foreseeable future. The very best that can be hoped for is BGPsec deployment in 10–20 years, and even full deployment would not necessarily improve the overall security posture of the global Internet.

13: For network operators that currently participate in MANRS and comply with its requirements, including support for IETF Best Common Practice standards, the Commission seeks comment on the efficacy of such measures for preventing BGP hijacking.

MANRS, BCP38, and peer-to-peer BGP session authentication (such as TCP-AO) should, in theory, be effective against a large part of the unintentional and “unsophisticated” attacks and mistakes that cause large-scale BGP failures. There has been little research attempting to measure the impact of these measures, however, and that impact seems difficult to measure.

The MANRS vendor program is an effective mechanism for promoting the common-sense practices, although it could probably be ramped up somewhat, and vendors more strongly encouraged to participate.

These measures should continue to be promoted through education, presentations, and other means, as they do appear to be improving the overall security posture of the Internet. TCP-AO, BCP38, and MANRS should, in particular, be encouraged and emphasized by all parties within the ecosystem.

14: Commission’s Role.
The Commission should focus on supporting the community in developing deployable standards and systems to improve the global routing system.

First, the Commission can encourage governmental organizations, and organizations funded by government organizations, to “go back to basics” and ask specific questions about what needs to be secured, how it can practically be secured, and what the tradeoffs are.

To this point, BGP security efforts have often begun with the question how we can secure the existing operation of BGP. This is not the right question to ask. Instead, the community needs to be encouraged to create and understand what needs to be secured. Possible questions might be—

  • What does valid mean in relation to a route? Must it include the entire route, or is “just” the AS Path and reachable destination “enough?”
  • In relation to the AS Path, is the AS Path given valid in the sense that it exists, and there are no policies preventing the use of this path to reach the given destination?
  • In relation to the reachable destination, how can aggregation and other forms of alternate origination be supported while still answering the questions posed above?
  • Will the providers along the path actually use the given path? Can “quality of path” be ensured? If so, how can this be accomplished without incurring unacceptable costs?
  • How can the effectiveness of the system be measured?
  • How can a system be designed so that increasing deployment increases security? How can the “tragedy of the commons” and “chicken and egg” problems be avoided?

Second, the Commission can encourage providers and operators, including large “enterprise” organizations, to participate in the process of understanding and building global routing system security. To this point, only a few providers have participated in the discussion. Quite often, those participating have a narrow perspective, and have been guided by groups asking the wrong question (as above). The scope of enquiry needs to be expanded.

What the Commission, or any other government organization, should not do is to push a solution from the top down. The IETF community is effective at finding solutions for these kinds of problems, and has vast experience in understanding the intended consequences, the unintended consequences, and operational aspects of deploying technologies at the scale of the Internet. Government agencies need to leverage these capacities, rather than trying to override them.

If funding is provided for research in this area, it should begin with some sort of “open research grant,” rather than selecting one solution to fund. Funding should not have an impact on the selection of a technical solution in open standards organizations (such as the IETF). Funding does, however, play a significant role by impacting the availability of implementations, time spent researching problems, time spent supporting a given solution at open meetings, etc.

The community must return to the beginning and find a solution that works by asking the right questions.

15: The Commission seeks comment on the extent to which the effectiveness of BGP security measures may be related to international participation and coordination.
International coordination and cooperation are basic requirements.

16: Costs and Benefits.
Please see the answers above, as some of the costs are considered there.

17: The Commission seeks comment on whether the Commission should encourage industry to prioritize the deployment of BGP security measures within the networks on which critical infrastructure and emergency services rely, as a means of helping industry to control costs otherwise associated with a network-wide deployment.

This is an attractive idea from the perspective of finding places where routing security could be deployed at a smaller scale and in a controlled manner to understand how the system works, make improvements in the system, etc. However, I would be concerned about how these kinds of services can be “separated out” for deployment in an effective way.

This kind of deployment would, however, make the problem of incremental deployment a fundamental requirement of any proposed system, which may at least encourage steps in the right direction.

The Hedge 82: Jared Smith and Route Poisoning

Intentionally poisoning BGP routes in the Default-Free Zone (DFZ) would always be a bad thing, right? Actually, this is a fairly common method to steer traffic flows away from and through specific autonomous systems. How does this work, how common is it, and who does this? Jared Smith joins us on this episode of the Hedge to discuss the technique, and his research into how frequently it is used.
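As a rough illustration of the technique as it is commonly described (not taken from the episode itself): the originator inserts the AS it wants to avoid into the AS path, so that AS discards the route through ordinary loop prevention. The function names and AS numbers below are hypothetical.

```python
def accepts(local_as: int, as_path: list[int]) -> bool:
    """Standard BGP loop prevention: reject a path that already contains our AS."""
    return local_as not in as_path


def poisoned_path(origin_as: int, avoid: list[int]) -> list[int]:
    """Build an AS path that 'poisons' the ASes to be avoided.

    The origin's AS number brackets the poisoned ASes so the route still
    appears to originate from origin_as.
    """
    return [origin_as, *avoid, origin_as]


path = poisoned_path(64500, [64666])

print(path)
print(accepts(64666, path))  # the poisoned AS evaluates the route
print(accepts(64501, path))  # an unrelated AS evaluates the route
```

The poisoned AS rejects the route while every other AS can still accept it, which steers traffic away from the poisoned AS.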


The Hedge 66: Tyler McDaniel and BGP Peer Locking

Tyler McDaniel joins Eyvonne, Tom, and Russ to discuss a study on BGP peerlocking, which is designed to prevent route leaks in the global Internet. From the study abstract:

BGP route leaks frequently precipitate serious disruptions to interdomain routing. These incidents have plagued the Internet for decades while deployment and usability issues cripple efforts to mitigate the problem. Peerlock, introduced in 2016, addresses route leaks with a new approach. Peerlock enables filtering agreements between transit providers to protect their own networks without the need for broad cooperation or a trust infrastructure.
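A simplified reading of the kind of filter such an agreement implies might look like the sketch below. The rule shown (reject any route whose path contains the protected AS unless it arrives from that AS or one of its authorized upstreams) is my assumption about how the filtering works, and the function and parameter names are hypothetical rather than taken from the study.

```python
def peerlock_accepts(
    as_path: list[int],
    received_from: int,
    protected_as: int,
    authorized_upstreams: set[int],
) -> bool:
    """Accept a route unless it carries the protected AS via an unauthorized neighbor."""
    if protected_as not in as_path:
        return True  # the agreement says nothing about this route
    return received_from == protected_as or received_from in authorized_upstreams


PROTECTED = 64500
AUTHORIZED = {64501}

# Route learned directly from the protected AS.
direct = peerlock_accepts([64500, 64496], 64500, PROTECTED, AUTHORIZED)

# Route learned through an upstream the protected AS has authorized.
via_upstream = peerlock_accepts([64501, 64500, 64496], 64501, PROTECTED, AUTHORIZED)

# Route leaked by some other neighbor with the protected AS mid-path.
leaked = peerlock_accepts([64666, 64500, 64496], 64666, PROTECTED, AUTHORIZED)

print(direct, via_upstream, leaked)
```

The filter needs only a bilateral agreement about who may carry the protected AS's routes, which fits the abstract's point that no broad cooperation or trust infrastructure is required.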


Current Work in BGP Security

I’ve been chasing BGP security since before the publication of the soBGP drafts, way back in the early 2000s (that’s almost 20 years for those who are math challenged). The most recent news largely centers on the RPKI, which is used to ensure the AS originating an advertisement is authorized to do so (or rather “owns” the resource or prefix). If you are not “up” on what the RPKI does, or how it works, you might find this old blog post useful; it’s actually the tenth post in a ten-post series on the topic of BGP security.

Recent news in this space largely centers around the ongoing deployment of the RPKI. According to Wired, Google and Facebook have both recently adopted MANRS, and are adopting RPKI. It might not seem like autonomous systems along the edge adopting BGP security best practices and the RPKI system can make much of a difference, but the “heavy hitters” among the content providers can play a pivotal role here by refusing to accept routes that appear to be hijacked. This not only helps these providers and their customers directly (a point the Wired article makes), it also helps the ‘net in a larger way by blocking attackers’ access to at least some of the “big fish” in terms of traffic.

Leslie Daigle, over at the Global Cyber Alliance—an organization I’d never heard of until I saw this—has a post up explaining exactly how deploying the RPKI in an edge AS can make a big difference in the service level from a customer’s perspective. Leslie is looking for operators who will fill out a survey on the routing security measures they deploy. If you operate a network that has any sort of BGP presence in the default-free zone (DFZ), it’s worth taking a look and filling the survey out.

One of the various problems with routing security is just being able to see what’s in the RPKI. If you have a problem with your route in the global table, you can always go look at a route view server or looking glass (a topic I will cover in some detail in an upcoming live webinar over on Safari Books Online—I think it’s scheduled for February right now). But what about the RPKI? RIPE NCC has released a new tool called the JDR:

Just like RP software, JDR interprets certificates and signed objects in the RPKI, but instead of producing a set of Verified ROA Payloads (VRPs) to be fed to a router, it annotates everything that could somehow cause trouble. It will go out of its way to try to decode and parse objects: even if a file is clearly violating the standards and should be rejected by RP software, JDR will try to process it and present as much troubleshooting information to the end-user afterwards.

You can find the JDR here.

Finally, the folks at APNIC, working with NLnet Labs, have taken a page from the BGP playbook and proposed an opaque object for the RPKI, extending it beyond “just prefixes.” They’ve created a new object, the Resource Tagged Attestation (RTA), which can carry “any arbitrary file.” They have a post up explaining the rationale and work here.

Reducing RPKI Single Point of Takedown Risk

The RPKI, for those who do not know, ties the origin AS to a prefix using a certificate (the Route Origin Authorization, or ROA) signed by a third party. The third party, in this case, is validating that the AS in the ROA is authorized to advertise the destination prefix in the ROA; if ROAs were self-signed, the security would be no better than simply advertising the prefix in BGP. Who should be able to sign these ROAs? The assigning authority, meaning the Regional Internet Registries (RIRs), makes the most sense, since they (should) know which company owns which set of AS numbers and prefixes.

The general idea makes sense—you should not accept routes from “just anyone,” as they might be advertising the route for any number of reasons. An operator could advertise routes to source spam or phishing emails, or some government agency might advertise a route to redirect traffic, or block access to some web site. But … if you haven’t found the tradeoffs, you haven’t looked hard enough. Security, in particular, is replete with tradeoffs.

Every time you deploy some new security mechanism, you create some new attack surface—sometimes more than one. Deploy a stateful packet filter to protect a server, and the device itself becomes a target of attack, including buffer overflows, phishing attacks to gain access to the device as a launch-point into the private network, and the holes you have to punch in the filters to allow services to work. What about the RPKI?

When the RPKI was first proposed, one of my various concerns was the creation of new attack surfaces. One specific attack surface is the control a single organization (the issuing RIR) has over the very existence of the operator. Suppose you start a new content provider. To get the new service up and running, you sign a contract with an RIR for some address space, sign a contract with some upstream provider (or providers), set up your servers and service, and start advertising routes. For whatever reason, your service goes viral, netting millions of users in a short span of time.

Now assume the RIR receives a complaint against your service for whatever reason—the reason for the complaint is not important. This places the RIR in the position of a prosecutor, defense attorney, and judge—the RIR must somehow figure out whether or not the charges are true, figure out whether or not taking action on the charges is warranted, and then take the action they’ve settled on.

In the case of a government agency (or a large criminal organization) making the complaint, there is probably going to be little the RIR can do other than simply revoke your certificate, pulling your service off-line.

Overnight your business is gone. You can drag the case through the court system, of course, but this can take years. In the meantime, you are losing users, other services are imitating what you built, and you have no money to pay the legal fees.

A true story—without the names. I once knew a man who worked for a satellite provider, let’s call them SATA. Now, SATA’s leadership decided they had no expertise in accounts receivables, and they were spending too much time on trying to collect overdue bills, so they outsourced the process. SATB, a competing service, decided to buy the firm SATA outsourced their accounts receivables to. You can imagine what happens next… The accounting firm worked as hard as it could to reduce the revenue SATA was receiving.

Of course, SATA sued the accounting firm, but before the case could make it to court, SATA ran out of money, laid off all their people, and shut their service down. SATA essentially went out of business. They won some money later, in court, but … whatever money they won was just given to the investors of various kinds to make up for losses. The business itself was gone, permanently.

Herein lies the danger of giving a single entity like an RIR, even if they are friendly, honest, etc., control over a critical resource.

A recent paper presented at the ANRW at APNIC caught my attention as a potential way to solve this problem. The idea is simple—just allow (or even require) multiple signatures on a ROA. To be more accurate, each authorizing party issues a “partial certificate;” if “enough” pieces of the certificate are found and valid, the route will be validated.

The question is—how many signatures (or parts of the signature, or partial attestations) should be enough? The authors of the paper suggest there should be a “Threshold Signature Module” that makes this decision. The attestations of the various signers are combined in the threshold module to produce a single signature that is then used to validate the route. This way the validation process on the router remains the same, which means the only real change in the overall RPKI system is the addition of the threshold module.

If one RIR—even the one that allocated the addresses you are using—revokes their attestation on your ROA, the remaining attestations should be enough to convince anyone receiving your route that it is still valid. Since there are five regions, you have at least five different choices to countersign your ROA. Each RIR is under the control of a different national government; hence organizations like governments (or criminals!) would need to work across multiple RIRs and through other government organizations to have a ROA completely revoked.
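The effect of requiring "enough" attestations can be sketched with a simple k-of-n count. This is not the paper's construction: the paper combines partial attestations into a single signature, while this sketch just counts independently verified ones, and HMAC with made-up per-RIR secrets stands in for real public-key signatures.

```python
import hashlib
import hmac

# Hypothetical per-RIR secrets; HMAC is a stand-in for real signatures.
RIR_KEYS = {
    "AFRINIC": b"k1",
    "APNIC": b"k2",
    "ARIN": b"k3",
    "LACNIC": b"k4",
    "RIPE NCC": b"k5",
}


def attest(rir: str, roa: bytes) -> bytes:
    """Produce one RIR's attestation over the ROA contents."""
    return hmac.new(RIR_KEYS[rir], roa, hashlib.sha256).digest()


def roa_accepted(roa: bytes, attestations: dict[str, bytes], threshold: int) -> bool:
    """Accept the ROA if at least `threshold` attestations verify."""
    valid = sum(
        1
        for rir, tag in attestations.items()
        if rir in RIR_KEYS and hmac.compare_digest(tag, attest(rir, roa))
    )
    return valid >= threshold


roa = b"192.0.2.0/24 maxlen 24 origin AS64500"
attestations = {rir: attest(rir, roa) for rir in ("ARIN", "RIPE NCC", "APNIC")}

all_three = roa_accepted(roa, attestations, threshold=2)

# One RIR revokes its attestation; two remain.
del attestations["ARIN"]
after_one_revocation = roa_accepted(roa, attestations, threshold=2)

# A second RIR revokes; the count drops below the threshold.
del attestations["APNIC"]
after_two_revocations = roa_accepted(roa, attestations, threshold=2)

print(all_three, after_one_revocation, after_two_revocations)
```

With a threshold of two, a single revocation leaves the ROA usable; it takes action across multiple RIRs to invalidate it.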

An alternate solution here, one that follows the PGP model, might be to simply have the threshold signature module consider the number and source of ROAs using the existing model. Local policy could determine how to weight attestations from different RIRs, etc.

This multiple or “shared” attestation (or signature) idea seems like a neat way to work around one of (possibly the major) attack surfaces introduced by the RPKI system. If you are interested in Internet core routing security, you should take a read through the post linked above, and then watch the video.

The Hedge 43: Ivan Pepelnjak and Trusting Routing Protocols

Can you really trust what a routing protocol tells you about how to reach a given destination? Ivan Pepelnjak joins Nick Russo and Russ White to provide a longer version of the tempting one-word answer: no! Join us as we discuss a wide range of issues including third-party next-hops, BGP communities, and the RPKI.


The Hedge 42: Andrei Robachevsky and MANRS

The security of the global routing table is foundational to the security of the overall Internet as an ecosystem—if routing cannot be trusted, then everything that relies on routing is suspect, as well. Mutually Agreed Norms for Routing Security (MANRS) is a project of the Internet Society designed to draw network operators of all kinds into thinking about, and doing something about, the security of the global routing table by using common-sense filtering and observation. Andrei Robachevsky joins Russ White and Tom Ammon to talk about MANRS.

More information about MANRS can be found on the project web site, including how to join and how to support global routing security.
