On Securing BGP
The US Federal Communications Commission recently asked for comments on securing Internet routing. While I worked on the responses offered by various organizations, I also put in my own response as an individual, which I’ve included below.
I am not providing this answer as a representative of any organization, but rather as an individual with long experience in the global standards and operations communities surrounding the Internet, and with long experience in routing and routing security.
I completely agree with the Notice of Inquiry that “networks are essential to the daily functioning of critical infrastructure [yet they] can be vulnerable to attack” due to insecurities in the BGP protocol. While proposed solutions exist that would increase the security of the BGP routing system, only some of these mechanisms are being widely deployed. This response will consider some of the reasons existing proposals are not deployed and suggest some avenues the Commission might explore to aid the community in developing and deploying solutions.
9: Measuring BGP Security.
At this point, I only know of the systems mentioned in the query for measuring BGP routing security incidents. There have been attempts to build other systems, but none of these systems have been successfully built or deployed. Three problems seem to affect these kinds of systems.
First, there is a general lack of funding for building and maintaining such systems. These kinds of systems require a fair amount of research and creative energy to design, including making the networking community aware of these kinds of tools.
Second, building such a system is difficult because of the nature of inter-provider policy. It is often difficult to tell if some change in the Default Free Zone (DFZ) routing is valid or is somehow related to an attack. False positives can have a very negative impact and are hard to detect and guard against.
Third, these kinds of systems generally focus on a single system—routing—while excluding hints and information that can be gained from other systems (particularly the DNS). This is, at least in part, because of the complexity of each individual system, and the difficulty in understanding how to correlate and understand information from overlapping systems.
10: Deployment of BGP Security Measures.
BGP security is divided into at least four different domains right now.
First is the exposure of policies and information through registries and similar mechanisms (such as peeringdb and whois). These mechanisms can generally be useful at the initial stages of peering, and hence are not very helpful in resolving hijacks, mistakes, etc., in near-real-time within the DFZ.
Second is the set of best common practices, such as BCP38, and represented by the MANRS effort. These will be more fully discussed in answer to question 13.
Third is origin validation, currently represented by the RPKI, which will be considered more fully in answering question 11.
Fourth is a more complete security system, currently represented by BGPSEC, which will be considered more fully in answering question 12.
11: The Commission seeks comment on the extent to which RPKI, as implemented by other regional internet registries, effectively prevents BGP hijacking.
The RPKI can effectively block some hijacking events—so long as most providers implement and “pay attention” to the validation process. There are, however, problems with the RPKI system, including—
- There is no “quality control” over the contents of the RPKI. Other systems, such as the Internet Routing Registries (IRRs), that store policy and origination information have, over time, deteriorated in terms of the quality of information housed there. There is very little research into the quality of information stored in the RPKI, nor do we have any sense about how the quality of this information will stand up over time.
- There are some concerns about the centralization of control over resources the RPKI represents. For instance, if a content or transit provider becomes entangled in a contract dispute over some resource with a registry, the registry can use the RPKI system to remove the provider from the Internet, essentially putting the provider out of business. Governments can, in theory, also cause registries to remove a provider’s authorization to use Internet resources. These are areas that may need to be researched and addressed to gain the trust of a larger part of the community.
- The RPKI system does not expose any information about a route other than the originator. This leaves the possibility of hijacking a route by an Autonomous System (AS) advertising a route even though they cannot reach the destination by simply claiming to be connected to the originating AS.
- The RPKI system does little to prevent an AS that should not be transiting traffic—end customers such as content providers and “enterprises”—from advertising routes in a way that pulls them into a transit role.
The RPKI system does appear to be gaining widespread acceptance, and its deployment is increasing in scope.
12: The Commission seeks comment on whether and to what extent network operators anticipate integrating BGPsec-capable routers into their networks.
BGPsec has not been deployed by a single provider on other than an experimental basis, as far as I know, and there are no active plans to implement BGPsec by any provider. BGPsec, in general, fails to provide enough additional security to justify the additional costs associated with its deployment. Specifically—
- Deploying BGPsec on individual routers requires the BGP speaker to perform complex cryptographic operations. No production router in existence today has the processing power to perform these operations quickly enough to be useful. The only apparent solution to this problem is to build specifically designed hardware to perform these operations—no router includes this hardware today, and no plans are in place to include them. The additional costs incurred to allow individual routers to perform these complex cryptographic operations would be prohibitive.
- If it is run “on the side” by moving the complex cryptographic operations onto a separate device, the cost and complexity of running a network are dramatically increased.
- BGPsec only signs the reachable destination (NLRI) and AS Path, which are only two components of a route. There are many other components in a route, such as the next hop and communities, which are just as important to the validity of an individual advertisement which are not covered by BGPsec. The signing of a “route” in BGPsec is a term of convenience, rather than a description of what is really signed.
- BGPsec will only provide some additional security (BGPsec is not “perfect” from a security perspective) if most providers deploy the technology. This leads to a “chicken and egg” problem.
- BGPsec reduces performance by eliminating specific optimizations, such as update packing, which have an important impact on BGP performance and BGP’s consumption of resources.
- The additional resources required by BGPsec represent a surface of attack for DDoS attacks against individual routers and, with coordination, against entire networks.
- BGPsec “freezes BGP in place” by assuming the best way to secure BGP is to “secure the way BGP works.” Deploying BGPsec would restrict future innovation in routing systems, particularly in the global Internet.
To these general problems, there is one further problem—BGPsec does not secure the withdrawal of reachability, only its advertisement. Because of this, BGPsec can only be considered a somewhat partial solution to the problems any BGP security system needs to solve.
Consider a BGP speaker that has received a signed NLRI/AS Path pair (a signed “route”). This BGP speaker can continue advertising this route so long as it appears to be valid—breaking the peering session does not invalidate the route.
Hence, the BGP speaker may mistakenly or intentionally replay this signed reachability information until something within the signed pair invalidates the information. There are four ways the signed route may be invalidated:
- A “better” route is propagated through the system
- Some form of “revocation list” is maintained and distributed
- Each signed route is given a defined “time-to-live,” after which it is invalidated
- The signing key is revoked and/or replaced
The first is impractical to guarantee in all situations. The second would involve maintaining a “negative routing table,” which is nearly impossible in practice.
The third—adding a time-to-live to BGP reachability information—imposes high operational costs. BGP assumes that so long as a peer advertising a reachable destination maintains the peering session, the destination remains reachable (the route is valid). This assumption replaces the workload of constantly advertising already existing routing information with a single “hello” process to ensure the connection is still valid. A single “hello,” then, is a proxy validating the routing information for hundreds of thousands (potentially millions) of reachable destinations. Routes, in other words, have an implied infinite time-to-live.
Adding a time-to-live to individual routes would mean a BGP speaker must readvertise a given reachable destination periodically for the routing information to continue to be considered valid. According to this site, there are currently 916,000 IPv4 routes carried by a BGP speaker connected to the Internet (the number varies by location, policies implemented, etc.). Note the analysis below does not consider IPv6 routes, which will probably be more numerous.
The time-to-live attached to any route determines how long the information can be replayed. If the originator sets the timer to 168 hours, the route can be replayed for a week before it is invalidated. It is difficult to say how long any given route should be valid, or what level of replay protection any given route requires. This illustration will assume 24 hours would be an average across many routes—but there are strong incentives to set the time-to-live much shorter, and there is little cost to the originator for doing so.
If each of these routes were given a time-to-live of 24 hours, the typical Internet BGP speaker would need to process about 10 updates/second (with the additional cryptographic processing requirements described above) just to process time-to-live expirations.
The impacts of this level of activity in the DFZ—beyond the sheer processing and bandwidth requirements—are wide-ranging. For instance, logging, telemetry, false route detection systems, and the way timers are deployed to dampen and manage high speed flapping events, would all need to be reconsidered and adjusted.
The fourth alternative is for the signing key to be revoked when a route is withdrawn.
If the operator uses a single key to sign all routes being advertised by the AS, then replacing the key on a single route requires re-advertising every route. Readvertising every route is a difficult process, fraught with potential failure modes.
If the operator assigns each BGP speaker a key, then only the key for BGP speakers impacted by withdrawing the route must have their keys changes. Hence, only the routes advertised by or through these individual speakers need to be re-advertised into the routing system. However, assigning each BGP speaker an individual key for signing routing information exposes another set of problems.
Key management is an obvious problem with this solution; the exposure of peering information, and the security implications of that exposure, are non-obvious problems. If each BGP speaker on the edge of a network has its own signing key, then outside observers can determine the actual pair of routers used to connect any two autonomous systems. This creates a “map” of points at which the network can be attacked, and is generally an unacceptable exposure of information for most providers.
These issues have, to this point, prevented any serious plans for deploying BGPsec—and will probably continue to do so for the foreseeable future. The very best that can be hoped for is BGPsec deployment in 10–20 years, and even full deployment would not necessarily improve the overall security posture of the global Internet.
13: For network operators that currently participate in MANRS and comply with its requirements, including support for IETF Best Common Practice standards, the Commission seeks comment on the efficacy of such measures for preventing BGP hijacking.
MANRS, BCP38, and peer-to-peer BGP session encryption (such as TCP-AO) should, in theory, be effective a large part of the unintentional and “unsophisticated” attacks and mistakes that cause large-scale BGP failures. There has been little research attempting to measure the impact of these measures, and it seems difficult to measure their impact.
The MANRS vendor program is an effective mechanism for promoting the common-sense practices, although it could probably be ramped up somewhat, and vendors more strongly encouraged to participate.
These measures should continue to be promoted through education, presentations, and other means, as they do appear to be improving the overall security posture of the Internet. TCP-AO, BCP38, and MANRS should, in particular, be encouraged and emphasized by all parties within the ecosystem.
14: Commission’s Role.
The Commission should focus on supporting the community in developing deployable standards and systems to improve the global routing system.
First, the Commission can encourage governmental organizations, and organizations funded by government organizations, to “go back to basics” and ask specific questions about what needs to be secured, how it can practically be secured, and what the tradeoffs are.
To this point, BGP security efforts have often begun with the question how we can secure the existing operation of BGP. This is not the right question to ask. Instead, the community needs to be encouraged to create and understand what needs to be secured. Possible questions might be—
- What does valid mean in relation to a route? Must it include the entire route, or is “just” the AS Path and reachable destination “enough?”
- In relation to the AS Path, is the AS Path given valid in the sense that it exists, and there are no policies preventing the use of this path to reach the given destination?
- In relation to the reachable destination, how can aggregation and other forms of alternate origination be supported while still answering the questions posed above?
- Will the providers along the path actually use the given path? Can “quality of path” be ensured? If so, how can the be accomplished without incurring unacceptable costs?
- How can the effectiveness of the system be measured?
- How can a system be designed so that increasing deployment increases security? How can the “tragedy of the commons” and “chicken and egg” problems be avoided?
Second, the Commission can encourage providers and operators, including large “enterprise” organizations, to participate in the process of understanding and building global routing system security. To this point, only a few providers have participated in the discussion. Quite often, those participating have a narrow perspective, and have been guided by groups asking the wrong question (as above). The scope of enquiry needs to be expanded.
What the Commission, or any other government organization, should not do is to push a solution from the top down. The IETF community is effective at finding solutions for these kinds of problems, and has vast experience in understanding the intended consequences, the unintended consequences, and operational aspects of deploying technologies at the scale of the Internet. Government agencies need to leverage these capacities, rather than trying to override them.
If funding is provided for research in this area, it should begin with some sort of “open research grant,” rather than selecting one solution to fund. Funding should not have an impact on the selection of a technical solution in open standards organizations (such as the IETF). Funding does, however, play a significant role by impacting the availability of implementations, time spent researching problems, time spent supporting a given solution at open meetings, etc.
The community must return to the beginning and find a solution that works by asking the right questions.
15: The Commission seeks comment on the extent to which the effectiveness of BGP security measures may be related to international participation and coordination.
International coordination and cooperation are basic requirements.
16″ Costs and Benefits.
Please see the answers above, as some of the costs are considered there.
17: The Commission seeks comment on whether the Commission should encourage industry to prioritize the deployment of BGP security measures within the networks on which critical infrastructure and emergency services rely, as a means of helping industry to control costs otherwise associated with a network-wide deployment.
This is an attractive idea from the perspective of finding places where routing security could be deployed at a smaller scale and in a controlled manner to understand how the system works, make improvements in the system, etc. However, I would be concerned about how these kinds of services can be “separated out” for deployment in an effective way.
This kind of deployment would, however, make the problem of incremental deployment a fundamental requirement of any proposed system, which may at least encourage steps in the right direction.