On Securing BGP

The US Federal Communications Commission recently asked for comments on securing Internet routing. While I worked on the responses offered by various organizations, I also put in my own response as an individual, which I’ve included below.

I am not providing this answer as a representative of any organization, but rather as an individual with long experience in the global standards and operations communities surrounding the Internet, and with long experience in routing and routing security.

I completely agree with the Notice of Inquiry that “networks are essential to the daily functioning of critical infrastructure [yet they] can be vulnerable to attack” due to insecurities in the BGP protocol. While proposed solutions exist that would increase the security of the BGP routing system, only some of these mechanisms are being widely deployed. This response will consider some of the reasons existing proposals are not deployed and suggest some avenues the Commission might explore to aid the community in developing and deploying solutions.

9: Measuring BGP Security.
At this point, I only know of the systems mentioned in the query for measuring BGP routing security incidents. There have been attempts to build other systems, but none of these systems have been successfully built or deployed. Three problems seem to affect these kinds of systems.

First, there is a general lack of funding for building and maintaining such systems. These kinds of systems require a fair amount of research and creative energy to design, including making the networking community aware of these kinds of tools.

Second, building such a system is difficult because of the nature of inter-provider policy. It is often difficult to tell if some change in the Default Free Zone (DFZ) routing is valid or is somehow related to an attack. False positives can have a very negative impact and are hard to detect and guard against.

Third, these kinds of systems generally focus on a single system—routing—while excluding hints and information that can be gained from other systems (particularly the DNS). This is, at least in part, because of the complexity of each individual system, and the difficulty in understanding how to correlate and understand information from overlapping systems.

10: Deployment of BGP Security Measures.
BGP security is divided into at least four different domains right now.

First is the exposure of policies and information through registries and similar mechanisms (such as peeringdb and whois). These mechanisms can generally be useful at the initial stages of peering, and hence are not very helpful in resolving hijacks, mistakes, etc., in near-real-time within the DFZ.

Second is the set of best common practices, such as BCP38, and represented by the MANRS effort. These will be more fully discussed in answer to question 13.

Third is origin validation, currently represented by the RPKI, which will be considered more fully in answering question 11.

Fourth is a more complete security system, currently represented by BGPSEC, which will be considered more fully in answering question 12.

11: The Commission seeks comment on the extent to which RPKI, as implemented by other regional internet registries, effectively prevents BGP hijacking.
The RPKI can effectively block some hijacking events—so long as most providers implement and “pay attention” to the validation process. There are, however, problems with the RPKI system, including—

  • There is no “quality control” over the contents of the RPKI. Other systems, such as the Internet Routing Registries (IRRs), that store policy and origination information have, over time, deteriorated in terms of the quality of information housed there. There is very little research into the quality of information stored in the RPKI, nor do we have any sense about how the quality of this information will stand up over time.
  • There are some concerns about the centralization of control over resources the RPKI represents. For instance, if a content or transit provider becomes entangled in a contract dispute over some resource with a registry, the registry can use the RPKI system to remove the provider from the Internet, essentially putting the provider out of business. Governments can, in theory, also cause registries to remove a provider’s authorization to use Internet resources. These are areas that may need to be researched and addressed to gain the trust of a larger part of the community.
  • The RPKI system does not expose any information about a route other than the originator. This leaves the possibility of hijacking a route by an Autonomous System (AS) advertising a route even though they cannot reach the destination by simply claiming to be connected to the originating AS.
  • The RPKI system does little to prevent an AS that should not be transiting traffic—end customers such as content providers and “enterprises”—from advertising routes in a way that pulls them into a transit role.

The RPKI system does appear to be gaining widespread acceptance, and its deployment is increasing in scope.

12: The Commission seeks comment on whether and to what extent network operators anticipate integrating BGPsec-capable routers into their networks.
BGPsec has not been deployed by a single provider on other than an experimental basis, as far as I know, and there are no active plans to implement BGPsec by any provider. BGPsec, in general, fails to provide enough additional security to justify the additional costs associated with its deployment. Specifically—

  • Deploying BGPsec on individual routers requires the BGP speaker to perform complex cryptographic operations. No production router in existence today has the processing power to perform these operations quickly enough to be useful. The only apparent solution to this problem is to build specifically designed hardware to perform these operations—no router includes this hardware today, and no plans are in place to include them. The additional costs incurred to allow individual routers to perform these complex cryptographic operations would be prohibitive.
  • If it is run “on the side” by moving the complex cryptographic operations onto a separate device, the cost and complexity of running a network are dramatically increased.
  • BGPsec only signs the reachable destination (NLRI) and AS Path, which are only two components of a route. There are many other components in a route, such as the next hop and communities, which are just as important to the validity of an individual advertisement which are not covered by BGPsec. The signing of a “route” in BGPsec is a term of convenience, rather than a description of what is really signed.
  • BGPsec will only provide some additional security (BGPsec is not “perfect” from a security perspective) if most providers deploy the technology. This leads to a “chicken and egg” problem.
  • BGPsec reduces performance by eliminating specific optimizations, such as update packing, which have an important impact on BGP performance and BGP’s consumption of resources.
  • The additional resources required by BGPsec represent a surface of attack for DDoS attacks against individual routers and, with coordination, against entire networks.
  • BGPsec “freezes BGP in place” by assuming the best way to secure BGP is to “secure the way BGP works.” Deploying BGPsec would restrict future innovation in routing systems, particularly in the global Internet.

To these general problems, there is one further problem—BGPsec does not secure the withdrawal of reachability, only its advertisement. Because of this, BGPsec can only be considered a somewhat partial solution to the problems any BGP security system needs to solve.

Consider a BGP speaker that has received a signed NLRI/AS Path pair (a signed “route”). This BGP speaker can continue advertising this route so long as it appears to be valid—breaking the peering session does not invalidate the route.

Hence, the BGP speaker may mistakenly or intentionally replay this signed reachability information until something within the signed pair invalidates the information. There are four ways the signed route may be invalidated:

  • A “better” route is propagated through the system
  • Some form of “revocation list” is maintained and distributed
  • Each signed route is given a defined “time-to-live,” after which it is invalidated
  • The signing key is revoked and/or replaced

The first is impractical to guarantee in all situations. The second would involve maintaining a “negative routing table,” which is nearly impossible in practice.

The third—adding a time-to-live to BGP reachability information—imposes high operational costs. BGP assumes that so long as a peer advertising a reachable destination maintains the peering session, the destination remains reachable (the route is valid). This assumption replaces the workload of constantly advertising already existing routing information with a single “hello” process to ensure the connection is still valid. A single “hello,” then, is a proxy validating the routing information for hundreds of thousands (potentially millions) of reachable destinations. Routes, in other words, have an implied infinite time-to-live.

Adding a time-to-live to individual routes would mean a BGP speaker must readvertise a given reachable destination periodically for the routing information to continue to be considered valid. According to this site, there are currently 916,000 IPv4 routes carried by a BGP speaker connected to the Internet (the number varies by location, policies implemented, etc.). Note the analysis below does not consider IPv6 routes, which will probably be more numerous.

The time-to-live attached to any route determines how long the information can be replayed. If the originator sets the timer to 168 hours, the route can be replayed for a week before it is invalidated. It is difficult to say how long any given route should be valid, or what level of replay protection any given route requires. This illustration will assume 24 hours would be an average across many routes—but there are strong incentives to set the time-to-live much shorter, and there is little cost to the originator for doing so.

If each of these routes were given a time-to-live of 24 hours, the typical Internet BGP speaker would need to process about 10 updates/second (with the additional cryptographic processing requirements described above) just to process time-to-live expirations.

The impacts of this level of activity in the DFZ—beyond the sheer processing and bandwidth requirements—are wide-ranging. For instance, logging, telemetry, false route detection systems, and the way timers are deployed to dampen and manage high speed flapping events, would all need to be reconsidered and adjusted.

The fourth alternative is for the signing key to be revoked when a route is withdrawn.

If the operator uses a single key to sign all routes being advertised by the AS, then replacing the key on a single route requires re-advertising every route. Readvertising every route is a difficult process, fraught with potential failure modes.

If the operator assigns each BGP speaker a key, then only the key for BGP speakers impacted by withdrawing the route must have their keys changes. Hence, only the routes advertised by or through these individual speakers need to be re-advertised into the routing system. However, assigning each BGP speaker an individual key for signing routing information exposes another set of problems.

Key management is an obvious problem with this solution; the exposure of peering information, and the security implications of that exposure, are non-obvious problems. If each BGP speaker on the edge of a network has its own signing key, then outside observers can determine the actual pair of routers used to connect any two autonomous systems. This creates a “map” of points at which the network can be attacked, and is generally an unacceptable exposure of information for most providers.

These issues have, to this point, prevented any serious plans for deploying BGPsec—and will probably continue to do so for the foreseeable future. The very best that can be hoped for is BGPsec deployment in 10–20 years, and even full deployment would not necessarily improve the overall security posture of the global Internet.

13: For network operators that currently participate in MANRS and comply with its requirements, including support for IETF Best Common Practice standards, the Commission seeks comment on the efficacy of such measures for preventing BGP hijacking.

MANRS, BCP38, and peer-to-peer BGP session encryption (such as TCP-AO) should, in theory, be effective a large part of the unintentional and “unsophisticated” attacks and mistakes that cause large-scale BGP failures. There has been little research attempting to measure the impact of these measures, and it seems difficult to measure their impact.

The MANRS vendor program is an effective mechanism for promoting the common-sense practices, although it could probably be ramped up somewhat, and vendors more strongly encouraged to participate.

These measures should continue to be promoted through education, presentations, and other means, as they do appear to be improving the overall security posture of the Internet. TCP-AO, BCP38, and MANRS should, in particular, be encouraged and emphasized by all parties within the ecosystem.

14: Commission’s Role.
The Commission should focus on supporting the community in developing deployable standards and systems to improve the global routing system.

First, the Commission can encourage governmental organizations, and organizations funded by government organizations, to “go back to basics” and ask specific questions about what needs to be secured, how it can practically be secured, and what the tradeoffs are.

To this point, BGP security efforts have often begun with the question how we can secure the existing operation of BGP. This is not the right question to ask. Instead, the community needs to be encouraged to create and understand what needs to be secured. Possible questions might be—

  • What does valid mean in relation to a route? Must it include the entire route, or is “just” the AS Path and reachable destination “enough?”
  • In relation to the AS Path, is the AS Path given valid in the sense that it exists, and there are no policies preventing the use of this path to reach the given destination?
  • In relation to the reachable destination, how can aggregation and other forms of alternate origination be supported while still answering the questions posed above?
  • Will the providers along the path actually use the given path? Can “quality of path” be ensured? If so, how can the be accomplished without incurring unacceptable costs?
  • How can the effectiveness of the system be measured?
  • How can a system be designed so that increasing deployment increases security? How can the “tragedy of the commons” and “chicken and egg” problems be avoided?

Second, the Commission can encourage providers and operators, including large “enterprise” organizations, to participate in the process of understanding and building global routing system security. To this point, only a few providers have participated in the discussion. Quite often, those participating have a narrow perspective, and have been guided by groups asking the wrong question (as above). The scope of enquiry needs to be expanded.

What the Commission, or any other government organization, should not do is to push a solution from the top down. The IETF community is effective at finding solutions for these kinds of problems, and has vast experience in understanding the intended consequences, the unintended consequences, and operational aspects of deploying technologies at the scale of the Internet. Government agencies need to leverage these capacities, rather than trying to override them.

If funding is provided for research in this area, it should begin with some sort of “open research grant,” rather than selecting one solution to fund. Funding should not have an impact on the selection of a technical solution in open standards organizations (such as the IETF). Funding does, however, play a significant role by impacting the availability of implementations, time spent researching problems, time spent supporting a given solution at open meetings, etc.

The community must return to the beginning and find a solution that works by asking the right questions.

15: The Commission seeks comment on the extent to which the effectiveness of BGP security measures may be related to international participation and coordination.
International coordination and cooperation are basic requirements.

16″ Costs and Benefits.
Please see the answers above, as some of the costs are considered there.

17: The Commission seeks comment on whether the Commission should encourage industry to prioritize the deployment of BGP security measures within the networks on which critical infrastructure and emergency services rely, as a means of helping industry to control costs otherwise associated with a network-wide deployment.

This is an attractive idea from the perspective of finding places where routing security could be deployed at a smaller scale and in a controlled manner to understand how the system works, make improvements in the system, etc. However, I would be concerned about how these kinds of services can be “separated out” for deployment in an effective way.

This kind of deployment would, however, make the problem of incremental deployment a fundamental requirement of any proposed system, which may at least encourage steps in the right direction.

Hedge 103: BGP Security with Geoff Huston

Our community has been talking about BGP security for over 20 years. While MANRS and the RPKI have made some headway in securing BGP, the process of deciding on a method to provide at least the information providers need to make more rational decisions about the validity of individual routes is still ongoing. Geoff Huston joins Alvaro, Russ, and Tom to discuss how we got here and whether we will learn from our mistakes.

download

Marketing Wins

Off-topic post for today …

In the battle between marketing and security, marketing always wins. This topic came to mind after reading an article on using email aliases to control your email—

For example, if you sign up for a lot of email newsletters, consider doing so with an alias. That way, you can quickly filter the incoming messages sent to that alias—these are probably low-priority, so you can have your provider automatically apply specific labels, mark them as read, or delete them immediately.

One of the most basic things you can do to increase your security against phishing attacks is to have two email addresses, one you give to financial institutions and another one you give to “everyone else.” It would be nice to have a third for newsletters and marketing, but this won’t work in the real world. Why?

Because it’s very rare to find a company that will keep two email addresses on file for you, one for “business” and another for “marketing.” To give specific examples—my mortgage company sends me both marketing messages in the form of a “newsletter” as well as information about mortgage activity. They only keep one email address on file, though, so they both go to a single email address.

A second example—even worse in my opinion—is PayPal. Whenever you buy something using PayPal, the vendor gets the email address associated with the account. That’s fine—they need to send me updates on the progress of the item I ordered, etc. But they also use this email address to send me newsletters … and PayPal sends any information about account activity to the same email address.

Because of the way these things are structured, I cannot separate information about my account from newsletters, phishing attacks, etc. Since modern Phishing campaigns are using AI to create the most realistic emails possible, and most folks can’t spot a Phish anyway, you’d think banks and financial companies would want to give their users the largest selection of tools to fight against scams.

But they don’t. Why?

Because—if your financial information is mingled with a marketing newsletter, you’ll open the email to see what’s inside … you’ll pay attention. Why spend money helping your users not pay attention to your marketing materials by separating them from “the important stuff?”

When it comes to marketing versus security, marketing always wins. Somehow, we in IT need to do better than this.

NATs, PATs, and Network Hygiene

While reading a research paper on address spoofing from 2019, I ran into this on NAT (really PAT) failures—

In the first failure mode, the NAT simply forwards the packets with the spoofed source address (the victim) intact … In the second failure mode, the NAT rewrites the source address to the NAT’s publicly routable address, and forwards the packet to the amplifier. When the server replies, the NAT system does the inverse translation of the source address, expecting to deliver the packet to an internal system. However, because the mapping is between two routable addresses external to the NAT, the packet is routed by the NAT towards the victim.

The authors state 49% of the NATs they discovered in their investigation of spoofed addresses fail in one of these two ways. From what I remember way back when the first NAT/PAT device (the PIX) was deployed in the real world (I worked in TAC at the time), there was a lot of discussion about what a firewall should do with packets sourced from addresses not indicated anywhere.

If I have an access list including 192.168.1.0/24, and I get a packet sourced from 192.168.2.24, what should the NAT do? Should it forward the packet, assuming it’s from some valid public IP space? Or should it block the packet because there’s no policy covering this source address?

This is similar to the discussion about whether BGP speakers should send routes to an external peer if there is no policy configured. The IETF (though not all vendors) eventually came to the conclusion that BGP speakers should not advertise to external peers without some form of policy configured.

My instinct is the NATs here are doing the right thing—these packets should be forwarded—but network operators should be aware of this failure mode and configure their intentions explicitly. I suspect most operators don’t realize this is the way most NAT implementations work, and hence they aren’t explicitly filtering source addresses that don’t fall within the source translation pool.

In the real world, there should also be a box just outside the NATing device that’s running unicast reverse path forwarding checks. This would resolve these sorts of spoofed packets from being forwarding into the DFZ—but uRPF is rarely implemented by edge providers, and most edge connected operators (enterprises) don’t think about the importance of uRPF to their security.

All this to say—if you’re running a NAT or PAT, make certain you understand how it works. Filters are tricky in the best of circumstances. NAT and PATs just make filters trickier.

Illusory Correlation and Security

Fear sells. Fear of missing out, fear of being an imposter, fear of crime, fear of injury, fear of sickness … we can all think of times when people we know (or worse, a people in the throes of madness of crowds) have made really bad decisions because they were afraid of something. Bruce Schneier has documented this a number of times. For instance: “it’s smart politics to exaggerate terrorist threats”  and “fear makes people deferential, docile, and distrustful, and both politicians and marketers have learned to take advantage of this.” Here is a paper comparing the risk of death in a bathtub to death because of a terrorist attack—bathtubs win.

But while fear sells, the desire to appear unafraid also sells—and it conditions people’s behavior much more than we might think. For instance, we often say of surveillance “if you have done nothing wrong, you have nothing to hide”—a bit of meaningless bravado. What does this latter attitude—“I don’t have anything to worry about”—cause in terms of security?

Several attempts at researching this phenomenon have come to the same conclusion: average users will often intentionally not use things they see someone they perceive as paranoid using. According to this body of research, people will not use password managers because using one is perceived as being paranoid in some way. Theoretically, this effect is caused by illusory correlation, where people associate an action with a kind of person (only bad/scared people would want to carry a weapon). Since we don’t want to be the kind of person we associate with that action, we avoid the action—even though it might make sense.

This is just the flip side of fear sells, of course. Just like we overestimate the possibility of a terrorist attack impacting our lives in a direct, personal way, we also underestimate the possibility of more mundane things, like drowning in a tub, because we either think can control it, or because we don’t think we’ll be targeted in that way, or because we want to signal to the world that we “aren’t one of those people.”

Even knowing this is true, however, how can we counter this? How can we convince people to learn to assess risks rationally, rather than emotionally? How can we convince people that the perception of control should not impact your assessment of personal security or safety?

Simplifying design and use of the systems we build would be one—perhaps not-so-obvious—step we can take. The more security is just “automatic,” the more users will become accustomed to deploying security in their everyday lives. Another thing we might be able to do is stop trying to scare people into using these technologies.

In the meantime, just be aware that if you’re an engineer, your use of a technology “as an example” to others can backfire, causing people to not want to use those technologies.