Spam might seem like an annoyance in the US and other areas where bandwidth is paid for by the access rate—and what does spam have to do with BGP security? In many areas of the world, however, spam makes email practically unusable. When you’re paying for Internet access by the byte transmitted or received, spam costs real money. The normal process for combating spam involves a multi-step process, one step of which is to assess the IP address of the mail server’s previous activity for a history of originating spam. In order to avoid classifiers that rely on the source IP address, spammers have turned to hijacking IP address space for short periods of time. Since this address space is normally used for something other than email (or it’s not used at all), there is no history on which a spam detection system can rely.
The evidence for spam related hijacking, however, is largely anecdotal, primarily based in word of mouth and the rare widely reported incidents. How common are these hijacks, really? What sort of address space is really used? To answer this question, a group of researchers from Symantec and the Qatar Computing Research Center undertook a project in this area, correlating BGP route hijacks with large scale SPAM operations. The researchers first tapped into another system that tracks the relationship between mass spam mailings and events in the Default Free Zone (DFZ—the global Internet core, in essence). Rather than detecting when a route is injected, it watches for a mass mailing, notes the source address, and then records if and when the route to the source address is withdrawn (so it is removed from the DFZ). This system finds a good bit of address space which is advertised only for sending spam in mass mailings. Next, the researchers set about finding out who owns the address space used for these mass mailings. What they discovered was surprising in some ways, and unsurprising in others.
First, in an 18 month period, they discovered 64 address blocks were hijacked to send mass spam mailings. This number might seem low, but examining the origin AS of each of these 64 address blocks uncovered an additional 2,591 address blocks that were also used for mass spam mailings, but were not detected through the original process. Remember this is just the lower/lowest number of such hijacks; the researchers intentionally used very narrow filters to reduce their intake of address blocks to investigate. This project ultimately investigate 2,655 hijacks related to spam events across 18 months, representing somewhere around 5 hijacks per day.
Second, in 92% of the hijacks, the address space was not being advertised by the actual owner at the time of it’s use by the spam operator. In these cases, the hijacker forged the first hop AS number, using a number different from the owning organization. In the remaining 8% of hijacks, the attacker used the correct origin AS, but advertised the correct origin AS as being connected to an incorrect upsteam provider.
It appears, from this research, that hijacked address space is a major origin of mass spam mailings. What can we, the folks who interact with, work on, or work around the Internet do to reduce the level of spam? One good place to start is to stop the hijacking of IP address space used to originate large scale spam operations. This means implementing one of the various mechanisms that would detect and allow operators to ignore or drop hijacked address space.
What would it take to secure BGP? Let’s begin where any engineering problem should begin: what problem are we trying to solve? This series of posts walks through a wide range of technical and business problems to create a solid set of requirements against which to measure proposed solutions for securing BGP in the global Internet, and then works through several proposed solutions to see how they stack up.
Post 1: An introduction to the problem space
Post 2: What can I prove in a routing system?
Post 3: What can I prove in a routing system?
Post 4: Centralized or decentralized?
Post 5: Centralized or decentralized?
Post 6: Business issues with centralization
Post 7: Technical issues with centralization
Post 8: A full requirements list
Post 9: BGPSEC (S-BGP) compared to the requirements
Post 10: RPKI compared to the requirements
I will continue updating this post as I work through the remaining segments of this series.
The next proposed (and actually already partially operational) system on our list is the Router Public Key Infrastructure (RPKI) system, which is described in RFC7115 (and a host of additional drafts and RFCs). The RPKI systems is focused on solving a single solution: validating that the originating AS is authorized to originate a particular prefix. An example will be helpful; we’ll use the network below.
(this is a graphic pulled from a presentation, rather than one of my usual line drawings)
Assume, for a moment, that AS65002 and AS65003 both advertise the same route, 2001:db8:0:1::/64, towards AS65000. How can the receiver determine if both of these two advertisers can actually reach the destination, or only one can? And, if only one can, how can AS65000 determine which one is the “real thing?” This is where the RPKI system comes into play. A very simplified version of the process looks something like this (assuming AS650002 is the true owner of 2001:db8:0:1::/64):
- AS65002 obtains, from the Regional Internet Registry (labeled the RIR in the diagram), a certificate showing AS65002 has been issued 2001:db8:0:1::/64.
- AS65002 places this certificate into a local database that is synchronized with all the other operators participating in the routing system.
- When AS65000 receives a route towards 2001:db8:0:1::/64, it checks this database to make certain the origin AS on the advertisement matches the owning AS.
If the owner and the origin AS match, AS65000 can increase the route’s preference. If it doesn’t AS65000 can reduce the route’s preference. It might be that AS65000 discards the route if the origin doesn’t match—or it may not. For instance, AS65003 may know, from historical data, or through a strong and long standing business relationship, or from some other means, that 2001:db8:0:1::/64 actually belongs to AS65004, even through the RPKI data claims it belongs to AS65002. Resolving such problems falls to the receiving operator—the RPKI simply provides more information on which to act, rather than dictating a particular action to take.
Let’s compare this to our requirements to see how this proposal stacks up, and where there might be objections or problems.
Centralized versus Decentralized: The distribution of the origin authentication information is currently undertaken with rsync, which means the certificate system is decentralized from a technical perspective.
However—there have been technical issues with the rsync solution in the past, such that it can take up to 24 hours to change the distributed database. This is a pretty extreme case of eventual consistency, and it’s a major problem in the global default free zone. BGP might converge very slowly, but it still converges more quickly than 24 hours.
Beyond the technical problems, there is a business side to the centralized/decentralized issue as well. Specifically, many business don’t want their operations impacted by contract issues, negotiation issues, and the like. Many large providers see the RPKI system as creating just such problems, as the “trust anchor” is located in the RIRs. There are ways to mitigate this—just use some other root, or even self sign your certificates—but the RPKI system faces an uphill battle in this are from large transit providers.
Cost: The actual cost of setting up and running a server doesn’t appear to be very high within the RPKI system. The only things you need to “get into the game” are a couple of VMs or physical servers to run rsync, and some way to inject the information gleaned from the RPKI system into the routing decisions along the network edge (which could even be just plugging the information into existing policy mechanisms).
The business issue described above can also be counted as a cost—how much would it cost a provider if their origin authentication were taken out of the database for a day or two, or even a week or two, while a contract dispute with the RIR was worked out?
Information Cost: There is virtually no additional information cost involved in deploying the RPKI.
Other thoughts: The RPKI system wasn’t designed to, and doesn’t, validate anything other than the origin in the AS Path. It doesn’t, therefore, allow an operator to detect AS65003, for instance, claiming to be connected to AS65002 even though it’s not (or it’s not supposed to transit traffic to AS65002). This isn’t really a “lack” on the part of the RPKI, it’s just not something it’s designed to do.
Overall, the RPKI is useful, and will probably be deployed by a number of providers, and shunned by others. It would be a good component of some larger system (again, this was the original intent, so this isn’t a lack), but it cannot stand alone as a complete BGP security system.
There are a number of systems that have been proposed to validate (or secure) the path in BGP. To finish off this series on BGP as a case study, I only want to look at three of them. At some point in the future, I will probably write a couple of posts on what actually seems to be making it to some sort of deployment stage, but for now I just want to compare various proposals against the requirements outlined in the last post on this topic (you can find that post here).
The first of these systems is BGPSEC—or as it was known before it was called BGPSEC, S-BGP. I’m not going to spend a lot of time explaining how S-BGP works, as I’ve written a series of posts over at Packet Pushers on this very topic:
Considering S-BGP against the requirements:
- Centralized versus decentralized balance: S-BGP distributes path validation information throughout the internetwork, as this information is actually contained in a new attribute carried with route advertisements. Authorization and authentication are implicitly centralized, however, with the root certificates being held by address allocation authorities. It’s hard to say if this is the correct balance.
- Cost: In terms of financial costs, S-BGP (or BGPSEC) requires every eBGP speaker to perform complex cryptographic operations in line with receiving updates and calculating the best path to each destination. This effectively means replacing every edge router in every AS in the entire world to deploy the solution—this is definitely not cost friendly. Adding to this cost is the simply increase in the table size required to carry all this information, and the loss of commonly used (and generally effective) optimizations.
- Information cost: S-BGP leaks new information into the global table as a matter of course—not only can anyone see who is peered with whom by examining information gleaned from route view servers, they can even figure out how many actual pairs of routers connect each AS, and (potentially) what other peerings those same routers serve. This huge new chunk of information about provider topology being revealed simply isn’t acceptable.
Overall, then, BGP-SEC doesn’t meet the requirements as they’ve been outlined in this series of posts. Next week, I’ll spend some time explaining the operation of another potential system, a graph overlay, and then we’ll consider how well it meets the requirements as outlined in these posts.
Throughout the last several months, I’ve been building a set of posts examining securing BGP as a sort of case study around protocol and/or system design. The point of this series of posts isn’t to find a way to secure BGP specifically, but rather to look at the kinds of problems we need to think about when building such a system. The interplay between technical and business requirements are wide and deep. In this post, I’m going to summarize the requirements drawn from the last seven posts in the series.
Don’t try to prove things you can’t. This might feel like a bit of an “anti-requirement,” but the point is still important. In this case, we can’t prove which path along which traffic will flow. We also can’t enforce policies, specifically “don’t transit this AS;” the best we can do is to provide information and letting other operators make a local decision about what to follow and what not to follow. In the larger sense, it’s important to understand what can, and what can’t, be solved, or rather what the practical limits of any solution might be, as close to the beginning of the design phase as possible.
In the case of securing BGP, I can, at most, validate three pieces of information:
- That the origin AS in the AS Path matches the owner of the address being advertised.
- That the AS Path in the advertisement is a valid path, in the sense that each pair of autonomous systems in the AS Path are actually connected, and that no-one has “inserted themselves” in the path silently.
- The policies of each pair of autonomous systems along the path towards one another. This is completely voluntary information, of course, and cannot be enforced in any way if it is provided, but more information provided will allow for stronger validation.
There is a fine balance between centralized and distributed systems. There are actually things that can be centralized or distributed in terms of BGP security: how ownership is claimed over resources, and how the validation information is carried to each participating AS. In the case of ownership, the tradeoff is between having a widely trusted third party validate ownership claims and having a third party who can shut down an entire business. In the case of distributing the information, there is a tradeoff between the consistency and the accessibility of the validation information. These are going to be points on which reasonable people can disagree, and hence are probably areas where the successful system must have a good deal of flexibility.
Cost is a major concern. There are a number of costs that need to be considered when determining which solution is best for securing BGP, including—
- Physical equipment costs. The most obvious cost is the physical equipment required to implement each solution. For instance, any solution that requires providers to replace all their edge routers is simply not going to be acceptable.
- Process costs. Any solution that requires a lot of upkeep and maintenance is going to be cast aside very quickly. Good intentions are overruled by the tyranny of the immediate about 99.99% of the time.
Speed is also a cost that can be measured in business terms; if increasing security decreases the speed of convergence, providers who deploy security are at a business disadvantage relative to their competitors. The speed of convergence must be on the order of Internet level convergence today.
Information costs are a particularly important issue. There are at least three kinds of information that can leak out of any attempt to validate BGP, each of them related to connectivity—
- Specific information about peering, such as how many routers interconnect two autonomous systems, where interconnections are, and how interconnection points are related to one another.
- Publicly verifiable claims about interconnection. Many providers argue there is a major difference between connectivity information that can be observed and connectivity information that is claimed.
- Publicly verifiable information about business relationships. Virtually every provider considers it important not to release at least some information about their business relationships with other providers and customers.
While there is some disagreement in the community over each of these points, it’s clear that releasing the first of these is almost always going to be unacceptable, while the second and third are more situational.
With these requirements in place, it’s time to look at a couple of proposed systems to see how they measure up.
In the last post on this series on securing BGP, I considered a couple of extra questions around business problems that relate to BGP. This time, I want to consider the problem of convergence speed in light of any sort of BGP security system. The next post (to provide something of a road map) should pull all the requirements side together into a single post, so we can begin working through some of the solutions available. Ultimately, as this is a case study, we’re after a set of tradeoffs for each solution, rather than a final decision about which solution to use.
The question we need to consider here is: should the information used to provide validation for BGP be somewhat centralized, or fully distributed? The CAP theorem tells us that there are a range of choices here, with the two extreme cases being—
- A single copy of the database we’re using to provide validation information which is always consistent
- Multiple disconnected copies of the database we’re using to provide validation which is only intermittently consistent
Between these two extremes there are a range of choices (reducing all possibilities to these two extremes is, in fact, a misuse of the CAP theorem). To help understand this as a range of tradeoffs, take a look at the chart below—
The further we go to the right along this chart, the more—
- copies of the database there are in existence—more copies means more devices that must have a copy, and hence more devices that must receive and update a local copy, which means slower convergence.
- slower the connectivity between the copies of the database.
In complexity model terms, both of these relate to the interaction surface; slower and larger interaction surfaces face their tradeoff in the amount and speed of state that can be successfully (or quickly) managed in a control plane (hence the tradeoffs we see in the CAP theorem are directly mirrored in the complexity model). Given this, what is it we need out of a system used to provide validation for BGP? Let’s set up a specific situation that might help answer this question.
Assume, for a moment, that your network is under some sort of distributed denial of service (DDoS) attack. You call up some sort of DDoS mitigation provider, and they say something like “just transfer your origin validation to us, so we can advertise the route without it being black holed; we’ll scrub the traffic and transfer the clean flows back to you through a tunnel.” Now ask this: how long are you willing to wait before the DDoS protection takes place? Two or three days? A day? Hours? Minutes? If you can locate that amount of time along the chart above, then you can get a sense of the problem we’re trying to solve.
To put this in different terms: any system that provides BGP validation information must converge at roughly the speed of BGP itself.
So—why not just put the information securing BGP in BGP itself, so that routing converges at the same speed as the validation information? This implies every edge device in my network must handle cryptographic processing to verify the validation information. There are some definite tradeoffs to consider here, but we’ll leave those to the evaluation of proposed solutions.
Before leaving this post and beginning on the process of wrapping up the requirements around securing BGP (to be summarized in the next post), one more point needs to be considered. I’ll just state the point here, because the reason for this requirement should be pretty obvious.
Injecting validation information into the routing system should expose no more information about the peering structure of my AS than can be inferred through data mining of publicly available information. For instance, today I can tell that AS65000 is connected to AS65001. I can probably infer something about their business relationship, as well. What I cannot tell, today, is how many actual eBGP speakers connect the two autonomous systems, nor can I infer anything about the location of those connection points. Revealing this information could lead to some serious security and policy problems for a network operator.
In my last post on securing BGP, I said—
Here I’m going to discuss the problem of a centralized versus distributed database to carry the information needed to secure BGP. There are actually, again, two elements to this problem—a set of pure technical issues, and a set of more business related problems. The technical problems revolve around the CAP theorem, which is something that wants to be discussed in a separate post; I’ll do something on CAP in a separate post next week and link it back to this series.
Before I dive into the technical issues, I want to return to the business issues for a moment. In a call this week on the topic of BGP security, someone pointed out that there is no difference between an advertisement in BGP asserting some piece of information (reachability or connectivity, take your pick), and an advertisements outside BGP asserting this same bit of information. The point of the question is this: if I can’t trust you to advertise the right thing in one setting, then why should I trust you to advertise the right thing in another? More specifically, if you’re using an automated system to build both advertisements, then both are likely to fail at the same time and in the same way.
First, this is an instance of how automation can create a system that is robust yet fragile—which leads directly back to complexity as it applies to computer networks. Remember—if you don’t see the tradeoff, then you’re not looking hard enough.
Second, this is an instance of separating trust for the originator from trust for the transit. Let’s put this in an example that might be more useful. When a credit card company sends you a new card, they send it in a sealed envelope. The sealed envelope doesn’t really do much in the way of security, as it can be opened by anyone along the way. What it does do, however (so long as the tamper resistance of the envelope is good enough) is inform the receiver about whether or not the information received is the same as the information sent. The seal on the envelope cannot make any assertions about the truthfulness of the information contained in the envelope. If the credit card company is a scam, then the credit cared in the envelope is still a scam even if it’s well sealed.
There is only one way to ensure what’s in the envelope is true and useful information—having an trustworthy outsider observe the information, the sender, and the receiver, to make certain they’re all honest. But now we dive into philosophy pretty heavily, and this isn’t a philosophy blog (though I’ve often thought about changing my title to “network philosopher,” just because…).
But it’s worth considering two things about this problem—the idea of having a third party watching to make certain everyone is honest is, itself, full of problems.
First, who watches the watchers? Pretty Good Privacy has always used the concept of a web of trust—this is actually a pretty interesting idea in the realm of BGP security, but one I don’t want to dive in to right this moment (maybe in a later post).
Second, Having some form of verification will necessarily cause any proposed system to rely on third parties to “make it go.” This seems to counter the ideal state of allowing an operator to run their business based on locally available information as much as possible. From a business perspective, there needs to be a balance between obtaining information that can be trusted, and obtaining information in a way that allows the business to operate—particularly in its core realm—without relying on others.
Next time, I’ll get back to the interaction of the CAP theorem and BGP security, with a post around the problem of convergence speed.