The RPKI, for those who do not know, ties the origin AS to a prefix using a certificate (the Route Origin Authorization, or ROA) signed by a third party. The third party, in this case, is validating that the AS in the ROA is authorized to advertise the destination prefix in the ROA; if ROAs were self-signed, the security would be no better than simply advertising the prefix in BGP. Who should be able to sign these ROAs? The assigning authority makes the most sense: the Regional Internet Registries (RIRs), since they (should) know which company owns which set of AS numbers and prefixes.
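To make the origin check concrete, here is a minimal Python sketch of how a validator might classify a route against a set of ROAs, loosely following the valid / invalid / not-found states described in RFC 6811. The `Roa` class, the `validate_origin` function, and the example prefixes and AS numbers are illustrative assumptions rather than any real RPKI implementation, and the sketch skips the cryptographic checks entirely.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Roa:
    """Hypothetical, already-verified ROA payload (signature checks omitted)."""
    prefix: str      # prefix the ROA covers
    max_length: int  # longest prefix length the origin may announce
    origin_as: int   # AS authorized to originate the prefix


def validate_origin(route_prefix: str, origin_as: int, roas: list[Roa]) -> str:
    """Return 'valid', 'invalid', or 'not-found' for a received route."""
    route = ipaddress.ip_network(route_prefix)
    covered = False
    for roa in roas:
        roa_net = ipaddress.ip_network(roa.prefix)
        # Skip ROAs for the other address family or for unrelated prefixes.
        if route.version != roa_net.version or not route.subnet_of(roa_net):
            continue
        covered = True
        if roa.origin_as == origin_as and route.prefixlen <= roa.max_length:
            return "valid"
    # Covered by some ROA but never matched: invalid. No covering ROA: not-found.
    return "invalid" if covered else "not-found"


roas = [Roa("192.0.2.0/24", 24, 64500)]
print(validate_origin("192.0.2.0/24", 64500, roas))     # valid
print(validate_origin("192.0.2.0/24", 64501, roas))     # invalid
print(validate_origin("198.51.100.0/24", 64500, roas))  # not-found
```

Note the third state: a route with no covering ROA is not rejected, it is simply unknown, which leaves the decision to the operator.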
The general idea makes sense—you should not accept routes from “just anyone,” as they might be advertising the route for any number of reasons. An operator could advertise routes to source spam or phishing emails, or some government agency might advertise a route to redirect traffic, or block access to some web site. But … if you haven’t found the tradeoffs, you haven’t looked hard enough. Security, in particular, is replete with tradeoffs.
Every time you deploy some new security mechanism, you create some new attack surface—sometimes more than one. Deploy a stateful packet filter to protect a server, and the device itself becomes a target of attack, including buffer overflows, phishing attacks to gain access to the device as a launch-point into the private network, and the holes you have to punch in the filters to allow services to work. What about the RPKI?
When the RPKI was first proposed, one of my various concerns was the creation of new attack surfaces. One specific attack surface is the control a single organization (the issuing RIR) has over the very existence of the operator. Suppose you start a new content provider. To get the new service up and running, you sign a contract with an RIR for some address space, sign a contract with some upstream provider (or providers), set up your servers and service, and start advertising routes. For whatever reason, your service goes viral, netting millions of users in a short span of time.
Now assume the RIR receives a complaint against your service for whatever reason—the reason for the complaint is not important. This places the RIR in the position of a prosecutor, defense attorney, and judge—the RIR must somehow figure out whether or not the charges are true, figure out whether or not taking action on the charges is warranted, and then take the action they’ve settled on.
In the case of a government agency (or a large criminal organization) making the complaint, there is probably going to be little the RIR can do other than simply revoke your certificate, pulling your service off-line.
Overnight your business is gone. You can drag the case through the court system, of course, but this can take years. In the meantime, you are losing users, other services are imitating what you built, and you have no money to pay the legal fees.
A true story—without the names. I once knew a man who worked for a satellite provider, let’s call them SATA. Now, SATA’s leadership decided they had no expertise in accounts receivables, and they were spending too much time on trying to collect overdue bills, so they outsourced the process. SATB, a competing service, decided to buy the firm SATA outsourced their accounts receivables to. You can imagine what happens next… The accounting firm worked as hard as it could to reduce the revenue SATA was receiving.
Of course, SATA sued the accounting firm, but before the case could make it to court, SATA ran out of money, laid off all their people, and shut their service down. SATA essentially went out of business. They won some money later, in court, but … whatever money they won was just given to the investors of various kinds to make up for losses. The business itself was gone, permanently.
Herein lies the danger of giving a single entity like an RIR, even if they are friendly, honest, etc., control over a critical resource.
A recent paper presented at the ANRW at APNIC caught my attention as a potential way to solve this problem. The idea is simple—just allow (or even require) multiple signatures on a ROA. To be more accurate, each authorizing party issues a “partial certificate;” if “enough” pieces of the certificate are found and valid, the route will be validated.
The question is—how many signatures (or parts of the signature, or partial attestations) should be enough? The authors of the paper suggest there should be a “Threshold Signature Module” that makes this decision. The attestations of the various signers are combined in the threshold module to produce a single signature that is then used to validate the route. This way the validation process on the router remains the same, which means the only real change in the overall RPKI system is the addition of the threshold module.
If one RIR—even the one that allocated the addresses you are using—revokes their attestation on your ROA, the remaining attestations should be enough to convince anyone receiving your route that it is still valid. Since there are five regions, you have at least five different choices to countersign your ROA. Each RIR is under the control of a different national government; hence organizations like governments (or criminals!) would need to work across multiple RIRs and through other government organizations to have a ROA completely revoked.
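A rough Python sketch of the "enough attestations" idea follows. It is not the scheme from the paper: a real threshold signature combines partial signatures cryptographically, while this sketch only counts attestations that verify, and it uses an HMAC as a stand-in for each signer's signature. The signer names, keys, and the 3-of-5 threshold are all assumptions made for illustration.

```python
import hashlib
import hmac


def attest(key: bytes, roa: bytes) -> bytes:
    """Stand-in for one signer's attestation (an HMAC, not a real signature)."""
    return hmac.new(key, roa, hashlib.sha256).digest()


def threshold_valid(roa: bytes, attestations: dict[str, bytes],
                    signer_keys: dict[str, bytes], threshold: int) -> bool:
    """True when at least `threshold` known signers hold a valid attestation."""
    good = 0
    for signer, tag in attestations.items():
        key = signer_keys.get(signer)
        # Unknown signers and attestations that fail to verify are ignored.
        if key is not None and hmac.compare_digest(tag, attest(key, roa)):
            good += 1
    return good >= threshold


# Five hypothetical signers, with three attestations required.
keys = {name: name.encode() * 8 for name in ("A", "B", "C", "D", "E")}
roa = b"192.0.2.0/24 AS64500"
attestations = {name: attest(keys[name], roa) for name in ("A", "B", "C", "D")}

print(threshold_valid(roa, attestations, keys, 3))  # True
del attestations["A"]                               # one signer revokes
print(threshold_valid(roa, attestations, keys, 3))  # True: three still remain
```

The point of the exercise is the last line: a single signer withdrawing its attestation does not, by itself, take the route out of the valid set.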
An alternative solution here, one that follows the PGP model, might be to simply have the threshold signature module consider the number and source of ROAs using the existing model. Local policy could determine how to weight attestations from different RIRs, etc.
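As a sketch of what weighting attestations by local policy might look like, the Python below sums operator-chosen weights for each attesting registry and compares the total to a local minimum. The weights, the minimum, and the function names are hypothetical; nothing here comes from the paper or from any deployed validator.

```python
def weighted_score(attesters: set[str], weights: dict[str, int]) -> int:
    """Sum the local weights of every registry that attested to the ROA."""
    # Registries missing from the local policy contribute nothing.
    return sum(weights.get(name, 0) for name in attesters)


def accept(attesters: set[str], weights: dict[str, int], minimum: int) -> bool:
    """Accept the ROA when the combined weight reaches the local minimum."""
    return weighted_score(attesters, weights) >= minimum


# Hypothetical local policy: this operator trusts some registries more than others.
local_weights = {"ARIN": 3, "RIPE NCC": 3, "APNIC": 2, "LACNIC": 2, "AFRINIC": 2}

print(accept({"ARIN", "APNIC"}, local_weights, minimum=5))  # True
print(accept({"APNIC"}, local_weights, minimum=5))          # False
```

Because the weights live in local configuration, two operators can look at the same set of attestations and reach different conclusions, which is the PGP-style behavior described above.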
This multiple or “shared” attestation (or signature) idea seems like a neat way to work around one of (possibly the major) attack surfaces introduced by the RPKI system. If you are interested in Internet core routing security, you should take a read through the post linked above, and then watch the video.
Can you really trust what a routing protocol tells you about how to reach a given destination? Ivan Pepelnjak joins Nick Russo and Russ White to provide a longer version of the tempting one-word answer: no! Join us as we discuss a wide range of issues including third-party next-hops, BGP communities, and the RPKI.
The security of the global routing table is foundational to the security of the overall Internet as an ecosystem—if routing cannot be trusted, then everything that relies on routing is suspect, as well. Mutually Agreed Norms for Routing Security (MANRS) is a project of the Internet Society designed to draw network operators of all kinds into thinking about, and doing something about, the security of the global routing table by using common-sense filtering and observation. Andrei Robachevsky joins Russ White and Tom Ammon to talk about MANRS.
A long time ago, I worked in a secure facility. I won’t disclose the facility; I’m certain it no longer exists, and the people who designed the system I’m about to describe are probably long retired. Soon after being transferred into this organization, someone noted I needed to be trained on how to change the cipher door locks. We gathered up a ladder, placed the ladder just outside the door to the secure facility, popped open one of the tiles on the drop ceiling, and opened a small metal box with a standard, low security key. Inside this box was a jumper board that set the combination for the secure door.
First lesson of security: there is (almost) always a back door.
I was reminded of this while reading a paper recently published about a backdoor attack on certificate authorities. There are, according to the paper, around 130 commercial Certificate Authorities (CAs). Each of these CAs issues widely trusted certificates used for everything from TLS to secure web browsing sessions to RPKI certificates used to validate route origination information. When you encounter these certificates, you assume at least two things: the private key in the public/private key pair has not been compromised, and the person who claims to own the key is really the person you are talking to. The first of these two can come under attack through data breaches. The second is the topic of the paper in question.
How do CAs validate the person asking for a certificate actually is who they claim to be? Do they work for the organization they are obtaining a certificate for? Are they the “right person” within that organization to ask for a certificate? Shy of having a personal relationship with the person who initiates the certificate request, how can the CA validate who this person is and if they are authorized to make this request?
They could do research on the person—check their social media profiles, verify their employment history, etc. They can also send them something that, in theory, only that person can receive, such as a physical letter, or an email sent to their work email address. To be more creative, the CA can ask the requestor to create a small file on their corporate web site with information supplied by the CA. In theory, these electronic forms of authentication should be solid. After all, if you have administrative access to a corporate web site, you are probably working in information technology at that company. If you have a work email address at a company, you probably work for that company.
These electronic forms of authentication, however, can turn out to be much like the small metal box which holds the jumper board that sets the combination just outside the secure door. They can be more security theater than real security.
In fact, the authors of this paper found that some 70% of the CAs could be tricked into issuing a certificate for just about any organization—by hijacking a route. Suppose the CA asks the requestor to place a small file containing some supplied information on the corporate web site. The attacker creates a web server, inserts the file, hijacks the route to the corporate web site so it points at the fake web site, waits for the authentication to finish, and then removes the hijacked route.
The solution recommended in this paper is for the CAs to use multiple overlapping factors when authenticating a certificate requestor—which is always a good security practice. Another solution recommended by the authors is to monitor your BGP tables from multiple “views” on the Internet to discover when someone has hijacked your routes, and take active measures to either remove the hijack, or at least to detect the attack.
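The multiple-views idea can be sketched in a few lines of Python: collect the AS path each vantage point sees for your prefix and flag any view whose origin AS is not yours. The vantage point names, AS numbers, and the `find_suspect_views` function are assumptions for illustration; in practice the paths would come from route collectors or looking glasses.

```python
def find_suspect_views(expected_origin: int,
                       views: dict[str, list[int]]) -> dict[str, int]:
    """Map each vantage point to the unexpected origin AS it sees, if any."""
    suspects = {}
    for vantage, as_path in views.items():
        if not as_path:
            continue  # this vantage point has no route to the prefix at all
        seen_origin = as_path[-1]  # the origin is the last AS in the path
        if seen_origin != expected_origin:
            suspects[vantage] = seen_origin
    return suspects


# Hypothetical AS paths toward one prefix, as seen from three places.
views = {
    "view-a": [64496, 64500],
    "view-b": [64497, 64498, 64500],
    "view-c": [64499, 64511],  # a different origin shows up here
}
print(find_suspect_views(64500, views))  # {'view-c': 64511}
```

A check like this only catches hijacks that change the origin AS. An attacker who forges the legitimate origin at the end of the path, or who announces a more specific prefix, needs additional checks.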
These are all good measures—ones your organization should already be taking.
But the larger point should be this: putting a firewall in front of your network is not enough. Trusting that others will “do their job correctly,” and hence that you can trust the claims of certificates or CAs, is not enough. The Internet is a low trust environment. You need to think about the possible back doors and think about how to close them (or at least know when they have been opened).
Having personal relationships with people you do business with is a good start. Being creative in what you monitor and how is another. Firewalls are not enough. Two-factor authentication is not enough. Security is systemic and needs to be thought about holistically.
There are always back doors.
Much like most other problems in technology, securing the reachability (routing) information in the Internet core is as much or more of a people problem than it is a technology problem. While BGP security can never be perfect (in an imperfect world, the quest for perfection is often the cause of a good solution’s failure), there are several solutions which could be used to provide the information network operators need to determine if they can trust a particular piece of routing information or not. For instance, graph overlays for path validation, or the RPKI system for origin validation. Solving the technical problem, however, only carries us a small way towards “solving the problem.”
One of the many ramifications of deploying a new system—one we do not often think about from a purely technology perspective—is the legal ramifications. Assume, for a moment, that some authority were to publicly validate that some address, such as 2001:db8:3e8:1210::/64, belongs to a particular entity, say bigbank, and that the AS number of this same entity is 65000. On receiving an update from a BGP peer, if you note the route to x:1210::/64 ends in AS 65000, you might think you are safe in using this path to reach destinations located in bigbank’s network.
What if the route has been hijacked? What if the validator is wrong, and has misidentified—or been fooled into misidentifying—the connection between AS65000 and the x:1210::/64 route? What if, based on this information, critical financial information is transmitted to an end point which ultimately turns out to be an attacker, and this attacker uses this falsified routing information to steal millions (or billions) of dollars?
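The gap between "the path ends in AS 65000" and "the path actually leads to bigbank" is easy to show in code. The Python sketch below checks only what origin validation checks, the last AS in the path, using the prefix and AS number from the example above. The `AUTHORIZED` table, the function name, and the other AS numbers are hypothetical.

```python
import ipaddress

# Hypothetical validated data: the prefix and its authorized origin AS.
AUTHORIZED = {ipaddress.ip_network("2001:db8:3e8:1210::/64"): 65000}


def passes_origin_check(prefix: str, as_path: list[int]) -> bool:
    """True when the last AS in the path is the authorized origin."""
    authorized_origin = AUTHORIZED.get(ipaddress.ip_network(prefix))
    return bool(as_path) and as_path[-1] == authorized_origin


legitimate = [64496, 65000]
# AS 64511 (a hypothetical attacker) announces the prefix and simply
# appends the real origin, so the path still ends in AS 65000.
forged = [64496, 64511, 65000]

print(passes_origin_check("2001:db8:3e8:1210::/64", legitimate))  # True
print(passes_origin_check("2001:db8:3e8:1210::/64", forged))      # True
```

Both paths pass, because origin validation says nothing about the ASes in between. Closing that gap is the job of path validation, which is a separate mechanism.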
Who is responsible? This legal question ultimately plays into the way numbering authorities allow the certificates they issue to be used. Numbering authorities—specifically ARIN, which is responsible for numbering throughout North America—do not want the RPKI data misused in a way that can leave them legally responsible for the results. Some background is helpful.
The RPKI data, in each region, is stored in a database; each RPKI object (essentially and loosely) contains an origin AS/IP address pair. These are signed using a private key and can be validated using the matching public key. Somehow the public key itself must be validated; ultimately, there is a chain, or hierarchy, of trust, leading to some sort of root. The trust anchor is described in a file called the Trust Anchor Locator, or TAL. ARIN wraps access to their TAL in a strong indemnification clause to protect themselves from the sort of situation described above (and others). Many companies, particularly in the United States, will not accept the legal contract involved without a thorough investigation of their own culpability in any given situation involving misrouting traffic, which ultimately means many companies will simply not use the data, and RPKI is not deployed.
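The role of the trust anchor can be sketched as a walk up the issuer chain: start at the certificate that signed the object and follow issuers until you reach an anchor you have configured, or run out of chain. The Python below is a deliberately simplified model with hypothetical names. It ignores signature verification, resource containment, expiry, and revocation, all of which a real validator must check.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cert:
    """Hypothetical certificate reduced to who it names and who issued it."""
    subject: str
    issuer: str


def chains_to_anchor(subject: str, certs: dict[str, Cert],
                     trust_anchors: set[str]) -> bool:
    """Follow issuers upward until a configured trust anchor is reached."""
    seen = set()
    current = subject
    while current not in trust_anchors:
        # Stop on a loop or when the chain runs out before reaching an anchor.
        if current in seen or current not in certs:
            return False
        seen.add(current)
        current = certs[current].issuer
    return True


certs = {
    "roa-signer": Cert("roa-signer", "operator-ca"),
    "operator-ca": Cert("operator-ca", "rir-root"),
}
print(chains_to_anchor("roa-signer", certs, {"rir-root"}))  # True
print(chains_to_anchor("roa-signer", certs, set()))         # False: no anchor configured
```

The second call is the situation described above: an operator who never accepts the terms attached to the TAL has no anchor for that region, so nothing beneath it can be validated.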
The essential point the paper makes is: is this clause really necessary? The authors make several arguments towards removing the strict legal requirements around the use of the data in the TAL provided by ARIN. First, they argue the bounds of potential liability are uncertain, and will shift as the RPKI is more widely deployed. Second, they argue the situations where harm can come from use of the RPKI data need to be more carefully framed and understood, along with how these kinds of legal issues have been handled in the past. To this end, the authors argue strict liability is not likely to be raised, and negligence liability can probably be mitigated. They offer an alternative mechanism using straight contract law to limit the liability to ARIN in situations where the RPKI data is misused or incorrect.
Whether this paper causes ARIN to rethink its legal position or not is yet to be seen. At the same time, while these kinds of discussions often leave network engineers flat-out bored, the implications for the Internet are important. This is an excellent example of an intersection between technology and policy, a realm network operators and engineers need to pay more attention to.
At NANOG on the Road (NotR) in September of 2018, I participated in a panel on BGP security, specifically the deployment of Route Origin Authorization (ROA), with some hints and overtones of path validation by carrying signatures in BGP updates (BGPsec). This is an area I have been working in for… 20 years? … at this point, so I have seen the argument develop across these years many times, and in many ways. What always strikes me about this discussion, whenever and wherever it is aired, is the clash between business realities and the desire for “someone to do something about routing security in the DFZ, already!” What also strikes me about these conversations is the number of times very fundamental concepts end up being explained to folks who are “new to the problem.”
- BGP security is a business problem first, and a technology problem second
- Signed information is only useful insofar as it is maintained
- The cost of deployment must be lower than the return on that cost
- Local policy will always override global policy—as it should
- The fear of losing business is a stronger motivator than gaining new business
Part of the problem here is solutions considered “definitive and final” have been offered, the operator community has rejected them for many years, and yet these same solutions are put on the table year after year—like the perennial fruit cake made by someone’s great great aunt in the mists of Christmastime history that has been regifted so many times no-one really remembers where it came from, nor what sorts of fruit it actually contains.
The business reality, in terms of BGP security, is simple. To deploy some sort of check on the global routing table, a point must be reached where it costs more to not deploy it than to deploy it. This simple business reality is something network designers and architects beat their heads against every day. The solution can be the neatest solution in the world. It might even shop for the ingredients, mix the cookie dough in perfect proportions, bake the cookies, and then transport them to the proper location for the perfect amount of enjoyment from just the right people (insert Goldilocks here, perhaps). But none of this will ever matter if there is no financial upside, or if the financial risks are greater than the financial gains.
So, some hopefully helpful business realities.
Signed information is no more useful than unsigned information if it is not kept up to date. It is great to get everyone out in full force to build a cryptographically secured database of who owns what prefixes and AS numbers. It is wonderful to find a way to distribute that information throughout the ‘net so people can use it as another tool to determine whether or not to accept a route, or what weight to place on that route. People might even spend a bit of time building this database, just because they believe it is good for the community.
The problem is not day one. It is day two, and then day two thousand. What motivation is there to keep the information in this database up-to-date? Unless there is some—and here I mean financial motivation—the database will lose its effectiveness over time. At some point, when the error rate reaches some number (around 30% seems to be about right), people will simply stop trusting it. When the error rate gets high enough, the tools will stop being used, and the ‘net will revert to its old self.
The cost of deployment and operation must at least be close to the gains from deployment. At this point, there is no financial gain any one company can see from deploying anything in the realm of BGP security, so the cost of deployment and maintenance must be close to zero. While there are folks (including me) trying to reduce the cost as close to zero as possible, we are not there yet, and I do not know if we will ever be there.
Local policy will always override global policy. The literature of BGP security is replete with statements like: “if the route meets these criteria, the BGP speaker MUST drop it.” Good luck. The Internet is a confederation of independent companies, each of which runs its network in a way it believes will make the most money at the lowest cost possible. One of the ways this happens is that people tune their local policies to charge their adjacent autonomous systems as much money as possible, while reducing their OPEX and CAPEX as much as possible. There will always be money to be made in the grey space around local tuning of policies for optimal traffic flow. Hence local policy will always win over what any database anyplace might say.
To give a specific example: assume you run a network, and you have peered with another operator in multiple places for many years. You know the other operator’s routes well, as they have not changed for many years. One day, you receive a route from this operator in which everything looks correct, but the route is not contained in this outside database of “correct routes.”
Noting this route is a route you have received from this very same operator for many years, are you going to drop it because it’s not right in some database, or are you going to use it given your past standing and relationship?
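In practice, this tends to show up as a policy that treats the database as one input among several rather than as a verdict. The Python sketch below is one hypothetical way an operator might encode that: the validation state adjusts local preference, and a route from a long-standing peer is kept, at lower preference, even when the database disagrees. The state names, preference values, and function name are assumptions, not a recommendation.

```python
from typing import Optional


def local_preference(validation_state: str,
                     long_standing_peer: bool) -> Optional[int]:
    """Return a local preference for the route, or None to drop it."""
    if validation_state == "valid":
        return 200
    if validation_state == "not-found":
        return 100  # no data either way, so treat it as an ordinary route
    if validation_state == "invalid" and long_standing_peer:
        return 50   # keep it, but prefer any better-attested alternative
    return None     # invalid (or unknown state) from anyone else: drop


print(local_preference("not-found", long_standing_peer=True))  # 100
print(local_preference("invalid", long_standing_peer=True))    # 50
print(local_preference("invalid", long_standing_peer=False))   # None
```

The database says "drop," the local relationship says "keep," and the local relationship wins. Whether that is wise is a separate question from whether it is what operators will actually do.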
The fear of losing business is always the strongest motivator. Which leads to the next issue. It does not matter how wonderful your network is if you have a high customer churn rate. The most certain way to have a high churn rate is to place your customer’s experience in the hands of someone else. Such as a communally managed database, perhaps. This is another reason why local policy will always win over remote policy—the local provider is handed checks by customers, not the community at large. They have more incentive to keep their customers happy than the community.
You cannot secure things you do not tell anyone about. This final one is probably not as obvious as the others, but it is just as important as any other item on this list. There are many backdoor arrangements and sealed contracts in the provider world. People transit traffic without telling anyone else that traffic is being transited. Some people are customers of others only in the event of a massive failure someplace else in their network, but do not want anyone to know about this.
All of these arrangements are perfectly legitimate and legal in their respective jurisdictions. But you cannot secure something that no-one knows about. The more information that is hidden in a system, the harder it is to validate the information that exists is correct.
The bottom line is this: BGP security, like most networking problems, is not a technology problem. BGP security is, at its heart, a business problem. The lesson here is not just for security, but for network engineering in general. Business is the bottom line, not technology.