Can you really trust what a routing protocol tells you about how to reach a given destination? Ivan Pepelnjak joins Nick Russo and Russ White to provide a longer version of the tempting one-word answer: no! Join us as we discuss a wide range of issues including third-party next-hops, BGP communities, and the RPKI.
The security of the global routing table is foundational to the security of the overall Internet as an ecosystem—if routing cannot be trusted, then everything that relies on routing is suspect, as well. Mutually Agreed Norms for Routing Security (MANRS) is a project of the Internet Society designed to draw network operators of all kinds into thinking about, and doing something about, the security of the global routing table by using common-sense filtering and observation. Andrei Robachevsky joins Russ White and Tom Ammon to talk about MANRS.
A long time ago, I worked in a secure facility. I won’t disclose the facility; I’m certain it no longer exists, and the people who designed the system I’m about to describe are probably long retired. Soon after being transferred into this organization, someone noted I needed to be trained on how to change the cipher door locks. We gathered up a ladder, placed the ladder just outside the door to the secure facility, popped open one of the tiles on the drop ceiling, and opened a small metal box with a standard, low security key. Inside this box was a jumper board that set the combination for the secure door.
First lesson of security: there is (almost) always a back door.
I was reminded of this while reading a paper recently published about a backdoor attack on certificate authorities. There are, according to the paper, around 130 commercial Certificate Authorities (CAs). Each of these CAs issue widely trusted certificates used for everything from TLS to secure web browsing sessions to RPKI certificates used to validate route origination information. When you encounter these certificates, you assume at least two things: the private key in the public/private key pair has not been compromised, and the person who claims to own the key is really the person you are talking to. The first of these two can come under attack through data breaches. The second is the topic of the paper in question.
How do CAs validate the person asking for a certificate actually is who they claim to be? Do they work for the organization they are obtaining a certificate for? Are they the “right person” within that organization to ask for a certificate? Shy of having a personal relationship with the person who initiates the certificate request, how can the CA validate who this person is and if they are authorized to make this request?
They could do research on the person—check their social media profiles, verify their employment history, etc. They can also send them something that, in theory, only that person can receive, such as a physical letter, or an email sent to their work email address. To be more creative, the CA can ask the requestor to create a small file on their corporate web site with information supplied by the CA. In theory, these electronic forms of authentication should be solid. After all, if you have administrative access to a corporate web site, you are probably working in information technology at that company. If you have a work email address at a company, you probably work for that company.
These electronic forms of authentication, however, can turn out to be much like the small metal box which holds the jumper board that sets the combination just outside the secure door. They can be more security theater than real security.
In fact, the authors of this paper found that some 70% of the CAs could be tricked into issuing a certificate for just about any organization—by hijacking a route. Suppose the CA asks the requestor to place a small file containing some supplied information on the corporate web site. The attacker creates a web server, inserts the file, hijacks the route to the corporate web site so it points at the fake web site, waits for the authentication to finish, and then removes the hijacked route.
The solution recommended in this paper is for the CAs to use multiple overlapping factors when authenticating a certificate requestor—which is always a good security practice. Another solution recommended by the authors is to monitor your BGP tables from multiple “views” on the Internet to discover when someone has hijacked your routes, and take active measures to either remove the hijack, or at least to detect the attack.
These are all good measures—ones your organization should already be taking.
But the larger point should be this: putting a firewall in front of your network is not enough. Trusting that others will “do their job correctly,” and hence that you can trust the claims of certificates or CAs, is not enough. The Internet is a low trust environment. You need to think about the possible back doors and think about how to close them (or at least know when they have been opened).
Having personal relationships with people you do business with is a good start. Being creative in what you monitor and how is another. Firewalls are not enough. Two-factor authentication is not enough. Security is systemic and needs to be thought about holistically.
There are always back doors.
Much like most other problems in technology, securing the reachability (routing) information in the internet core as much or more of a people problem than it is a technology problem. While BGP security can never be perfect (in an imperfect world, the quest for perfection is often the cause of a good solution’s failure), there are several solutions which could be used to provide the information network operators need to determine if they can trust a particular piece of routing information or not. For instance, graph overlays for path validation, or the RPKI system for origin validation. Solving the technical problem, however, only carries us a small way towards “solving the problem.”
One of the many ramifications of deploying a new system—one we do not often think about from a purely technology perspective—is the legal ramifications. Assume, for a moment, that some authority were to publicly validate that some address, such as 2001:db8:3e8:1210::/64, belongs to a particular entity, say bigbank, and that the AS number of this same entity is 65000. On receiving an update from a BGP peer, if you note the route to x:1210::/64 ends in AS 65000, you might think you are safe in using this path to reach destinations located in bigbank’s network.
What if the route has been hijacked? What if the validator is wrong, and has misidentified—or been fooled into misidentifying—the connection between AS65000 and the x:1210::/64 route? What if, based on this information, critical financial information is transmitted to an end point which ultimately turns out to be an attacker, and this attacker uses this falsified routing information to steal millions (or billions) of dollars?
Who is responsible? This legal question ultimately plays into the way numbering authorities allow the certificates they issue to be used. Numbering authorities—specifically ARIN, which is responsible for numbering throughout North America—do not want the RPKI data misused in a way that can leave them legally responsible for the results. Some background is helpful.
The RPKI data, in each region, is stored in a database; each RPKI object (essentially and loosely) contains an origin AS/IP address pair. These are signed using a private key and can be validated using the matching public key. Somehow the public key itself must be validated; ultimately, there is a chain, or hierarchy, of trust, leading to some sort of root. The trust anchor is described in a file called the Trust Anchor Locator, or TAL. ARIN wraps access to their TAL in a strong indemnification clause to protect themselves from the sort of situation described above (and others). Many companies, particularly in the United States, will not accept the legal contract involved without a thorough investigation of their own culpability in any given situation involving misrouting traffic, which ultimately means many companies will simply not use the data, and RPKI is not deployed.
The essential point the paper makes is: is this clause really necessary? Thy authors make several arguments towards removing the strict legal requirements around the use of the data in the TAL provided by ARIN. First, they argue the bounds of potential liability are uncertain, and will shift as the RPKI is more widely deployed. Second, they argue the situations where harm can come from use of the RPKI data needs to be more carefully framed and understood, and how these kinds of legal issues have been used in the past. To this end, the authors argue strict liability is not likely to be raised, and negligence liability can probably be mitigated. They offer an alternative mechanism using straight contract law to limit the liability to ARIN in situations where the RPKI data is misused or incorrect.
Whether this paper causes ARIN to rethink its legal position or not is yet to be seen. At the same time, while these kinds of discussions often leave network engineers flat-out bored, the implications for the Internet are important. This is an excellent example of an intersection between technology and policy, a realm network operators and engineers need to pay more attention to.
At NANOG on the Road (NotR) in September of 2018, I participated in a panel on BGP security—specifically the deployment of Route Origin Authentication (ROA), with some hints and overtones of path validation by carrying signatures in BGP updates (BGPsec). This is an area I have been working in for… 20 years? … at this point, so I have seen the argument develop across these years many times, and in many ways. What always strikes me about this discussion, whenever and wherever it is aired, is the clash between business realities and the desire for “someone to do something about routing security in the DFZ, already!” What also strikes me about these conversations it the number of times very fundamental concepts end up being explained to folks who are “new to the problem.”
- BGP security is a business problem first, and a technology problem second
- Signed information is only useful insofar as it is maintained
- The cost of deployment must be lower than the return on that cost
- Local policy will always override global policy—as it should
- The fear of losing business is a stronger motivator than gaining new business
Part of the problem here is solutions considered “definitive and final” have been offered, the operator community has rejected them for many years, and yet these same solutions are put on the table year after year—like the perennial fruit cake made by someone’s great great aunt in the mists of Christmastime history that has been regifted so many times no-one really remembers where it came from, nor what sorts of fruit it actually contains.
The business reality, in terms of BGP security, is simple. To deploy some sort of check on the global routing table, at point must be reached where it costs more to not deploy it than to deploy it. This simply business reality is something network designers and architects beat their heads against every day. The solution can be the neatest solution in the world. It might even shop for the ingredients, mix the cookie dough in perfect proportions, bake the cookies, and then transport them to the proper location for the perfect amount of enjoyment from just the right people (insert Goldilocks here, perhaps). But none of this will ever matter if there is no financial upside, or if the financial risks are greater than the financial gains.
So, some hopefully helpful business realities.
Signed information is no more useful than unsigned information if it is not kept up to date. It is great to get everyone out in full force to build a cryptographically secured database of who owns what prefixes and AS numbers. It is wonderful to find a way to distribute that information throughout the ‘net so people can use it as another tool to determine whether or not to accept a route, or what weight to place on that route. People might even spend a bit of time building this database, just because they believe it is good for the community.
The problem is not day one. It is day two, and then day two thousand. What motivation is there to keep the information in this database up-to-date? Unless there is some—and here I mean financial motivation—the database will lose its effectiveness over time. At some point, when the error rate reaches some number (around 30% seems to be about right), people will simply stop trusting it. When the error rate gets high enough, the tools will stop being used, and the ‘net will revert to its old self.
The cost of deployment and operation must at least be close to the gains from deployment. At this point, there is no financial gain any one company can see from deploying anything in the realm of BGP security, so the cost of deployment and maintenance must be close to zero. While there are folks (including me) trying to reduce the cost as close to zero as possible, we are not there yet, and I do not know if we will ever be there.
Local policy will always override global policy. The literature of BGP security is replete with statements like: “if the route meets this criteria, the BGP speaker MUST drop it.” Good luck. The Internet is a confederation of independent companies, each of which runs their network in a way that they believe will make the most money at the lowest cost possible. One of the ways this happens is that people tune their local policies to charge their adjacent autonomous systems as much money as possible, while reducing their OPEX and CAPEX as much as possible. There will always be money to be made in the grey space around local tuning of policies for optimal traffic flow. Hence local policy will always win over what any database anyplace might say.
To give a specific example: assume you run a network, and you have peered with another operator in multiple places for many years. You know the other operator’s routes well, as they have not changed for many years. One day, you receive a route from this operator in which everything looks correct, but the route is not contained in this outside database of “correct routes.”
Noting this route is a route you have received from this very same operator for many years, are you going to drop it because it’s not right in some database, or are you going to use it given your past standing and relationship?
The fear of losing business is always the strongest motivator. Which leads to the next issue. It does not matter how wonderful your network is if you have a high customer churn rate. The most certain way to have a high churn rate is to place your customer’s experience in the hands of someone else. Such as a communally managed database, perhaps. This is another reason why local policy will always win over remote policy—the local provider is handed checks by customers, not the community at large. They have more incentive to keep their customers happy than the community.
You cannot secure things you do not tell anyone about. This final one is probably not as obvious as the others, but it is just as important as any other item on this list. There are many backdoor arrangements and sealed contracts in the provider world. People transit traffic without telling anyone else that traffic is being transited. Some people are customers of others only in the event of a massive failure someplace else in their network, but do not want anyone to know about this.
All of these arrangements are perfectly legitimate and legal in their respective jurisdictions. But you cannot secure something that no-one knows about. The more information that is hidden in a system, the harder it is to validate the information that exists is correct.
The bottom line is this: BGP security, like most networking problems, is not a technology problem. BGP security is, at its heart, a business problem. The lesson is here not just for security, but for network engineering in general. Business is the bottom line, not technology.
The Resource Public Key Infrastructure (RPKI) system is designed to prevent hijacking of routes at their origin AS. If you don’t know how this system works (and it is likely you don’t, because there are only a few deployments in the world), you can review the way the system works by reading through this post here on rule11.tech.
The paper under review today examines how widely Route Origin Validation (ROV) based on the RPKI system has been deployed. The authors began by determining which Autonomous Systems (AS’) are definitely not deploying route origin validation. They did this by comparing the routes in the global RPKI database, which is synchronized among all the AS’ deploying the RPKI, to the routes in the global Default Free Zone (DFZ), as seen from 44 different route servers located throughout the world. In comparing these two, they found a set of routes which the RPKI system indicated should be originated from one AS, but were actually being originated from another AS in the default free zone.
Using this information, the researchers then looked for AS’ through which these routes with a mismatched RPKI and global table origin were advertised. If an AS accepted, and then readvertised, routes with mismatched RPKI and global table origins, they marked this AS as one that does not enforce route origin authentication.
A second, similar check was used to find the mirror set of AS’, those that do perform a route origin validation check. In this case, the authors traced the same type of route—those for which the origin AS the route is advertised with does not match the originating AS in the RPKI–and discovered some AS’ will not readvertise such a route. These AS’ apparently do perform a check for the correct route origin information.
The result is that only one of the 20 Internet Service Providers (ISPs) with the largest number of customers performs route origination validation on the routes they receive. Out of the largest 100 ISPs (again based on customer AS count), 22 appear to perform a route origin validation check. These are very low numbers.
To double check these numbers, the researchers surveyed a group of ISPs, and found that very few of them claim to check the routes they receive against the RPKI database. Why is this? When asked, these providers gave two reasons.
First, these providers are concerned about the problems involved with their connectivity being impacted in the case of an RPKI system failure. For instance, it would be easy enough for a company to become involved in a contract dispute with their naming authority, or with some other organization (two organizations claiming the same AS number, for instance). These kinds of cases could result in many years of litigation, causing a company to effectively lose their connectivity to the global ‘net during the process. This might seem like a minor fear for some, and there might be possible mitigations, but the ‘net is much more statically defined than many people realize, and many operators operate on a razor thin margin. The disruptions caused by such an event could simply put a company out of business.
Second, there is a general perception that the RPKI database is not exactly a “clean” representation of the real world. Since the database is essentially self-reported, there is little incentive to make changes to the database once something in the real world has changed (such as the transfer of address space between organization). It only takes a small amount of old, stale, or incorrect information to reduce the usefulness of this kind of public database. The authors address this concern by examining the contents of the RPKI, and find that it does, in fact, contain a good bit of incorrect information. They develop a tool to help administrators find this information, but ultimately people must use these kinds of tools.
The point of the paper is that the RPKI system, which is seen as crucial to the security of the global Internet, is not being widely used, and deployment does not appear to be increasing over time. One possible takeaway is the community needs to band together and deploy this technology. Another might be that the RPKI is not a viable solution to the problem at hand for various technical and social reasons—it might be time to start looking for another alternative for solving this problem.