IPv6 Security Considerations

When rolling out a new protocol such as IPv6, it is useful to consider the changes to security posture, particularly the network’s attack surface. While protocol security discussions are widely available, there is often not “one place” where you can go to get information about potential attacks, references to research about those attacks, potential counters, and operational challenges. In the case of IPv6, however, there is “one place” you can find all this information: draft-ietf-opsec-v6. This document is designed to provide information to operators about IPv6 security based on solid operational experience—and it is a must-read if you have deployed IPv6, or are thinking about deploying it.

cross posted on CircleID

The draft is broken up into four broad sections; the first is the longest, addressing generic security considerations. The first consideration is whether operators should use Provider Independent (PI) or Provider Assigned (PA) address space. One of the dangers with a large address space is the sheer size of the potential routing table in the Default Free Zone (DFZ). If every network operator opted for an IPv6 /32, the potential size of the DFZ routing table would be 2.4 billion routing entries. If you thought converging on about 800,000 routes was bad, just wait ‘til there are 2.4 billion routes. Of course, actual PI space is being handed out on /48 boundaries, which makes the potential table size larger still (there are 65,536 /48s in every /32). PI space, then, is “bad for the Internet” in some very important ways.

This document provides the other side of the argument—security is an issue with PA space. While IPv6 was supposed to make renumbering as “easy as flipping a switch,” it does not, in fact, come anywhere near this. Some reports indicate IPv6 re-addressing is more difficult than it is in IPv4. Long, difficult renumbering processes create many opportunities for security failures, and hence a large attack surface. Preferring PI space over PA space becomes a matter of reducing the operational attack surface.

Another interesting question when managing an IPv6 network is whether static addressing should be used for some services, or if all addresses should be dynamically learned. There is a perception out there that because the IPv6 address space is so large, it cannot be “scanned” to find hosts to attack. As pointed out in this draft, there is research showing this is simply not true. Further, static addresses may expose specific servers or services to easy recognition by an attacker. The point the authors make here is that either way, endpoint security needs to rely on actual security mechanisms, rather than on hiding addresses in some way.
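
To make this concrete, here is a minimal sketch of the kind of pattern-based candidate generation the research describes: rather than sweeping all 2^64 host addresses in a /64, a scanner generates the addresses administrators actually tend to assign. The prefix below is the documentation prefix, and the helper functions and limits are invented for illustration.

```python
# A sketch of pattern-based IPv6 target generation (hypothetical helpers;
# real tools are far more sophisticated). Python standard library only.
import ipaddress

def low_byte_candidates(prefix, limit=256):
    """Yield the 'low-byte' addresses (::1, ::2, ...) that statically
    addressed routers and servers use far more often than chance."""
    net = ipaddress.ip_network(prefix)
    for host in range(1, limit + 1):
        yield net.network_address + host

def v4_embedded_candidates(prefix, v4_net="192.0.2.0/24"):
    """Yield addresses that embed an IPv4 address in the low 32 bits,
    another common administrative convention."""
    net = ipaddress.ip_network(prefix)
    for v4 in ipaddress.ip_network(v4_net).hosts():
        yield net.network_address + int(v4)

targets = list(low_byte_candidates("2001:db8:0:1::/64"))
print(targets[0], targets[-1])  # 2001:db8:0:1::1 2001:db8:0:1::100
```

Statically assigned, human-friendly addresses are exactly the ones generators like these find first, which is the draft’s point: rely on real endpoint security, not on the apparent size of the search space.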

Other very useful topics considered here are Unique Local Addresses (ULAs), numbering and managing point-to-point links, privacy extensions for SLAAC, using a /64 per host, extension headers, securing DHCP, ND/RA filtering, and control plane security.

If you are deploying, or thinking about deploying, IPv6 in your network, this is a “must read” document.

Research: Tail Attacks on Web Applications

When you think of a Distributed Denial of Service (DDoS) attack, you probably think about an attack which overflows the bandwidth available on a single link, or one which overflows the number of half-open TCP sessions a device can have open at once, preventing the device from accepting more sessions. Either way, a DoS or DDoS attack involves a lot of traffic being pushed at a single device, or across a single link.

TL;DR

  • Denial of service attacks do not always require high volumes of traffic
  • An intelligent attacker can exploit the long tail of service queues deep in a web application to bring the service down
  • These kinds of attacks would be very difficult to detect


But if you look at an entire system, there are a lot of places where resources are scarce, and hence where an attacker can consume just enough of a resource to prevent a service from operating correctly. Such attacks do not need to be distributed, because they require much less traffic than is traditionally needed to deny a service. These are called tail attacks, because they target the long tail of resource pools deep in the system, where the pools are much thinner, and hence much easier to exhaust.

There are two probable reasons these kinds of attacks are not often seen in the wild. First, they require in-depth knowledge of the system under attack. Most long tail attacks take advantage of the interaction surface between two subsystems within the larger system; each of these interaction surfaces is also a potential attack surface, if an attacker can figure out how to access and take advantage of it. Second, these attacks are difficult to detect, because they do not require large amounts of traffic, or otherwise unusual traffic flows, to launch.

The paper under review today, Tail Attacks on Web Applications, discusses a model for understanding and creating tail attacks in a multi-tier web application—the kind commonly used for any large-scale front-end service, such as e-commerce and social media.

Huasong Shan, Qingyang Wang, and Calton Pu. 2017. Tail Attacks on Web Applications. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security (CCS ’17). ACM, New York, NY, USA, 1725-1739. DOI: https://doi.org/10.1145/3133956.3133968

The figure below illustrates a basic service of this kind for those who are not familiar with it.

The typical application at scale will have at least three stages. The first stage will terminate the user’s session and render content; this is normally some form of modified web server. The second stage will gather information from various backend services (generally microservices), and pass the information required to build the page or portal to the rendering engine. The microservices, in turn, build individual parts of the page, and rely on various storage and other services to supply the information needed.

If you can find some way to clog up the queue at one of the storage nodes, you can cause every other service along the information path to wait on the prior service to fulfill its part of the job at hand. This can cause a cascading effect through the system, where a single node struggling because of full queues can cause an entire set of dependent nodes to become effectively unavailable, cascading to a larger set of nodes in the next layer up. For instance, in the network illustrated, if an attacker can somehow cause the queues at storage service 1 to fill up, even for a moment, this can cascade into a backlog of work at services 1 and 2, cascading into a backlog at the front-end service, ultimately slowing—or even shutting—the entire service down. The queues at storage service 1 may be the same size as every other queue in the system (although they are likely smaller, as they face internal, rather than external, services), but storage service 1 may be servicing many hundreds, perhaps thousands, of copies of services 1 and 2.

The queues at storage service 1—and all the other storage services in the system—represent a hidden bottleneck in the overall system. If an attacker can, for a few moments at a time, cause these internal, intra-application queues to fill up, the overall service can be made to slow down to the point of being almost unusable.
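
To see how a momentary burst can have a lasting effect, here is a toy, discrete-time simulation of one thin storage queue shared by many upstream services; all of the depths, rates, and loads are invented for illustration.

```python
# Toy model: a hidden storage queue at full utilization, hit by one
# short "milliburst" of perfectly legitimate requests.
from collections import deque

STORAGE_QUEUE_DEPTH = 8          # the thin queue deep in the system
SERVICE_PER_TICK = 2             # requests the storage node completes per tick
NORMAL_LOAD = 2                  # steady arrivals per tick (fully utilized)
BURST_TICK, BURST_LOAD = 50, 40  # one brief burst of extra requests

storage_queue = deque()
saturated_ticks = 0

for tick in range(100):
    arrivals = BURST_LOAD if tick == BURST_TICK else NORMAL_LOAD
    for _ in range(arrivals):
        # Requests that find the queue full are the ones every upstream
        # service ends up waiting on or retrying; that wait is the
        # user-visible latency.
        if len(storage_queue) < STORAGE_QUEUE_DEPTH:
            storage_queue.append(tick)
    if len(storage_queue) >= STORAGE_QUEUE_DEPTH:
        saturated_ticks += 1
    for _ in range(min(SERVICE_PER_TICK, len(storage_queue))):
        storage_queue.popleft()

# Because steady arrivals match the service rate, the queue never drains
# after the burst: a single milliburst leaves the bottleneck pinned.
print(f"ticks with a saturated storage queue: {saturated_ticks}")
```

Run it, and the queue stays saturated for every tick after the burst: when a resource is already running near its service rate, a few moments of extra load are enough to pin it at its limit indefinitely.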

How plausible is this kind of attack? The researchers modeled a three-stage system (most production systems have more than three stages) and examined the total queue path through the system. By examining the queue depths at each stage, they devised a way to fill the queues at the first stage in the system by sending millibursts of valid session requests to the rendering engine, the user-facing piece of the application. Even if these millibursts are spread out across the edge of the application, so long as they are all the same kind of request, and timed correctly, they can bring the entire system down. In the paper, the researchers go further and show that once you understand the architecture of one such system, it is possible to try different millibursts against a running system, causing the same DoS effect.

This kind of attack, because it is built out of legitimate traffic, and because it can be spread across the entire public-facing edge of an application, would be nearly impossible to detect or counter at the network edge. One possible counter to this kind of attack would be increasing capacity in the deeper stages of the application. This countermeasure could be expensive, as the data must be stored on a larger number of servers. Further, data synchronized across multiple systems will be subject to CAP limitations, which will ultimately limit the speed at which the application can run anyway. Operators could also consider fine-grained monitoring, which increases the amount of telemetry that must be recovered from the network and processed—another form of monetary tradeoff.


Research: DNSSEC in the Wild

The DNS system is, unfortunately, rife with holes, like Swiss cheese; man-in-the-middle attacks can easily negate the operation of TLS and web site security. To resolve these problems, the IETF and the DNS community standardized DNSSEC, a set of extensions that cryptographically sign all DNS records. These signatures rely on public/private key pairs that are transitively signed (forming a signature chain) from individual subdomains through the Top Level Domain (TLD). Now that these standards are in place, how heavily is DNSSEC being used in the wild? How much safer are we from man-in-the-middle attacks against TLS and other transport encryption mechanisms?

TL;DR

  • DNSSEC is enabled on most top level domains
  • However, DNSSEC is not widely used or deployed beyond these TLDs


Crossposted at CircleID

Three researchers published an article in the Winter issue of ;login: describing their research into answering this question (membership and login required to read the original article). The result? While more than 90% of the TLDs in DNS are DNSSEC enabled, DNSSEC is still not widely deployed or used. To make matters worse, where it is deployed, it isn’t well deployed. The article mentions two specific problems that appear to plague DNSSEC implementations.

First, on the server side, a number of domains deploy either weak or expired keys. An easily compromised key is often worse than having no key at all; there is no way to tell the difference between a key that has been compromised and one that has not. A weak key that has been compromised does not just impact the domain in question, either. If the weakly protected domain has subdomains, or its key is used to validate other domains in any way, the entire chain of trust through the weak key is compromised. Beyond this, there is a threshold past which the entire system loses the trust of its users. If 30% of the keys returned in DNS are compromised, for instance, most users would probably stop trusting any DNSSEC-signed information. While expired keys are more obvious than weak keys, relying on expired keys still works against user trust in the system.
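
To make the mechanics concrete, here is roughly what one link of that validation looks like, sketched with the Python dnspython library; the zone and resolver address are placeholders, and a real validator would also walk the DS records up through the TLD to the root rather than trusting the zone’s own keys. An expired or bogus key surfaces as a validation failure at exactly this step.

```python
# Verify one link in a DNSSEC chain: check the RRSIG over an A record
# against the zone's DNSKEY RRset. Placeholder zone and resolver.
import dns.dnssec
import dns.message
import dns.name
import dns.query
import dns.rdataclass
import dns.rdatatype

ZONE = dns.name.from_text("example.com.")
RESOLVER = "8.8.8.8"  # any resolver that returns DNSSEC records

def fetch(rdtype):
    # Set the DO bit so RRSIGs come back with the answer.
    query = dns.message.make_query(ZONE, rdtype, want_dnssec=True)
    return dns.query.udp(query, RESOLVER, timeout=5)

# Pull the zone's keys and the signature over them, then validate the
# DNSKEY RRset against itself (a full validator anchors this in the
# parent zone's DS record instead).
resp = fetch(dns.rdatatype.DNSKEY)
dnskey = resp.find_rrset(resp.answer, ZONE, dns.rdataclass.IN,
                         dns.rdatatype.DNSKEY)
dnskey_sig = resp.find_rrset(resp.answer, ZONE, dns.rdataclass.IN,
                             dns.rdatatype.RRSIG, dns.rdatatype.DNSKEY)
dns.dnssec.validate(dnskey, dnskey_sig, {ZONE: dnskey})

# Verify a leaf record against the just-validated keys; this raises
# dns.dnssec.ValidationFailure on a bad or expired signature.
resp = fetch(dns.rdatatype.A)
a_rrset = resp.find_rrset(resp.answer, ZONE, dns.rdataclass.IN,
                          dns.rdatatype.A)
a_sig = resp.find_rrset(resp.answer, ZONE, dns.rdataclass.IN,
                        dns.rdatatype.RRSIG, dns.rdatatype.A)
dns.dnssec.validate(a_rrset, a_sig, {ZONE: dnskey})
print("RRSIG verified against the zone's DNSKEY RRset")
```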

Second, DNSSEC is complex. The net result of a complex protocol combined with low deployment and demand on the server side is poor client-side implementations. Many implementations, according to the research in this paper, simply ignore failures in the signature validation process. Some of the key findings of the paper are—

  • One-third of the DNSSEC-enabled domains produce responses that cannot be validated
  • While TLD operators widely support DNSSEC, registrars who run authoritative servers rarely support DNSSEC; thus the chain of trust often fails at the first hop in the resolution process beyond the TLD
  • Only 12% of the resolvers that request DNSSEC records in the query process validate them

To discover the deployment of DNSSEC, the researchers built an authoritative DNS server and a web server to host a few files. They configured subdomains on the authoritative server; some subdomains were configured correctly, while others were configured incorrectly (a key or signature was missing, expired, malformed, and so on). By examining DNS requests for the subdomains they configured, they could determine which DNS resolvers were using the included DNSSEC information, and which were not.
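
You can run a rough version of the client-side half of this experiment against your own recursive servers: ask for a name whose chain is deliberately broken and see whether the resolver refuses to answer. A sketch, again with dnspython and a placeholder resolver address; dnssec-failed.org is a well-known, intentionally mis-signed test zone.

```python
# Test whether a recursive resolver validates DNSSEC by querying a
# deliberately broken zone; a validating resolver answers SERVFAIL,
# which dnspython surfaces as an exception.
import dns.resolver

resolver = dns.resolver.Resolver()
resolver.nameservers = ["192.0.2.53"]  # placeholder: your recursive server

try:
    resolver.resolve("dnssec-failed.org", "A")
    print("answer accepted: this resolver is NOT validating")
except (dns.resolver.NoNameservers, dns.resolver.LifetimeTimeout):
    print("query refused: this resolver appears to validate")
```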

Based on their results, the authors of this paper make some specific recommendations, starting with enabling DNSSEC validation on all resolvers, including the recursive servers your company probably operates for internal and external use. Owners of domain names should also ask their registrars to support DNSSEC on their authoritative servers.

Ultimately, it is up to the community of operators and users to make DNSSEC a reality in the ‘net.

Securing BGP: A Case Study (10)

The next proposed (and actually already partially operational) system on our list is the Resource Public Key Infrastructure (RPKI) system, which is described in RFC 7115 (and a host of additional drafts and RFCs). The RPKI system is focused on solving a single problem: validating that the originating AS is authorized to originate a particular prefix. An example will be helpful; we’ll use the network below.

[Figure: RPKI Operation]

(this is a graphic pulled from a presentation, rather than one of my usual line drawings)

Assume, for a moment, that AS65002 and AS65003 both advertise the same route, 2001:db8:0:1::/64, towards AS65000. How can the receiver determine whether both of these advertisers can actually reach the destination, or only one can? And, if only one can, how can AS65000 determine which one is the “real thing?” This is where the RPKI system comes into play. A very simplified version of the process looks something like this (assuming AS65002 is the true owner of 2001:db8:0:1::/64):

  • AS65002 obtains, from the Regional Internet Registry (labeled the RIR in the diagram), a certificate showing AS65002 has been issued 2001:db8:0:1::/64.
  • AS65002 places this certificate into a local database that is synchronized with all the other operators participating in the routing system.
  • When AS65000 receives a route towards 2001:db8:0:1::/64, it checks this database to make certain the origin AS on the advertisement matches the owning AS.

If the owner and the origin AS match, AS65000 can increase the route’s preference. If they don’t match, AS65000 can reduce the route’s preference. It might be that AS65000 discards the route if the origin doesn’t match—or it may not. For instance, AS65000 may know, from historical data, through a strong and long-standing business relationship, or from some other means, that 2001:db8:0:1::/64 actually belongs to AS65004, even though the RPKI data claims it belongs to AS65002. Resolving such problems falls to the receiving operator—the RPKI simply provides more information on which to act, rather than dictating a particular action to take. A minimal sketch of this origin check appears below.
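
The sketch follows the style of RFC 6811 route origin validation: each validated ROA carries a prefix, a maximum prefix length, and an origin AS, and any announcement checks out as valid, invalid, or not-found against that table. The ROA table here is a stand-in for data an operator would actually pull from the RPKI, using the example prefix and AS numbers from above.

```python
# A minimal origin-validation check against a local ROA table
# (hypothetical data, Python standard library only).
import ipaddress

# Validated ROA cache: (prefix, max length, authorized origin AS).
roas = [
    (ipaddress.ip_network("2001:db8:0:1::/64"), 64, 65002),
]

def origin_validation(prefix_str, origin_asn):
    """Return 'valid', 'invalid', or 'not-found' for an announcement."""
    prefix = ipaddress.ip_network(prefix_str)
    covered = False
    for roa_prefix, max_len, roa_asn in roas:
        # A ROA covers the announcement when the announced prefix falls
        # within the ROA's prefix.
        if prefix.version == roa_prefix.version and prefix.subnet_of(roa_prefix):
            covered = True
            # Valid only if the origin matches and the announcement is
            # no more specific than the ROA allows.
            if roa_asn == origin_asn and prefix.prefixlen <= max_len:
                return "valid"
    return "invalid" if covered else "not-found"

print(origin_validation("2001:db8:0:1::/64", 65002))  # valid
print(origin_validation("2001:db8:0:1::/64", 65003))  # invalid
print(origin_validation("2001:db8:2::/64", 65003))    # not-found
```

Note how the result feeds local policy rather than dictating it: “valid” might raise the route’s preference, “invalid” might lower it or trigger a filter, and “not-found” is simply more information.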

Let’s compare this to our requirements to see how this proposal stacks up, and where there might be objections or problems.

Centralized versus Decentralized: The distribution of the origin authentication information is currently undertaken with rsync, which means the certificate system is decentralized from a technical perspective.

However—there have been technical issues with the rsync solution in the past, such that it can take up to 24 hours to change the distributed database. This is a pretty extreme case of eventual consistency, and it’s a major problem in the global default free zone. BGP might converge very slowly, but it still converges more quickly than 24 hours.

Beyond the technical problems, there is a business side to the centralized/decentralized issue as well. Specifically, many businesses don’t want their operations impacted by contract issues, negotiation issues, and the like. Many large providers see the RPKI system as creating just such problems, as the “trust anchor” is located in the RIRs. There are ways to mitigate this—just use some other root, or even self-sign your certificates—but the RPKI system faces an uphill battle in this area with large transit providers.

Cost: The actual cost of setting up and running a server doesn’t appear to be very high within the RPKI system. The only things you need to “get into the game” are a couple of VMs or physical servers to run rsync, and some way to inject the information gleaned from the RPKI system into the routing decisions along the network edge (which could even be just plugging the information into existing policy mechanisms).

The business issue described above can also be counted as a cost—how much would it cost a provider if their origin authentication were taken out of the database for a day or two, or even a week or two, while a contract dispute with the RIR was worked out?

Information Cost: There is virtually no additional information cost involved in deploying the RPKI.

Other thoughts: The RPKI system wasn’t designed to, and doesn’t, validate anything other than the origin in the AS Path. It doesn’t, therefore, allow an operator to detect AS65003, for instance, claiming to be connected to AS65002 even though it’s not (or it’s not supposed to transit traffic to AS65002). This isn’t really a “lack” on the part of the RPKI, it’s just not something it’s designed to do.

Overall, the RPKI is useful, and will probably be deployed by a number of providers, and shunned by others. It would be a good component of some larger system (again, this was the original intent, so this isn’t a lack), but it cannot stand alone as a complete BGP security system.

Securing BGP: A Case Study (9)

There are a number of systems that have been proposed to validate (or secure) the path in BGP. To finish off this series on BGP as a case study, I only want to look at three of them. At some point in the future, I will probably write a couple of posts on what actually seems to be making it to some sort of deployment stage, but for now I just want to compare various proposals against the requirements outlined in the last post on this topic (you can find that post here).

The first of these systems is BGPSEC—or as it was known before it was called BGPSEC, S-BGP. I’m not going to spend a lot of time explaining how S-BGP works, as I’ve written a series of posts over at Packet Pushers on this very topic:

Part 1: Basic Operation
Part 2: Protections Offered
Part 3: Replays, Timers, and Performance
Part 4: Signatures and Performance
Part 5: Leaks

Considering S-BGP against the requirements:

  • Centralized versus decentralized balance: S-BGP distributes path validation information throughout the internetwork, as this information is actually contained in a new attribute carried with route advertisements. Authorization and authentication are implicitly centralized, however, with the root certificates being held by address allocation authorities. It’s hard to say if this is the correct balance.
  • Cost: In terms of financial costs, S-BGP (or BGPSEC) requires every eBGP speaker to perform complex cryptographic operations in line with receiving updates and calculating the best path to each destination (the sketch after this list illustrates the kind of per-hop signing and verification involved). This effectively means replacing every edge router in every AS in the entire world to deploy the solution—this is definitely not cost friendly. Adding to this cost is the sheer increase in the table size required to carry all this information, and the loss of commonly used (and generally effective) optimizations.
  • Information cost: S-BGP leaks new information into the global table as a matter of course—not only can anyone see who is peered with whom by examining information gleaned from route view servers, they can even figure out how many actual pairs of routers connect each AS, and (potentially) what other peerings those same routers serve. This huge new chunk of information about provider topology being revealed simply isn’t acceptable.
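
As a toy illustration of the forward-signing idea underlying S-BGP/BGPsec, the sketch below has each AS sign the prefix, the path so far, and the AS it is handing the update to, so a receiver can check that no one silently inserted themselves into the path. It uses Ed25519 from the Python cryptography library; the keys, encoding, and AS numbers are invented for illustration, and this is emphatically not the actual BGPsec wire format.

```python
# A toy of the forward-signing idea in S-BGP/BGPsec, not the real format:
# each AS signs (prefix, path so far, next AS) with its own key.
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

# Hypothetical per-AS keys; in reality the public halves hang off the RPKI.
keys = {asn: Ed25519PrivateKey.generate() for asn in (65000, 65001, 65002)}

def segment_bytes(prefix, path, next_as):
    return f"{prefix}|{'-'.join(map(str, path))}|{next_as}".encode()

def sign_hop(asn, prefix, path, next_as):
    return keys[asn].sign(segment_bytes(prefix, path, next_as))

# AS65002 originates toward AS65001, which forwards to AS65000; each hop
# signs the path so far plus the AS it is sending the update to.
prefix = "2001:db8:0:1::/64"
sigs = [
    (65002, [65002], 65001, sign_hop(65002, prefix, [65002], 65001)),
    (65001, [65002, 65001], 65000, sign_hop(65001, prefix, [65002, 65001], 65000)),
]

def verify_path(prefix, sigs):
    """Verify every hop's signature; a real implementation would also
    check that each hop's path extends the previous hop's path."""
    for asn, path, next_as, sig in sigs:
        try:
            keys[asn].public_key().verify(sig, segment_bytes(prefix, path, next_as))
        except InvalidSignature:
            return False
    return True

print(verify_path(prefix, sigs))  # True for the untampered path
```

Even in toy form, the cost problem is visible: every received update means one signature verification per AS in the path, sitting squarely in the convergence hot path.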

Overall, then, BGPSEC doesn’t meet the requirements as they’ve been outlined in this series of posts. Next week, I’ll spend some time explaining the operation of another potential system, a graph overlay, and then we’ll consider how well it meets the requirements as outlined in these posts.

Securing BGP: A Case Study (8)

Throughout the last several months, I’ve been building a set of posts examining securing BGP as a sort of case study around protocol and/or system design. The point of this series of posts isn’t to find a way to secure BGP specifically, but rather to look at the kinds of problems we need to think about when building such a system. The interplay between technical and business requirements is wide and deep. In this post, I’m going to summarize the requirements drawn from the last seven posts in the series.

Don’t try to prove things you can’t. This might feel like a bit of an “anti-requirement,” but the point is still important. In this case, we can’t prove which path traffic will actually flow along. We also can’t enforce policies, specifically “don’t transit this AS;” the best we can do is provide information and let other operators make a local decision about what to follow and what not to follow. In the larger sense, it’s important to understand what can, and what can’t, be solved, or rather what the practical limits of any solution might be, as close to the beginning of the design phase as possible.

In the case of securing BGP, I can, at most, validate three pieces of information:

  • That the origin AS in the AS Path matches the owner of the address being advertised.
  • That the AS Path in the advertisement is a valid path, in the sense that each pair of autonomous systems in the AS Path are actually connected, and that no-one has “inserted themselves” in the path silently.
  • The policies of each pair of autonomous systems along the path towards one another. This is completely voluntary information, of course, and cannot be enforced in any way if it is provided, but the more information provided, the stronger the validation that can be built on it.

There is a fine balance between centralized and distributed systems. There are two things that can be centralized or distributed in terms of BGP security: how ownership is claimed over resources, and how the validation information is carried to each participating AS. In the case of ownership, the tradeoff is between having a widely trusted third party validate ownership claims and having a third party who can shut down an entire business. In the case of distributing the information, there is a tradeoff between the consistency and the accessibility of the validation information. These are going to be points on which reasonable people can disagree, and hence are probably areas where a successful system must have a good deal of flexibility.

Cost is a major concern. There are a number of costs that need to be considered when determining which solution is best for securing BGP, including—

  • Physical equipment costs. The most obvious cost is the physical equipment required to implement each solution. For instance, any solution that requires providers to replace all their edge routers is simply not going to be acceptable.
  • Process costs. Any solution that requires a lot of upkeep and maintenance is going to be cast aside very quickly. Good intentions are overruled by the tyranny of the immediate about 99.99% of the time.

Speed is also a cost that can be measured in business terms; if increasing security decreases the speed of convergence, providers who deploy security are at a business disadvantage relative to their competitors. The speed of convergence must be on the order of Internet level convergence today.

Information costs are a particularly important issue. There are at least three kinds of information that can leak out of any attempt to validate BGP, each of them related to connectivity—

  • Specific information about peering, such as how many routers interconnect two autonomous systems, where interconnections are, and how interconnection points are related to one another.
  • Publicly verifiable claims about interconnection. Many providers argue there is a major difference between connectivity information that can be observed and connectivity information that is claimed.
  • Publicly verifiable information about business relationships. Virtually every provider considers it important not to release at least some information about their business relationships with other providers and customers.

While there is some disagreement in the community over each of these points, it’s clear that releasing the first of these is almost always going to be unacceptable, while the second and third are more situational.

With these requirements in place, it’s time to look at a couple of proposed systems to see how they measure up.

Information wants to be protected: Security as a mindset

I was teaching a class last week and mentioned something about privacy to the students. One of them shot back, “you’re paranoid.” And again, at a meeting with some folks about missionaries, and how best to protect them when trouble comes to their door, I was again declared paranoid. In fact, I’ve been told I’m paranoid after presentations by complete strangers who were sitting in the audience.

Okay, so I’m paranoid. I admit it.

But what is there to be paranoid about? We’ve supposedly gotten to the point where no-one cares about privacy, where encryption is pointless because everyone can see everything anyway, and all the rest. Everyone except me, that is—I’ve not “gotten over it,” nor do I think I ever will. In fact, I don’t think any engineer should “get over it,” in terms of privacy and security. Even if you think it’s not a big deal in your own life, engineers should learn to treat other people’s information with the utmost care.

In moving from the person to the digital representation of the person, we often forget it’s someone’s life we’re actually playing with. I think it’s time for engineers to take security—and privacy—personally. It’s time to actually do what we say we do, and make security a part of the design from day one, rather than something tacked on to the end.

And I don’t care if you think I’m paranoid.

Maybe it’s time to retire the old saying information wants to be free, and replace it with something a little more realistic, like:

Information wants to be protected.

It’s true that there are many different kinds of information. For instance, there’s the information contained in a song, or the information contained in a book, or a blog, or information about someone’s browsing history. Each piece of information has a specific intent, or purpose, a goal for which it was created. Engineers should make their default design such that information is only used for its intended purpose by the creator (or owner) of that information. We should design this into our networks, into our applications, and into our thought patterns. It’s all too easy to think, “we’ll get to security once things are done, and there’s real data being pushed into the system.” And then it’s too easy to think, “no-one has complained, and the world didn’t fall apart, so I’ll do it later.”

But what does it mean to design security into the system from day one? This is often, actually, the hard part. There are tradeoffs, particularly costs, involved with security. These costs might be in terms of complexity, which makes our jobs harder, or in terms of actual costs to bring the system up in the first place.

But if we don’t start pushing back, who will? The users? Most of them don’t even begin to understand the threat. The business folks who pay for the networks and applications we build? Not until they’re convinced there’s an ROI they can get their minds around. Who’s going to need to build that ROI? We are.

A good place to start might be here.

And we’re not going to get there until we all start nurturing the little security geek inside every engineer, until we start taking security (and privacy) a little more seriously. Until we stop thinking about this stuff as just bits on the wire, and start thinking about it as people’s lives. Until we reset our default to “just a little paranoid,” perhaps.


P.S. I’m not so certain we should get over it. Somehow I think we’re losing something of ourselves in this process of opening our lives to anyone and everyone, and I fear that by the time we figure out what it is we’re losing, it’ll be too late to reverse the process. Somehow I think that treating other people as a product (if the service is free, you are the product) is just wrong in ways we’ve not yet been able to define.