Dispersing a DDoS: Initial thoughts on DDoS protection

Distributed Denial of Service is a big deal—huge pools of Internet of Things (IoT) devices, such as security cameras, are being herded into botnets and used for large-scale DDoS attacks. What are the tools in hand to fend these attacks off? The first misconception is that you can actually fend off a DDoS attack. There is no magical tool you can deploy that will allow you to go to sleep every night thinking, “tonight my network will not be impacted by a DDoS attack.” There are tools and services that deploy various mechanisms to do the engineering and work for you, but there is no complete solution to DDoS attacks.

One such reaction tool is spreading the attack. In the network below, the network under attack has six entry points.

Assume the attacker has IoT devices scattered throughout AS65002 which they are using to launch an attack. Due to policies within AS65002, the DDoS attack streams are being forwarded into AS65001, and thence to A and B. It would be easy to shut these two links down, forcing the traffic to disperse across the remaining four entry points (C, D, E, and F). By splitting the traffic among four entry points rather than two, it may be possible to simply eat the traffic—each entry now receives roughly a quarter of the original DDoS attack, rather than half, which may be within the range of the servers at these entry points to discard.
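The arithmetic here is worth making concrete. A quick back-of-the-envelope sketch—the total attack volume and the assumption of an even split are both hypothetical:

```python
# Back-of-the-envelope arithmetic for dispersing an attack.
# The 40 Gbit/s attack volume is a hypothetical number, and a
# perfectly even split is an idealization.
def per_entry_load(total_attack_gbps, entry_points):
    """Attack traffic each entry point absorbs, assuming an even split."""
    return total_attack_gbps / entry_points

attack = 40.0  # total attack volume in Gbit/s (hypothetical)
print(per_entry_load(attack, 2))  # concentrated on two entries -> 20.0 each
print(per_entry_load(attack, 4))  # dispersed across four      -> 10.0 each
```

Whether 10 Gbit/s per entry is survivable depends entirely on the capacity behind each entry point—the point is only that dispersal buys headroom without giving up edge bandwidth.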

However—this kind of response plays into the attacker’s hand, as well. Now any customer directly attached to AS65001, such as G, will need to pass through AS65002, from which the attacker has launched the DDoS, and enter through the same four entry points. How happy do you think the customer at G would be in this situation? Probably not very…

Is there another option? Instead of shutting down these two links, it would make more sense to reduce the volume of traffic coming through the links and leave them up. To put it more directly—if the DDoS attack is reducing the total amount of bandwidth available at the edge of your network, it does not make a lot of sense to respond by reducing your available edge bandwidth even further. What you want to do, instead, is reapportion the traffic coming in to each edge so you have a better chance of allowing the existing servers to simply discard the DDoS attack.

One possible solution is to prepend the AS path on the anycast address being advertised from one of the service instances. Here, you could add one prepend to the route advertisement from C, and check to see whether the attack traffic spreads more evenly across the entry points. As we’ve seen in other posts, however, AS path prepending isn’t always an effective solution. And since this is an anycast service, we can’t really break the address space up into smaller pieces. So what else can be done?

There is a way to do this with BGP—using communities to restrict the scope of the routes being advertised by A and B. For instance, you could begin by advertising the routes to the destinations under attack towards AS65001 with the NO_PEER community. Given that AS65002 is a transit AS (assume it is for this exercise), AS65001 would accept the routes from A and B, but would not advertise them towards AS65002. This means G would still be able to reach the destinations behind A and B through AS65001, but the attack traffic would be dispersed across four entry points, rather than two. There are other mechanisms you could use here; specifically, some providers allow you to set a community that tells them not to advertise a route towards a specific AS, whether that AS is a peer or a customer. You should consult with your provider about this, as every provider uses a different set of communities, formatted in slightly different ways—your provider will probably point you to a web page explaining their formatting.
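For reference, the well-known communities discussed here have fixed 32-bit values, and provider-specific communities use the same ASN:value packing. A small sketch—the 65001:666 community at the end is a hypothetical provider-specific example, not anything standardized:

```python
# Well-known BGP communities are reserved 32-bit values; provider-specific
# communities are written ASN:value, packed high:low into the same 32 bits.
NO_EXPORT    = 0xFFFFFF01  # do not advertise outside this AS (RFC 1997)
NO_ADVERTISE = 0xFFFFFF02  # do not advertise to any other BGP speaker (RFC 1997)
NOPEER       = 0xFFFFFF04  # do not advertise across bilateral peerings (RFC 3765)

def community(asn, value):
    """Pack an ASN:value community into its 32-bit wire form."""
    return (asn << 16) | value

def community_str(c):
    """Render a 32-bit community in the familiar ASN:value notation."""
    return f"{c >> 16}:{c & 0xFFFF}"

print(community_str(NOPEER))                  # 65535:65284
print(community_str(NO_ADVERTISE))            # 65535:65282
# Hypothetical provider-specific community, e.g. "do not advertise to AS X":
print(community_str(community(65001, 666)))   # 65001:666
```

The well-known values live in the reserved 65535:x range, which is why they look like ordinary ASN:value communities when routers display them.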

If NO_PEER does not work, it is possible to use NO_ADVERTISE, which blocks the advertisement of the destinations under attack to any of AS65001’s connections of whatever kind. G may well still be able to use the connections to A and B from AS65001 if it is using a default route to reach the Internet at large.

It is, of course, possible to automate this reaction through a set of scripts—but, as always, it is important to keep a short leash on such scripts. Humans need to be alerted either to make the decision to use these communities, or to decide whether to continue using them; it is too easy for a false positive to lead to a real problem.

Of course, this sort of response is also not possible for networks with just one or two connection points to the Internet.

But in all cases, remember that shutting down links in the face of a DDoS is rarely a real solution. You do not want to reduce your available bandwidth when you are under an attack specifically designed to exhaust available bandwidth (or other resources). Rather, if you can, find a way to disperse the attack.

P.S. Yes, I have covered this material before—but I decided to rebuild this post with more in-depth information, and to use it to kick off a small series on DDoS protection.

DNS Cookies and DDoS Attacks

DDoS attacks, particularly for ransom—essentially, “give me some bitcoin, or we’ll attack your server(s) and bring you down”—seem to be on the rise. While ransom attacks rarely actually materialize, the threat of DDoS overall is very large, and very large scale. Financial institutions, content providers, and others regularly absorb tens of gigabits of attack traffic in the normal course of operation. What can be done about stopping, or at least slowing down, these attacks?

To answer this question, we need to start with a better idea of some of the common mechanisms used to build a DDoS attack. It’s often not effective to simply take over a bunch of computers and send traffic from them at full speed; the users, and the users’ providers, will often notice a machine sending large amounts of traffic in this way. Instead, what the attacker needs is some sort of public server that can (and will) act as an amplifier. Sending a small amount of traffic to this intermediate server should cause the server to send an order of magnitude more traffic towards the attack target. Ideally, the server, or set of servers, will have almost unlimited bandwidth, and bandwidth utilization characteristics that will make the attack appear as close to “normal operation” as possible.

It is at this point that DNS servers often enter the picture. There are thousands (if not tens of thousands) of DNS servers out there, they all have public facing DNS services ripe for exploitation, and they tend to be connected to large/fat pipes. What the attacker can do is send a steady stream of DNS queries that appear to be originating at the target towards the DNS server. The figure below illustrates the concept.

[figure: dns-cookies-01]

The attacker will carefully examine the DNS server’s records, of course, choosing a large record—the largest TXT record possible is ideal—and send a request for this item, or a series of similar large items, with a forged source address that causes the DNS server to send the response to the target. The DNS server, having no idea where the query actually originated, will normally reply with the information requested.

But one DNS server isn’t going to be enough. As shown in the illustration, the attacker can actually send queries to hundreds or thousands of DNS servers; an investment of ten to fifteen packets per second in queries can generate tens of thousands of replies each second in return. To ramp the attack up, the attacker can install software (probably through a botnet) that can use hundreds or thousands of hosts to generate packets towards thousands of DNS servers—this kind of amplification can easily overwhelm even the largest edge circuits available.
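The arithmetic of amplification is easy to sketch. The packet sizes and rates below are hypothetical, but in a realistic ballpark for a large TXT record:

```python
# Rough amplification arithmetic with hypothetical sizes: a ~60-byte
# spoofed-source query eliciting a ~3000-byte TXT response.
QUERY_BYTES = 60       # small DNS query with a forged source (hypothetical)
RESPONSE_BYTES = 3000  # large TXT-record response (hypothetical)

def attack_bandwidth_mbps(hosts, queries_per_sec, response_bytes):
    """Traffic arriving at the target, in Mbit/s, assuming every query is answered."""
    return hosts * queries_per_sec * response_bytes * 8 / 1_000_000

print(RESPONSE_BYTES / QUERY_BYTES)                     # amplification factor: 50.0
print(attack_bandwidth_mbps(1000, 15, RESPONSE_BYTES))  # 1000 bots x 15 qps -> 360.0 Mbit/s
```

Scale the hypothetical botnet up by a factor of a hundred and the result lands in the tens of gigabits mentioned earlier—all from query streams small enough to pass unnoticed at each individual bot.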

How can this sort of attack be countered? The key point is to remove amplification wherever possible—in this case, DNS servers seem like a likely point at which the amplification attack can be cut down to size. But how? RFC7873, DNS Cookies, published in May of 2016, provides a weak authentication mechanism to reduce the effectiveness of DNS as an amplification platform.

Essentially, this RFC specifies that each DNS query should include a cookie—what might be called a nonce. This cookie is calculated with a consistent algorithm (a hash) over several items, such as the IP address of the client (sender), the IP address of the server, etc. The idea is this: the first time a host queries a DNS server, it will send a client cookie. The server can respond with an error, simply ignore the request, or respond with a DNS packet containing the server’s cookie. The rate at which the server hands these cookies out can be rather limited; assuming clients can cache the server’s cookie as well as previous DNS records retrieved from the server, the additional rate limiting should have very little impact on the host’s operation. The next time the client sends a request, it will include this server cookie. The server, on seeing a valid cookie, will reply to the request as normal.
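The exchange can be sketched in a few lines. RFC 7873 leaves the exact hash and its inputs to the implementation, so the keyed hash, inputs, and cookie lengths below are illustrative choices, not the specified algorithm:

```python
import hashlib
import hmac
import os

# Illustrative RFC 7873-style cookie computation; the hash function,
# inputs, and truncation here are assumptions, not mandated by the RFC.
SERVER_SECRET = os.urandom(16)  # server-side secret, rotated periodically

def client_cookie():
    """The client's 8-byte cookie: just a random nonce it remembers."""
    return os.urandom(8)

def server_cookie(client_ip, c_cookie):
    """Server cookie bound to the client's address and cookie via a keyed hash."""
    mac = hmac.new(SERVER_SECRET, client_ip.encode() + c_cookie, hashlib.sha256)
    return mac.digest()[:16]  # the RFC allows 8-32 bytes for the server cookie

def verify(client_ip, c_cookie, s_cookie):
    """A later query is answered normally only if its cookie checks out."""
    return hmac.compare_digest(server_cookie(client_ip, c_cookie), s_cookie)

cc = client_cookie()
sc = server_cookie("192.0.2.7", cc)
print(verify("192.0.2.7", cc, sc))    # True: same client, valid cookie
print(verify("203.0.113.9", cc, sc))  # False: a forged source cannot reuse it
```

Because the cookie is bound to the client’s IP address, a query forged with the target’s address carries a cookie the server will never validate—the attacker never sees the cookie the server would have issued to the target.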

The figure below illustrates.

[figure: dns-cookies-02]

This process accomplishes two things:

  • Rate limiting responses to “cookie requests” effectively caps the speed at which any given client can send packets with a forged source address to the server and expect a reply. It doesn’t matter how fast the attacker sends packets; it will only receive a cookie—and hence the ability to send a valid request—on an infrequent basis.
  • Since the attacker cannot forge a valid cookie for the target, the only thing the attacker can force the DNS server to send to the target is a (purposefully) short error message. These error messages will have nowhere near the impact of the carefully chosen large TXT (and other) records.
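The rate limit in the first bullet can be sketched as a simple token bucket. The rate and burst numbers here are hypothetical policy choices—the RFC does not specify a limiting algorithm:

```python
# A minimal token-bucket sketch of the rate limit applied to queries
# arriving without a valid cookie; rate and burst are hypothetical policy.
class CookieRateLimiter:
    """Token bucket limiting replies to queries that lack a valid cookie."""

    def __init__(self, rate_per_sec=1.0, burst=5):
        self.rate = rate_per_sec   # replies allowed per second, per source
        self.burst = float(burst)  # small burst so a legitimate first contact succeeds
        self.tokens = float(burst)
        self.last = 0.0

    def allow(self, now):
        """Return True if a cookie (or short error) reply may be sent at time `now`."""
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False  # ignore the query: no reply, no amplification

limiter = CookieRateLimiter(rate_per_sec=1.0, burst=5)
flood = sum(limiter.allow(now=0.0) for _ in range(100))
print(flood)                   # 5: only the initial burst is answered
print(limiter.allow(now=2.0))  # True: tokens refill slowly for legitimate retries
```

One bucket per source address caps what any single forged source can extract from the server, regardless of how fast the attacker sends.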

This is, as the RFC says, a “weak protection,” but it is enough to remove DNS servers from the realm of “easy targets” for amplification duty in DDoS attacks.