Flowspec and RFC1998?

In a recent comment, Dave Raney asked:

Russ, I read your latest blog post on BGP. I have been curious about another development. Specifically is there still any work related to using BGP Flowspec in a similar fashion to RFC1998. In which a customer of a provider will be able to ask a provider to discard traffic using a flowspec rule at the provider edge. I saw that these were in development and are similar but both appear defunct. BGP Flowspec-ORF https://www.ietf.org/proceedings/93/slides/slides-93-idr-19.pdf BGP Flowspec Redirect https://tools.ietf.org/html/draft-ietf-idr-flowspec-redirect-ip-02.

This is a good question—to which there are two answers. The first is that this service does exist. While it’s not widely publicized, a number of transit providers do, in fact, offer the ability to send them a flowspec community that will cause them to set a filter on their end of the link. This kind of service is immensely useful for countering Distributed Denial of Service (DDoS) attacks, of course. The problem is that such services are expensive. The one provider I have personal experience with charges per prefix, and the cost is high enough to make the service much less attractive.

Why would the cost be so high? For the same reason a lot of providers do not filter for unicast Reverse Path Forwarding (uRPF) failures at scale—per-packet filtering is very performance intensive, sometimes requiring recycling the packet through the ASIC. A line card normally able to support x customers without filtering may only be able to support x/2 customers with filtering. The provider also has to pay for additional space, power, and configuration (the flowspec rules must be configured and maintained on the customer-facing router). All of these are costs the provider is going to pass on to its customers, and the resulting price is high enough that I know very few (in fact, zero) network operators who will pay for this kind of service.

The second answer is that there is another kind of service similar to what Dave is asking about. Many DDoS protection services offer their customers the ability to signal a request to the provider to block traffic from a particular source, or to help them manage a DDoS in some other way. This is very similar to the idea of interdomain flowspec, only using a different signaling mechanism—one designed to allow the provider more leeway in how they respond to the request for help countering the DDoS. This system is called DDoS Open Threat Signaling (DOTS); you can read more about it in a post I wrote at the ECI Telecom blog. You can also head over to the IETF DOTS WG page and read through the drafts yourself.

Yes, I do answer reader comments… Sometimes just in email, and sometimes with a post—so comment away, ask questions, etc.

On the ‘web: A new way to deal with DDoS

Most large scale providers manage Distributed Denial of Service (DDoS) attacks by spreading the attack over as many servers as possible, and simply “eating” the traffic. This traffic spreading routine is normally accomplished using Border Gateway Protocol (BGP) communities and selective advertisement of reachable destinations, combined with the use of anycast to regionalize and manage load sharing on inbound network paths. But what about the smaller operator, who may only have two or three entry points, and does not have a large number of servers, or a large aggregate edge bandwidth, to react to DDoS attacks?

I write for ECI about once a month; this month I explain DOTS over there. Want to know what DOTS is? Then you need to click on the link above and read the story. 🙂

Reaction: Offensive Destruction of Attack Assets

It is certainly true that DDoS and hacking are on the rise; there have been a number of critical hacks in the last few years, including apparent attempts to alter the outcome of elections. The reaction has been a rising tide of fear, and an ever increasing desire to “do something.” The something that seems to be emerging is, however, not necessarily the best possible “something.” Specifically, governments are now talking about attempting to “wipe out” the equipment used in attacks—

Berlin was studying what legal changes were needed to allow authorities to purge stolen data from third-party servers, and to potentially destroy servers used to carry out cyber attacks. “We believe it is necessary that we are in a position to be able to wipe out these servers if the providers and the owners of the servers are not ready to ensure that they are not used to carry out attacks,” Maassen said. —Reuters

“Wiping out” (destroying?) a server because the owner cannot ensure the server will be used in a way the government agrees with—sounds like a good idea, right? But how do we make certain such laws are not extended to destroy the servers of those who host “hate speech” and “fake news” at some point in the future? Will we have server burnings to match the printing press burnings of yesteryear (like this, or this, or this)?

What if the owner of that server is actually the proud owner of a newly minted “connected” television set or toaster, and does not know enough about technology to secure the device properly? Is it okay to “wipe out” the server then?

The obvious answer to such objections is that the capability to “wipe out a server” will only be used when authorized through the proper channels. Scope creep, however, is always real, and people who work for the government are still people who have desires and fears, and who make mistakes.

Maybe being able to “wipe out” a server remotely, and to break into third-party networks to erase data you don’t think they should have, is all justified. But a dangerous precedent is being set here, and this story will not end in a happy place for anyone on the Internet.

Distributed Denial of Service Open Threat Signaling (DOTS)

When the inevitable 2AM call happens—“our network is under attack”—what do you do? You have run through the OODA loop (1, 2, 3, 4), used communities to distribute the attack as much as possible, and mitigated the attack where possible, and now you realize there is little more you can do locally. What now? You need to wander out on the ‘net and try to figure out how to stop this thing. You could try to use flowspec, but many providers do not like to support flowspec, because it directly impacts the forwarding performance of their edge boxes. Further, flowspec, used in this situation, doesn’t really walk the attack back toward its source; the provider’s network is still impacted by the DDoS attack.

This is where DOTS comes in. There are four components of DOTS, as shown below (taken directly from the relevant draft)—

The best place to start is with the attack target—that’s you, at 6AM, after trying to chase this thing down for a few hours, panicked because the office is about to open and your network is still down. Within your network there would also be a DOTS client; this would be a small piece of software running on a virtual machine, or in a container, someplace. This might be commercially developed, provided by your provider, or perhaps an open source version available from GitHub or some other repository. The third component is the DOTS server, which resides in the provider’s network. The diagram only shows one DOTS server, but in reality any information about an ongoing DDoS attack would be relayed to other DOTS servers, pushing the mitigation effort as close to the originating host(s) as possible. The mitigator then takes whatever actions are required to slow or eliminate the attack (including using mechanisms such as flowspec).

The DOTS specifications in the IETF relate primarily to the signaling between the client and the server; the remainder of the ecosystem around signaling and mitigation is outside the scope of the working group (at least currently). There are actually two channels in this signaling mechanism, as shown below (again, taken directly from the draft)—

The signal channel carries information about the DDoS attack in progress, requests to mitigate the attack, and other metainformation. The information is marshaled into a set of YANG models and encoded into a compact binary form (CBOR) carried over CoAP, for efficiency in representation and processing. The information encoded in these models includes the typical five-tuple expanded into sets—a range of source and destination addresses, a range of source and destination ports, and so on.
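To make this concrete, below is a minimal sketch of the kind of mitigation request the signal channel carries. The field names follow the signal channel draft’s YANG model; the real protocol maps them to compact integer keys and sends the CBOR payload over CoAP (typically on DTLS), so treat this as a shape-of-the-data illustration rather than a wire-accurate one. It assumes the third-party cbor2 package.

```python
# Sketch: a DOTS-style mitigation request, encoded to CBOR.
# Assumes: pip install cbor2
import cbor2

mitigation_request = {
    "ietf-dots-signal-channel:mitigation-scope": {
        "scope": [
            {
                # the prefix under attack (yours)
                "target-prefix": ["198.51.100.0/24"],
                # the five-tuple fields, expanded into sets/ranges
                "target-port-range": [{"lower-port": 80, "upper-port": 443}],
                "target-protocol": [6],   # TCP
                # how long the mitigation should run, in seconds
                "lifetime": 3600,
            }
        ]
    }
}

payload = cbor2.dumps(mitigation_request)
print(f"{len(payload)} bytes of CBOR")
```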

The data channel is designed to carry a sample of the DDoS flow(s), so the receiving server can perform further analytics, or even examine the flow to verify the information being transmitted over the signal channel.

How is this different from flowspec mitigation techniques?

First, the signaling runs to a server on the provider side, rather than directly to the edge router. This means the provider can use whatever mitigation means might make sense, rather than being forced into performance-impacting filters applied directly by a customer. This also means some intelligence can be built into the server to prevent DOTS from becoming a channel for attacks (an attack surface), unlike flowspec.

Second, DOTS is designed with third party DDoS mitigation services in mind. This means that your upstream provider is not necessarily the provider you signal to using DOTS. You can purchase access from one provider, and DDoS mitigation services from another provider.

Third, DOTS is designed to help providers drive the DDoS traffic back to its source (or sources). This allows the provider, and not just the customer, to benefit from the DDoS protection. DOTS-like systems have already been deployed by various providers; standardizing the interface between the client and the server will allow the ‘net as a whole to push DDoS back more effectively in coming years.

What can you do to help?

You can ask your upstream and DDoS providers to support DOTS in their services. You can also look for DOTS servers you can examine and test today, to get a better understanding of the technology and how it might interact with your network. You can ask your vendors (or your favorite open source project) to support DOTS signaling in their software, or you can join with others in helping to develop open source DOTS clients.

You can also read the drafts—

Use cases for DDoS Open Threat Signaling
Distributed Denial of Service (DDoS) Open Threat Signaling Requirements
Distributed-Denial-of-Service Open Threat Signaling (DOTS) Architecture
Distributed Denial-of-Service Open Threat Signaling (DOTS) Signal Channel

Each of these drafts could use readers and suggestions in specific areas, so you can join the DOTS mailing list and participate in the discussion. You can also keep up with the DOTS WG page at the IETF to see when new drafts are published, and make suggestions on those as well.

DOTS is a great idea; it is time for the Internet to have a standardized signaling channel for spotting and stopping DDoS attacks.

Blocking a DDoS Upstream

In the first post on DDoS, I considered some mechanisms to disperse an attack across multiple edges (I actually plan to return to this topic with further thoughts in a future post). The second post considered some of the ways you can scrub DDoS traffic. This post is going to complete the basic lineup of reacting to DDoS attacks by considering how to block an attack before it hits your network—upstream.

The key technology in play here is flowspec, a mechanism that can be used to carry packet-level filter rules in BGP. The general idea is this—you send a set of specially formatted BGP routes to your provider, who then automagically uses them to create filters at the inbound side of your link to the ‘net. There are two parts to the flowspec encoding, as outlined in RFC5575bis: the match rule and the action rule. The match rule is encoded as shown below—

There is a wide range of conditions you can match on. The source and destination addresses are pretty straightforward. For the IP protocol and port numbers, the operator sub-TLVs allow you to specify a set of conditions to match on, and whether to AND the conditions (all conditions must match) or OR them (any condition in the list may match). Ranges of ports—greater than, less than, greater than or equal to, less than or equal to, and equal to—are all supported. Fragments, TCP header flags, and a number of other header fields can be matched on, as well.
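To give a feel for how compact (and fiddly) this encoding is, here is a small illustrative sketch of a single destination-port component, packing the comparison flags, the AND bit, and the value length into the operator byte as RFC 5575 describes. A real implementation handles all the component types and value sizes; this only shows the idea.

```python
# Sketch: encoding "destination port >= 8080 AND <= 8088" as a
# flowspec match component (RFC 5575 numeric operator format).
import struct

END_OF_LIST = 0x80   # this is the last {operator, value} pair
AND         = 0x40   # AND with the previous pair (clear = OR)
LT, GT, EQ  = 0x04, 0x02, 0x01

def op_byte(flags: int, value_len: int) -> int:
    # two bits of the operator encode the value length: 1, 2, 4, or 8 bytes
    length_code = {1: 0b00, 2: 0b01, 4: 0b10, 8: 0b11}[value_len]
    return flags | (length_code << 4)

DEST_PORT = 5  # component type for destination port

component = bytes([DEST_PORT])
component += bytes([op_byte(GT | EQ, 2)]) + struct.pack(">H", 8080)
component += bytes([op_byte(END_OF_LIST | AND | LT | EQ, 2)]) + struct.pack(">H", 8088)

print(component.hex())  # 05131f90d51f98
```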

Once the traffic is matched, what do you do with it? There are a number of actions available, including—

  • Control the traffic rate, in either bytes per second or packets per second
  • Redirect the traffic into a VRF
  • Mark the traffic with a particular DSCP value
  • Filter (discard) the traffic
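On the wire, each of these actions is carried as a BGP extended community. The short sketch below packs two of them using the type codes from RFC 5575—traffic-rate (0x8006, with the rate as an IEEE floating point number of bytes per second, where a rate of zero means drop everything) and traffic-marking (0x8009). This is illustrative only, not a complete encoder.

```python
# Sketch: flowspec action extended communities (RFC 5575 type codes).
import struct

def traffic_rate(asn: int, bytes_per_second: float) -> bytes:
    # type 0x8006: 2-byte AS, then the rate as a 4-byte IEEE float;
    # a rate of 0.0 is the conventional "filter (drop) the traffic"
    return struct.pack(">BBHf", 0x80, 0x06, asn, bytes_per_second)

def traffic_marking(dscp: int) -> bytes:
    # type 0x8009: five reserved bytes, DSCP value in the final octet
    return struct.pack(">BB5xB", 0x80, 0x09, dscp)

drop_all  = traffic_rate(64511, 0.0)      # filter the traffic
limit     = traffic_rate(64511, 1.25e6)   # ~10 Mbit/s, in bytes/second
mark_af11 = traffic_marking(10)           # DSCP 10 (AF11)
print(drop_all.hex(), limit.hex(), mark_af11.hex())
```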

If you think this must be complicated to encode, you are right. That’s why most implementations allow you to set pretty simple rules, and handle all the encoding bits for you. Given flowspec encoding, you should just be able to detect the attack, set some simple rules in BGP, send the right “stuff” to your provider, and watch the DDoS go away. …right… If you have been in network engineering for longer than “I started yesterday,” you know by now that nothing is ever that simple.

If you don’t see a tradeoff, you haven’t looked hard enough.

First, from a provider’s perspective, flowspec is an entirely new attack surface. You cannot let your customer send you whatever flowspec rules they like. For instance, what if your customer sends you a flowspec rule that blocks traffic to one of your DNS servers? Or, perhaps, to one of their competitors? Or even to their own BGP session? To prevent these types of problems, most providers will only apply flowspec-initiated rules to the port that connects to your network directly. This protects the link between your network and the provider, but there is little way to prevent abuse if the provider allows these flowspec rules to be implemented deeper in their network.
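RFC 5575 (section 6) also defines a validation check aimed at exactly this problem: a flowspec rule should only be accepted from the peer that also announces the best unicast route for the destination prefix the rule covers, so a customer can only filter traffic headed toward prefixes it originates. Here is a toy sketch of that check, with an invented RIB structure for illustration:

```python
# Sketch: the RFC 5575 flowspec validation idea -- accept a rule only
# if its announcer also originated the best-match unicast route for
# the destination prefix embedded in the rule. The RIB here is a
# hypothetical prefix -> announcer mapping, for illustration only.
import ipaddress

def flowspec_rule_is_valid(rule_dest_prefix: str, announcer: str,
                           unicast_rib: dict) -> bool:
    dest = ipaddress.ip_network(rule_dest_prefix)
    # find all covering unicast routes, then take the longest match
    covering = [
        (ipaddress.ip_network(prefix), who)
        for prefix, who in unicast_rib.items()
        if dest.subnet_of(ipaddress.ip_network(prefix))
    ]
    if not covering:
        return False
    best_match = max(covering, key=lambda entry: entry[0].prefixlen)
    return best_match[1] == announcer

rib = {"203.0.113.0/24": "customer-A", "198.51.100.0/24": "provider"}
print(flowspec_rule_is_valid("203.0.113.0/25", "customer-A", rib))    # True
print(flowspec_rule_is_valid("198.51.100.53/32", "customer-A", rib))  # False
```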

Second, filtering costs money. This might not be obvious at the single-link scale, but when you start considering how to filter multiple gigabits of traffic based on deep packet inspection sorts of rules—particularly given the ability to combine a number of conditions in a single flowspec rule—filtering requires a lot of resources during the actual packet switching process. There is a limited amount of such resources on any given packet processing engine (ASIC), and a lot of customers are likely going to want to filter. Since filtering costs the provider money, providers are most likely going to charge for flowspec, limit which customers can send them flowspec rules (generally grounded in the provider’s perception of the customer’s cluefulness), and even limit the number of flowspec rules that can be installed at any given time.

There is plenty of further reading out there on configuring and using flowspec, and it is likely you will see changes in the way flowspec is encoded in the future. Some great places to start are—

One final thought as I finish this post off. You should not just rely on technical tools to block a DDoS attack upstream. If you can figure out where the DDoS is coming from, or track it down to a small set of source autonomous systems, you should find some way to contact the operator of the AS and let them know about the DDoS attack. This is something Mara and I will be covering in an upcoming webinar over at ipspace.net—watch for more information on this as we move through the summer.

Mitigating DDoS

Your first line of defense against any DDoS, at least on the network side, should be to disperse the traffic across as many resources as you can. Basic math implies that if you have fifteen entry points, and each entry point is capable of supporting 10g of traffic, then you should be able to simply absorb a 100g DDoS attack while still leaving 50g of headroom for legitimate traffic (assuming perfect efficiency, of course—YMMV). Dispersing a DDoS in this way may impact performance—but taking bandwidth and resources down is almost always the wrong way to react to a DDoS attack.

But what if you cannot, for some reason, disperse the attack? Maybe you only have two edge connections, or the size of the DDoS is larger than all of your edge bandwidth combined? It is typically difficult to mitigate a DDoS attack locally, but there is an escalating chain of actions you can take that often proves useful. Let’s deal with local mitigation techniques first, and then consider some fancier methods.

  • TCP SYN filtering: A lot of DDoS attacks rely on exhausting TCP open resources. If all inbound TCP sessions can be terminated in a proxy (such as a load balancer), the proxy server may be able to screen out half-open and poorly formed TCP open requests. Some routers can also be configured to hold TCP SYNs for some period of time, rather than forwarding them on to the destination host, in order to block half-open connections. This type of protection can be put in place long before a DDoS attack occurs.
  • Limiting Connections: It is likely that DDoS sessions will be short-lived, while legitimate sessions will be longer-lived. The difference may be a matter of seconds, or even milliseconds, but it is often enough to be detectable. It might make sense, then, to prefer existing connections over new ones when resources start to run low. Legitimate users may wait longer to connect when connections are limited, but once they are connected, they are more likely to remain connected. Application design is important here, as well.
  • Aggressive Aging: In cache-based systems, one way to free up depleted resources quickly is to simply age them out faster. The length of time a connection can be held open can often be dynamically adjusted in applications and hosts, allowing connection information to be removed from memory faster when there are fewer connection slots available. Again, this might impact live customer traffic, but it is still a useful technique in the midst of an actual attack (see the sketch after this list).
  • Blocking Bogon Sources: While there is a well known list of bogon addresses—address blocks that should never be routed on the global ‘net—these lists should be taken as a starting point, rather than as an ending point. Constant monitoring of traffic patterns on your edge can give you a lot of insight into what is “normal” and what is not. For instance, if your highest rate of traffic normally comes from South America, and you suddenly see a lot of traffic coming from Australia, either you’ve gone viral, or this is the source of the DDoS attack. It isn’t always useful to block all traffic from a region, or a set of source addresses, but it is often useful to apply the techniques listed above more heavily to traffic that doesn’t appear to be “normal.”
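To make the aggressive aging idea concrete, here is a toy sketch of a connection table whose idle timeout shrinks as the table fills, so entries are reclaimed fastest exactly when resources are scarce. The parameters and structure are invented for illustration.

```python
# Sketch: aggressive aging -- shrink the idle timeout as the
# connection table fills. Parameters are invented for illustration.
import time

TABLE_SIZE = 10_000
MAX_IDLE   = 300.0   # idle seconds allowed when the table is empty
MIN_IDLE   = 5.0     # floor applied when the table is nearly full

connections: dict[str, float] = {}   # flow id -> last-seen timestamp

def current_idle_limit() -> float:
    # scale the timeout down linearly with table occupancy
    occupancy = len(connections) / TABLE_SIZE
    return max(MIN_IDLE, MAX_IDLE * (1.0 - occupancy))

def age_out() -> None:
    now = time.time()
    limit = current_idle_limit()
    for flow in [f for f, seen in connections.items() if now - seen > limit]:
        del connections[flow]

# With the table 95% full, the next sweep evicts anything idle longer
# than max(5, 300 * 0.05) = 15 seconds, instead of the normal 300.
```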

There are, of course, other techniques you can deploy against DDoS attacks—but at some point, you are just not going to have the expertise or time to implement every possible counter. This is where appliance and cloud based services come into play. There are a number of appliance-based solutions out there to scrub traffic coming across your links, such as those made by Arbor. The main drawback to these solutions is that they scrub the traffic after it has passed over the link into your network. This problem can often be resolved by placing the appliance in a colocation facility and directing your traffic through the colo before it reaches your inbound network link.

There is one open source DDoS scrubbing option in this realm, as well, which uses a combination of FastNetMon, InfluxDB, Grafana, Redis, Morgoth, and Bird to create a solution you can run locally on a spun-up VM, or even bare metal on a self-built appliance wired in between your edge router and the rest of the network (in the DMZ). This option is well worth looking at, if not to deploy, then at least to better understand how the kind of dynamic filtering performed by commercially available appliances works.

If the DDoS must be stopped before it reaches your edge link, and you simply cannot handle the volume of the attack, then the best solution might be a cloud based filtering service. These tend to be expensive, and they also tend to increase latency for your “normal” user traffic in some way. The way these normally work is that the DDoS provider advertises your routes, or redirects your DNS address to their servers. This draws all your inbound traffic into their network, where it is scrubbed using advanced techniques. Once the traffic is scrubbed, it is either tunneled or routed back to your network (depending on how it was captured in the first place). Most large providers offer scrubbing services, and there are several public offerings available independent of any upstream you might choose (such as Verisign’s line of services).

A front line defense against DDoS is to place your DNS name, and potentially your entire site, behind a DDoS detection and mitigation DNS service and/or content distribution network. For instance, CloudFlare is a widely used service that not only proxies and caches your web site, it also protects you against DDoS attacks.