When you think of a Distributed Denial of Service (DDoS) attack, you probably think about an attack which overflows the bandwidth available on a single link; or overflowing the number of half open TCP sessions a device can have open at once, preventing the device from accepting more sessions. In all cases, a DoS or DDoS attack will involve a lot of traffic being pushed at a single device, or across a single link.
- Denial of service attacks do not always require high volumes of traffic
- An intelligent attacker can exploit the long tail of service queues deep in a web application to bring the service down
- These kinds of attacks would be very difficult to detect
But if you look at an entire system, there are a lot of places where resources are scarce, and hence are places where resources could be consumed in a way that prevents services from operating correctly. Such attacks would not need to be distributed, because they could take much less traffic than is traditionally required to deny a service. These kinds of attacks are called tail attacks, because they attack the long tail of resource pools, where these pools are much thinner, and hence much easier to attack.
There are two probable reasons these kinds of attacks are not often seen in the wild. First, they require an in-depth knowledge of the system under attack. Most of these long tail attacks will take advantage of the interaction surface between two subsystems within the larger system. Each of these interaction surfaces can also be attack surfaces if an attacker can figure out how to access and take advantage of them. Second, these kinds of attacks are difficult to detect, because they do not require large amounts of traffic, or other unusual traffic flows, to launch.
The paper under review today, Tail Attacks on Web Applications, discusses a model for understanding and creating tail attacks in a multi-tier web application—the kind commonly used for any large-scale frontend service, such as ecommerce and social media.
Huasong Shan, Qingyang Wang, and Calton Pu. 2017. Tail Attacks on Web Applications. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security (CCS ’17). ACM, New York, NY, USA, 1725-1739. DOI: https://doi.org/10.1145/3133956.3133968
The figure below illustrates a basic service of this kind for those who are not familiar with it.
The typical application at scale will have at least three stages. The first stage will terminate the user’s session and render content; this is normally some form of modified web server. The second stage will gather information from various backend services (generally microservices), and pass the information required to build the page or portal to the rendering engine. The microservices, in turn, build individual parts of the page, and rely on various storage and other services to supply the information needed.
If you can find some way to clog up the queue at one of the storage nodes, you can cause every other service along the information path to wait on the prior service to fulfill its part of the job in hand. This can cause a cascading effect through the system, where a single node struggling because of full queues can cause an entire set of dependent nodes to become effectively unavailable, cascading to a larger set of nodes in the next layer up. For instance, in the network illustrated, if an attacker can somehow cause the queues at storage service 1 to fill up, even for a moment, this can cascade into a backlog of work at services 1 and 2, cascading into a backlog at the front-end service, ultimately slowing—or even shutting—the entire service down. The queues at storage service 1 may be the same size as every other queue in the system (although they are likely smaller, as they face internal, rather than external, services), but storage system 1 may be servicing many hundreds, perhaps thousands, of copies of services 1 and 2.
The queues at storage service 1—and all the other storage services in the system—represent a hidden bottleneck in the overall system. If an attacker can, for a few moments at a time, cause these internal, intra-application queue to fill up, the overall service can be made to slow down to the point of being almost unusable.
How plausible is this kind of attack? The researchers modeled a three-stage system (most production systems have more than three stages) and examined the total queue path through the system. By examining the queue depths at each stage, they devised a way to fill the queues at the first stage in the system by sending millibursts of valid sessions requests to the rend engine, or the use facing piece of the application. Even if these millibursts are spread out across the edge of the application, so long as they are all the same kind of requests, and timed correctly, they can bring the entire system down. In the paper, the researchers go further and show that once you understand the architecture of one such system, it is possible to try different millibursts on a running system, causing the same DoS effect.
This kind of attack, because it is built out of legitimate traffic, and it can be spread across the entire public facing edge of an application, would be nearly impossible to detect or counter at the network edge. One possible counter to this kind of attack would be increasing capacity in the deeper stages of the application. This countermeasure could be expensive, as the data must be stored on a larger number of servers. Further, data synchronized across multiple systems will subject to CAP limitations, which will ultimately limit the speed at which the application can run anyway. Operators could also consider fine grained monitoring, which increases the amount of telemetry that must be recovered from the network and processed—another form of monetary tradeoff.