Research: BBR Congestion-Based Congestion Control
Congestion control has proven to be one of the hardest problems to solve in packet based networks. The “easy” way to solve this problem is with admission control, but this “easy” solution is actually quit deceptive; creating the algrorithms and centralized control to manage admission control is much more difficult than it seems. This is why many circuit switched networks just use some form of Time Division Multiplexing (TDM), giving each device connected to the network a single “slot,” and filling empty slots with idle frames, ultimately throwing bandwidth away in the name of simpler computation of fairness.
The problem space has, however, attracted a lot of research. In this post, I’ll be looking at one such effort, a research paper published in the October 2016 edition of ACM Queue describing a system called BBR, a congestion-based congestion control system. At the heart of this system is the concept of the bottleneck link, or bottleneck in the path, which is the lowest bandwidth, highest delay, or perhaps the most congested link in the path between two hosts. The authors use the following figure to describe the current operational point of most congestion control systems, and then the optimal point of operation.
The point they make is that congestion control mechanisms that rely on the loss of a packet, or a series of packets, to detect congestion, operate at the far right side of this figure. This might seem to be optimal from a bandwidth usage perspective, but it is not. Research (referenced in the paper) has shown that the optimal operating point is actually on the left side of this figure. But how can a congestion control mechanism operate here?
One key point is the system must somehow detect both the maximum bandwidth through the bottleneck link, and the average round trip time (RTT, or RTO). This is not as simple as might seem, because the presence of congestion on the link prevents them from being measured at the same time. The following figure illustrates.
If the link is not full enough to cause packets to be queued, the sender is not transmitting packets as quickly as the bottleneck link can forward them. This means the sender is not transmitting at the highest possible bandwidth. On the other hand, if the sender is sending fast enough to cause packets to be queued at the bottleneck link, then the RTT cannot be measured, as the time spent in the queue causes the RTT to be miscalculated.
The solution the BBR system described is to transmit packets at a fairly low rate to determine the RTT. The RTT can most accurately be measured by determining when it increases; this is the point where a queue of packets is starting to build up at the bottelneck link. The rate of transmission at the point where the RTT starts increasing is the maximum bandwidth at the bottleneck link. In order to measure the RTT and the bandwidth at the bottleneck link, the system periodically increases the transmit rate, then decreases it once the RTT increases, dropping it slightly below the just discovered maximum bandwidth in order to drain the queues back off.
This kind of system does not require the addition of new information, but only “better use” of the information already available in the processing of stream based transport protocols, like TCP.
You can find the paper here at the ACM site, or here at the Google research site.