TCP, Congestion Control, and Buffer Bloat

Cardwell, Neal, Yuchung Cheng, C. Stephen Gunn, Soheil Hassas Yeganeh, and Van Jacobson. “BBR: Congestion-Based Congestion Control.” Queue 14, no. 5 (October 2016): 50:20–50:53. doi:10.1145/3012426.3022184.

Article available here
Slides available here

In the “old days,” packet loss was a major problem; so much so that just about every routing protocol has a number of different mechanisms to ensure the reliable delivery of packets. For instance, in IS-IS, we have:

  1. Local reliability between peers using CSNPs and PSNPs
  2. On some links, a periodic check using CSNPs to ensure no packets were dropped
  3. Acknowledgements for packets on transmission
  4. Periodic timeouts and retransmissions of LSPs

It’s not that early protocol designers were dumb; it’s that packet loss really was this much of a problem. Congestion in the more recent sense was not something you would even have thought of; memory was expensive, so buffers were necessarily small, and hence a packet would obviously be dropped before it was buffered for any amount of time. TCP’s retransmission mechanism, the parameters around the window size, and the slow start mechanism were designed to react to packet drops. Further, it might seem obvious that any particular stream gets the most throughput if it uses the maximum available bandwidth, hence keeping the buffers full at every node along the path.

The problem is: all of these assumptions are wrong today. Buffers are cheap, and hence tend to be huge (probably not a good thing, actually), so packets tend to be buffered rather than dropped. Further, the idea that the best use of a link comes when a stream uses as much of it as possible has been proven wrong. So what is the solution?

One possible solution is to rebuild the TCP window size and slow start calculations to account for something other than packet drops. What will produce the best results? The authors of this paper argue the correct measures are the bandwidth of the slowest link on the path and the round-trip delay across the entire path; they call the resulting algorithm BBR, for Bottleneck Bandwidth and Round-trip propagation time. The thesis of the paper is that if the sender can estimate the delay across the path and the bandwidth of the slowest link in the path, then the sender can transmit packets at a rate that will just fill the lowest bandwidth link on the path, which is the actual maximum rate possible along this path. They illustrate the concept with a figure plotting delivery rate and round-trip time against the amount of data in flight.
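The relationship behind that picture can be approximated with a small fluid model. This is only a sketch under simplifying assumptions (a single bottleneck, an unlimited buffer, no competing traffic); the function name `path_response` and the 10 Mbit/s, 40 ms figures are illustrative choices, not values taken from this post.

```python
def path_response(inflight_bytes, btlbw_bytes_per_s, rtprop_s):
    """Return (delivery_rate, rtt) for a given amount of data in flight.

    Idealized model: one bottleneck link, unlimited buffer, no cross traffic.
    """
    bdp = btlbw_bytes_per_s * rtprop_s  # bandwidth-delay product in bytes
    if inflight_bytes <= bdp:
        # The pipe is not yet full: nothing queues, so the RTT stays at its
        # floor and the delivery rate grows with the amount in flight.
        return inflight_bytes / rtprop_s, rtprop_s
    # The bottleneck is saturated: the delivery rate is flat, and every extra
    # byte in flight sits in a queue and only adds delay.
    return btlbw_bytes_per_s, inflight_bytes / btlbw_bytes_per_s


BTLBW = 1_250_000  # hypothetical bottleneck: 10 Mbit/s, in bytes per second
RTPROP = 0.040     # hypothetical round-trip propagation time: 40 ms

for inflight in (25_000, 50_000, 100_000):
    rate, rtt = path_response(inflight, BTLBW, RTPROP)
    print(f"{inflight:>7} bytes in flight: "
          f"{rate * 8 / 1e6:.1f} Mbit/s, RTT {rtt * 1000:.0f} ms")
```

In this model, sending half the bandwidth-delay product gives half the bottleneck rate at the base RTT, while sending double the bandwidth-delay product gives the same rate as sending exactly that amount, at twice the RTT. The extra data buys delay, not throughput.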

Mechanisms that focus on preventing packets from dropping assume the operational point with the highest throughput is at the right hand vertical line in that figure, just where packets start dropping. The reality, however, is that the operational point with the highest throughput is at the left hand vertical line, which is just where packets start being buffered. The authors have developed a new formula for calculating not only the window size in TCP, but also when to send a packet. Essentially, they are using the RTT, along with an estimate of the bottleneck bandwidth along the path derived from the delivery rate of packets. The delivery rate is calculated from the amount of data acknowledged, as witnessed by ACKs, over a particular time period.
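A minimal sketch of that estimation idea follows. It assumes each ACK yields one delivery rate sample (bytes acknowledged divided by the interval they were delivered over) and one RTT sample, then takes the maximum recent rate as the bottleneck bandwidth and the minimum recent RTT as the propagation delay. The class name `PathEstimator`, its methods, and the sample-count windows are hypothetical simplifications for illustration; the paper uses time-based windows and considerably more careful bookkeeping.

```python
from collections import deque


class PathEstimator:
    """Toy estimator for bottleneck bandwidth and round-trip propagation time."""

    def __init__(self, bw_window=10, rtt_window=100):
        # Window sizes are arbitrary sample counts chosen for this sketch.
        self.bw_samples = deque(maxlen=bw_window)
        self.rtt_samples = deque(maxlen=rtt_window)

    def on_ack(self, delivered_bytes, interval_s, rtt_s):
        """Record one delivery rate sample and one RTT sample."""
        if interval_s > 0:
            self.bw_samples.append(delivered_bytes / interval_s)
        self.rtt_samples.append(rtt_s)

    @property
    def btlbw(self):
        # Queuing can only lower the observed rate, so the best recent
        # sample is the closest view of the bottleneck bandwidth.
        return max(self.bw_samples, default=0.0)

    @property
    def rtprop(self):
        # Queuing can only raise the observed RTT, so the smallest recent
        # sample is the closest view of the propagation delay.
        return min(self.rtt_samples, default=float("inf"))

    def bdp(self):
        """Estimated bytes in flight that just fill the pipe."""
        if not self.bw_samples or not self.rtt_samples:
            return 0.0
        return self.btlbw * self.rtprop


est = PathEstimator()
# Hypothetical ACK samples: (bytes delivered, interval in s, RTT in s).
for delivered, interval, rtt in [(12_500, 0.010, 0.040),
                                 (12_000, 0.010, 0.045),
                                 (10_000, 0.010, 0.060)]:
    est.on_ack(delivered, interval, rtt)

print(f"pace at about {est.btlbw:.0f} bytes/s, "
      f"keep about {est.bdp():.0f} bytes in flight")
```

The sender would then pace packets at roughly the estimated bottleneck bandwidth and cap the data in flight near the estimated bandwidth-delay product, which is the left hand operating point described above.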

The result is a system of windowing and send rate that maximizes the throughput while minimizing buffering along the path. The following figure, from the paper, shows the buffering along the path for the best known TCP windowing algorithm compared to BBR.

You can see that TCP adjusts its send rate so the queues on the slowest link fill, and then tries to overflow the link buffers every few seconds to ‘test’ for available buffer space. Since TCP is testing mainly for dropped packets, available buffer on the slowest link appears to be available bandwidth. The green line represents the buffer utilization of BBR; the buffer is filled in order to detect the available bandwidth, and then TCP with BBR drops to using almost no buffer. As counterintuitive as this might seem, the result is that BBR transfers data faster over TCP than even the most advanced windowing technique available can.
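The sawtooth behavior of a loss-based sender can be imitated with a very rough toy model. The sketch below assumes a Reno-style window (grow by one packet per round trip, halve on overflow), which is a stand-in rather than the specific algorithm plotted in the paper; the function name `loss_based_queue` and the 50 packet figures are hypothetical.

```python
def loss_based_queue(bdp_pkts, buffer_pkts, rounds):
    """Queue depth per round trip for a window that grows until packets drop.

    Toy model: one flow, one bottleneck, window measured in packets.
    """
    cwnd = bdp_pkts  # start with the pipe just full (an assumption)
    depths = []
    for _ in range(rounds):
        if cwnd > bdp_pkts + buffer_pkts:
            # The buffer overflowed and a drop was detected: halve the window.
            cwnd = max(cwnd // 2, 1)
        # Anything in flight beyond the BDP is sitting in the bottleneck queue.
        depths.append(min(max(cwnd - bdp_pkts, 0), buffer_pkts))
        cwnd += 1  # additive increase: one more packet per round trip
    return depths


depths = loss_based_queue(bdp_pkts=50, buffer_pkts=50, rounds=120)
print(f"peak queue: {max(depths)} packets, "
      f"mean queue: {sum(depths) / len(depths):.1f} packets")
```

In this model the queue climbs until the buffer is full, collapses, and climbs again, so the sender spends most of its time with a standing queue. A sender that holds the data in flight at the bandwidth-delay product would, by construction, leave nothing in the queue.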

This research illustrates something that network engineers and application developers need to get used to—the network works better if we build networks and applications that work together. Rather than assuming the network’s job is simply not to drop packets, BBR takes a more intelligent direction, assuming that while the network needs to be able to handle microbursts, and not drop packets, it is the job of the application to properly figure out the network’s limits and try to live within them.

Interesting work, indeed; you should read the entire paper.


  1. Renato Gentil on 21 February 2017 at 7:23 pm

    The challenge here is, how do you build a network thinking on this? Most of the time the network is already built, there are new application servers coming into the network, and you only have to make sure there will be enough bw, connectivity, etc. In this case, how would you design the network?

    • Russ on 23 February 2017 at 7:11 pm

      The research folks working on this have been testing this with existing hardware — it really just wants changed TCP stacks, rather than a completely new network.