Policing, Shaping, and Performance
Policing traffic and shaping traffic are two completely different things, but it is hard to know, in the wild, what impact one or the other will have on a particular traffic flow, or on the performance of applications in general. While the paper under review here, An Internet-Wide Analysis of Traffic Policing, is largely focused on the global ‘net, specifically from a content provider’s perspective, it contains lessons for just about every network operator who needs to manage Quality of Service (QoS) in a sane and meaningful way.
Traffic policing involves setting up a pool of tokens, often called a token bucket. Tokens are added to the pool at the configured rate, up to a maximum that sets the allowed burst size. For each unit of traffic forwarded (assume a packet here), a token is consumed; if no token is available when a packet arrives, the packet is dropped rather than queued. If the pool is sized correctly, short bursts in the traffic stream will be allowed through, but if the application attempts to sustain more bandwidth than the policer allows, the excess packets will be dropped. The idea sounds good in theory, but it does not seem to work out well in practice.
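To make the mechanics concrete, here is a minimal sketch of a policer, assuming the common token-bucket arrangement in which tokens accrue at the configured rate and each forwarded packet spends one. The class name, the one-token-per-packet accounting, and the parameter values are illustrative assumptions, not taken from the paper or from any vendor implementation.

```python
class TokenBucketPolicer:
    """Toy token-bucket policer; one token per packet is an assumption."""

    def __init__(self, rate_pps, burst):
        self.rate_pps = rate_pps      # tokens added per second
        self.burst = burst            # bucket depth, the largest allowed burst
        self.tokens = float(burst)    # the bucket starts full
        self.last = 0.0               # arrival time of the previous packet

    def allow(self, now):
        """Return True to forward the packet arriving at `now`, False to drop it."""
        # Credit tokens for the elapsed time, capped at the bucket depth.
        elapsed = now - self.last
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate_pps)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False                  # no token available: drop, never queue


policer = TokenBucketPolicer(rate_pps=10, burst=5)
# Eight packets arrive back to back at t=0; only the burst allowance gets through.
burst_result = [policer.allow(0.0) for _ in range(8)]
# A second later ten tokens have accrued, but the bucket caps them at five.
later_result = [policer.allow(1.0) for _ in range(6)]
```

In this sketch the first five packets of each group are forwarded and the rest are dropped, which is the "short bursts are allowed, sustained excess is not" behavior described above.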
To understand why, it is important to examine the behavior of TCP, the stream protocol used by most applications. TCP uses a slow start mechanism that attempts to find the largest window, and hence the highest bandwidth utilization, possible between the transmitter and receiver. The window size is increased fairly rapidly until a packet is dropped. The transmitter then backs off the window size, slowly increasing again until the transmitter reaches a point where bandwidth is maximized, and only a minimal number of packets are dropped (ideally none). The chart below illustrates this process.
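The shape of that process can be approximated with a toy model. The sketch below assumes a Reno-style sender that doubles its window every round trip during slow start, halves it on loss, and then grows it by one packet per round trip; the fixed `capacity` threshold at which loss occurs is a simplifying assumption, and real TCP stacks are considerably more involved.

```python
def simulate_cwnd(capacity, rounds):
    """Return a per-round-trip trace of the congestion window, in packets.

    Assumptions: a loss occurs exactly when the window exceeds `capacity`;
    on loss the window is halved and growth switches from doubling to +1.
    """
    cwnd = 1
    slow_start = True
    trace = []
    for _ in range(rounds):
        trace.append(cwnd)
        if cwnd > capacity:
            cwnd = max(1, cwnd // 2)   # back off after a drop
            slow_start = False
        elif slow_start:
            cwnd *= 2                  # rapid growth while probing
        else:
            cwnd += 1                  # slow, linear growth afterwards
    return trace


trace = simulate_cwnd(capacity=20, rounds=12)
# trace == [1, 2, 4, 8, 16, 32, 16, 17, 18, 19, 20, 21]
```

The window overshoots once (32 packets against a capacity of 20), backs off, and then creeps up toward the point where the path is full.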
Policing is supposed to emulate the effect of a link with lower bandwidth than the actual link by dropping traffic that exceeds what the policer is configured to allow. The problem is found in the initial description of what a policer does: it allows the stream to burst until the tokens run out. When the TCP stream first starts, then, a policer will allow the TCP slow start process to open the window much higher than the policer’s configured bandwidth. Once the tokens run out, the policer will drop packets until there are tokens in the pool again, which abruptly lowers the effective bandwidth. From TCP’s perspective, a policer is a link with a constantly changing link bandwidth.
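A small worked example shows why the sender sees a changing link. The sketch assumes a sender transmitting at a constant 20 packets per second through a policer configured for 10 packets per second with a 30-packet burst; the function name, the numbers, and the once-per-second token credit are all illustrative simplifications.

```python
def delivered_per_second(send_pps, rate_pps, burst, seconds):
    """Packets forwarded in each one-second interval by a token-bucket policer.

    Assumptions: one token per packet, the bucket starts full, and tokens
    are credited once at the start of each second (coarse but easy to follow).
    """
    tokens = burst
    forwarded_trace = []
    for second in range(seconds):
        if second > 0:
            tokens = min(burst, tokens + rate_pps)
        forwarded = min(send_pps, tokens)   # anything beyond this is dropped
        tokens -= forwarded
        forwarded_trace.append(forwarded)
    return forwarded_trace


throughput = delivered_per_second(send_pps=20, rate_pps=10, burst=30, seconds=6)
# throughput == [20, 20, 10, 10, 10, 10]
```

For the first two seconds the path appears to carry 20 packets per second with no loss; once the bucket empties, the same path carries only 10 and drops half of what is sent.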
TCP, in effect, attempts to burst in order to find the maximum bandwidth. A policer treats this burst as a temporary condition, bringing the flow back under its bandwidth limit after some amount of “reasonable burst.” What TCP treats as an attempt to find the optimal flow rate (window size), the policer interprets as a set of bursts requiring dropped packets. The result of this rather bad interaction is very poor performance. The paper reports that up to 20% of packets can be dropped in a policed flow, causing major performance problems. Given the measurements were taken from video servers, the authors note there is a discernible impact on the quality of video across policed links. Impact levels of this kind indicate policing will probably have a bad effect on just about any sort of application that relies on TCP transport services.
Given this, should network operators configure policing? Is it counterproductive? The answer cannot be to eliminate policing, as operators need some way to manage which applications receive particular percentages of the available bandwidth. If the problem is the interaction of TCP and the policer, perhaps some other QoS mechanism can be combined with policing to provide a better balance between controlling load and allowing TCP to operate effectively.
The authors tried various mechanisms to this end, including modifying the TCP stack in the server. This is not going to be a generally available solution, so the authors sought out some other solution that could be implemented in “standard hardware.” What they found is that by placing a traffic shaper in line with a policer, the bad interaction between TCP and the policer could largely be mitigated. The shaper smooths the bursts out, so the policer does not end up taking such drastic action. If policing is being configured, the burst size should be small, rather than large, so TCP is more effective at finding the right window size in the face of the inevitable packet drops.
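The effect of putting a shaper ahead of a policer can be illustrated with a toy comparison. The sketch below is not the authors' experimental setup: the tick-based model, the unbounded shaper queue, and the traffic pattern are assumptions chosen only to show the smoothing effect.

```python
def police(arrivals, rate, burst):
    """Count packets dropped by a token-bucket policer, one tick at a time."""
    tokens, dropped = burst, 0
    for tick, count in enumerate(arrivals):
        if tick > 0:
            tokens = min(burst, tokens + rate)
        forwarded = min(count, tokens)
        tokens -= forwarded
        dropped += count - forwarded
    return dropped


def shape(arrivals, rate):
    """Queue excess packets and release at most `rate` per tick.

    Assumption: the queue is unbounded, so the shaper delays but never drops.
    """
    backlog, released = 0, []
    for count in arrivals:
        backlog += count
        sent = min(backlog, rate)
        backlog -= sent
        released.append(sent)
    while backlog > 0:                 # drain whatever is still queued
        sent = min(backlog, rate)
        backlog -= sent
        released.append(sent)
    return released


bursty = [12, 0, 0, 12, 0, 0]          # 24 packets in 6 ticks, 4 per tick on average
drops_policer_only = police(bursty, rate=4, burst=4)
drops_with_shaper = police(shape(bursty, rate=4), rate=4, burst=4)
# drops_policer_only == 16, drops_with_shaper == 0
```

The average offered load is identical in both cases; only the burstiness differs. The shaper trades the drops for queuing delay, which TCP generally handles more gracefully than loss.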