The Hedge 41: Centralized Architectures with Jari Arkko

Consolidation is a well-recognized trend in the Internet ecosystem—but what does this centralization mean in terms of distributed systems, such as the DNS? Jari Arkko joins this episode of the Hedge, along with Alvaro Retana, to discuss the import and impact of centralization on the Internet through his draft, draft-arkko-arch-infrastructure-centralisation.

download

Research: Off-Path TCP Attacks

I’s fnny, bt yu cn prbbly rd ths evn thgh evry wrd s mssng t lst ne lttr. This is because every effective language—or rather every communication system—carried enough information to reconstruct the original meaning even when bits are dropped. Over-the-wire protocols, like TCP, are no different—the protocol must carry enough information about the conversation (flow data) and the data being carried (metadata) to understand when something is wrong and error out or ask for a retransmission. These things, however, are a form of data exhaust; much like you can infer the tone, direction, and sometimes even the content of conversation just by watching the expressions, actions, and occasional word spoken by one of the participants, you can sometimes infer a lot about a conversation between two applications by looking at the amount and timing of data crossing the wire.

The paper under review today, Off-Path TCP Exploit, uses cleverly designed streams of packets and observations about the timing of packets in a TCP stream to construct an off-path TCP injection attack on wireless networks. Understanding the attack requires understanding the interaction between the collision avoidance used in wireless systems and TCP’s reaction to packets with a sequence number outside the current window.

Beginning with the TCP end of things—if a TCP packet is received with a window falling outside the current window, TCP implementations will send a duplicate of the last ACK it sent back to the transmitter. From the Wireless network side of things, only one talker can use the channel at a time. If a device begins transmitting a packet, and then hears another packet inbound, it should stop transmitting and wait some random amount of time before trying to transmit again. These two things can be combined to guess at the current window size.

Assume an attacker sends a packet to a victim which must be answered, such as a probe. Before the victim can answer, the attacker than sends a TCP segment which includes a sequence number the attacker thinks might be within the victim’s receive window, sourcing the packet from the IP address of some existing TCP session. Unless the IP address of some existing session is used in this step, the victim will not answer the TCP segment. Because the attacker is using a spoofed source address, it will not receive the ACK from this segment, so it must find some other way to infer if an ACK was sent by the victim.

How can the attacker infer this? After sending this TCP sequence, the attacker sends another probe of some kind to the victim which must be answered. If the TCP segment’s sequence number is outside the current window, the victim will attempt to send a copy of its previous ACK. If the attacker times things correctly, the victim will attempt to send this duplicate ACK while the attacker is transmitting the second probe packet; the two packets will collide, causing the victim to back off, slowing the receipt of the probe down a bit from the attacker’s perspective.

If the answer to the second probe is slower than the answer to the first probe, the attacker can infer the sequence number of the spoofed TCP segment is outside the current window. If the two probes are answered in close to the same time, the attacker can infer the sequence number of the spoofed TCP segment is within the current window.

Combining this information with several other well-known aspects of widely deployed TCP stacks, the researchers found they could reliably inject information into a TCP stream from an attacker. While these injections would still need to be shaped in some way to impact the operation of the application sending data over the TCP stream, the ability to inject TCP segments in this way is “halfway there” for the attacker.

There probably never will be a truly secure communication channel invented that does not involve encryption—the data required to support flow control and manage errors will always provide enough information to an attacker to find some clever way to break into the channel.

Measuring the Core

This last week I was a guest on the TechSequences podcast with Leslie and Alexa discussing the centralization of the routed infrastructure in the ‘net. When that episode posts, I’ll cross post it here (but, of course, you should really just subscribe to their podcast, as they always have interesting guests—I’ll have Leslie and Alexa on the Hedge at some point, as well). The topic is related to this post on CircleID about the death of transit, which was a reaction to Geoff Huston’s article on the death of transit some time before.

All that to say… while reading through some research papers this week, I ran into a recent (2018) paper where Carisimo et al. try out different ways of measuring which autonomous systems belong to the “core” of the ‘net. They went about this by taking a set of AS’ “everyone” acknowledges to be “part of the core,” and then trying to find some measurement that successfully describes something all of them have in common.

The result is the k-metric, which measures the connectivity of an AS’ peers. If an AS has peers who are just as connected as they are, then k-metric is high. Otherwise, the k-metric is low. It does make sense this measure would be able to pick out “core” AS’, because it picks out the set of most highly interconnected AS’ in the ‘net.

Once they determined the k-metric is a good way to determine which AS’ are in the core of the ‘net, they calculated the membership of the core over time. Their graph is below.

The way the chart is laid out is a little difficult to see, but the green is transit providers and the blue is content providers. Certainly enough, the percentage of content providers in the core of the ‘net, in terms of sheer connectivity, has increased over time. These same content providers now account for some 80% (or more?) of the traffic on the ‘net. All this means is the centralization of content is visible in objective measurements, so its a real thing. Content providers are currently “only” 20% of the core but given their traffic levels this is a much bigger deal than it seems. There are many parts of the world where the population or access density is not high enough for large content providers to justify building out so they touch the last mile. If communities build out last mile optical networks, however, its likely these large content providers will consume ever-larger percentages of the “core” AS’.

The Hedge 39: Dan York and Open Standards Everywhere

The Internet Society exists to support the growth of the global ‘net across the world by working with stakeholders, building local connectivity like IXs and community based networks, and encouraging the use of open standards. On this episode of the Hedge, Dan York joins us to talk about the Open Standards Everywhere project which is part of the Internet Society. More information about Open Standards Everywhere can be found—

download

Is QUIC really Quicker?

QUIC is a relatively new data transport protocol developed by Google, and currently in line to become the default transport for the upcoming HTTP standard. Because of this, it behooves every network engineer to understand a little about this protocol, how it operates, and what impact it will have on the network. We did record a History of Networking episode on QUIC, if you want some background.

In a recent Communications of the ACM article, a group of researchers (Kakhi et al.) used a modified implementation of QUIC to measure its performance under different network conditions, directly comparing it to TCPs performance under the same conditions. Since the current implementations of QUIC use the same congestion control as TCP—Cubic—the only differences in performance should be code tuning in estimating the round-trip timer (RTT) for congestion control, QUIC’s ability to form a session in a single RTT, and QUIC’s ability to carry multiple streams in a single connection. The researchers asked two questions in this paper: how does QUIC interact with TCP flows on the same network, and does UIC perform better than TCP in all situations, or only some?

To answer the first question, the authors tried running QUIC and TCP over the same network in different configurations, including single QUIC and TCP sessions, a single QUIC session with multiple TCP sessions, etc. In each case, they discovered that QUIC consumed about 50% of the bandwidth; if there were multiple TCP sessions, they would be starved for bandwidth when running in parallel with the QUIC session. For network folk, this means an application implemented using QUIC could well cause performance issues for other applications on the network—something to be aware of. This might mean it is best, if possible, to push QUIC-based applications into a separate virtual or physical topology with strict bandwidth controls if it causes other applications to perform poorly.

Does QUIC’s ability to consume more bandwidth mean applications developed on top of it will perform better? According to the research in this paper, the answer is how many balloons fit in a bag? In other words, it all depends. QUIC does perform better when its multi-stream capability comes into play and the network is stable—for instance, when transferring variably sized objects (files) across a network with stable jitter and delay. In situations with high jitter or delay, however, TCP consistently outperforms QUIC.

TCP outperforming QUIC is a bit of a surprise in any situation; how is this possible? The researchers used information from their additional instrumentation to discover QUIC does not tolerate out-of-order packet delivery very well because of its fast packet retransmission implementation. Presumably, it should be possible to modify these parameters somewhat to make QUIC perform better.

This would still leave the second problem the researchers found with QUIC’s performance—a large difference between its performance on desktop and mobile platforms. The difference between these two comes down to where QUIC is implemented. Desktop devices (and/or servers) often have smart NICs which implement TCP in the ASIC to speed packet processing up. QUIC, because it runs in user space, only runs on the main processor (it seems hard to see how a user space application could run on a NIC—it would probably require a specialized card of some type, but I’ll have to think about this more). The result is that QUIC’s performance depends heavily on the speed of the processor. Since mobile devices have much slower processors, QUIC performs much more slowly on mobile devices.

QUIC is an interesting new transport protocol—one everyone involved in designing or operating networks is eventually going to encounter. This paper gives good insight into the “soul” of this new protocol.

The Hedge 38: Evan Knox and Personal Marketing

Personal branding and marketing are two key topics that surface from time to time, but very few people talk about how to actually do these things. For this episode of the Hedge, Evan Knox from Caffeine Marketing to talk about the importance of personal marketing and branding, and some tips and tricks network engineers can follow to improve their personal brand.

Evan Knox’s personal site
Caffeine Marketing

download