The Hedge 40: Greg Ferro and the Path from Automated to Autonomic
The indomitable Greg Ferro joins this episode of the Hedge to talk about the path from automated to autonomic, including why you shouldn’t put everything into “getting automation right,” and why you still need to know the basics even if we reach a completely autonomic world.
Measuring the Core
This last week I was a guest on the TechSequences podcast with Leslie and Alexa discussing the centralization of the routed infrastructure in the ‘net. When that episode posts, I’ll cross post it here (but, of course, you should really just subscribe to their podcast, as they always have interesting guests—I’ll have Leslie and Alexa on the Hedge at some point, as well). The topic is related to this post on CircleID about the death of transit, which was a reaction to Geoff Huston’s article on the death of transit some time before.
All that to say… while reading through some research papers this week, I ran into a recent (2018) paper where Carisimo et al. try out different ways of measuring which autonomous systems belong to the “core” of the ‘net. They went about this by taking a set of AS’ “everyone” acknowledges to be “part of the core,” and then trying to find some measurement that successfully describes something all of them have in common.
The result is the k-metric, which measures the connectivity of an AS’ peers. If an AS has peers who are just as connected as they are, then k-metric is high. Otherwise, the k-metric is low. It does make sense this measure would be able to pick out “core” AS’, because it picks out the set of most highly interconnected AS’ in the ‘net.
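As described here, the measure closely resembles k-core decomposition from graph theory. Assuming that reading, the sketch below peels the least-connected nodes off a small AS graph until only the most densely interconnected set remains. The AS numbers, the toy topology, and the function name are all invented for illustration and are not taken from the paper.

```python
from collections import defaultdict


def core_numbers(edges):
    """Return {node: core number} for an undirected graph given as edge pairs."""
    adj = defaultdict(set)
    for a, b in edges:
        if a != b:
            adj[a].add(b)
            adj[b].add(a)

    degree = {node: len(peers) for node, peers in adj.items()}
    remaining = set(adj)
    core = {}
    k = 0
    while remaining:
        # Peel off the node with the fewest peers still in the graph.
        node = min(remaining, key=lambda n: (degree[n], n))
        k = max(k, degree[node])
        core[node] = k
        remaining.remove(node)
        for peer in adj[node]:
            if peer in remaining:
                degree[peer] -= 1
    return core


# Hypothetical topology: four fully meshed ASes plus two edge ASes.
mesh = [64500, 64501, 64502, 64503]
edges = [(a, b) for i, a in enumerate(mesh) for b in mesh[i + 1:]]
edges += [(64510, 64500), (64511, 64501), (64511, 64502)]

cores = core_numbers(edges)
top = max(cores.values())
core_set = sorted(n for n, c in cores.items() if c == top)
print(core_set)  # the four meshed ASes
```

The two edge ASes get low core numbers even though one of them peers with two well-connected networks. What matters is how well connected an AS and its peers are together, not the raw peer count.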
Once they determined the k-metric is a good way to determine which AS’ are in the core of the ‘net, they calculated the membership of the core over time. Their graph is below.

The way the chart is laid out is a little difficult to see, but the green is transit providers and the blue is content providers. Sure enough, the percentage of content providers in the core of the ‘net, in terms of sheer connectivity, has increased over time. These same content providers now account for some 80% (or more?) of the traffic on the ‘net. All this means is the centralization of content is visible in objective measurements, so it’s a real thing. Content providers are currently “only” 20% of the core, but given their traffic levels, this is a much bigger deal than it seems. There are many parts of the world where the population or access density is not high enough for large content providers to justify building out so they touch the last mile. If communities build out last mile optical networks, however, it’s likely these large content providers will consume ever-larger percentages of the “core” AS’.
The Hedge 39: Dan York and Open Standards Everywhere
The Internet Society exists to support the growth of the global ‘net across the world by working with stakeholders, building local connectivity like IXs and community based networks, and encouraging the use of open standards. On this episode of the Hedge, Dan York joins us to talk about the Open Standards Everywhere project which is part of the Internet Society. More information about Open Standards Everywhere can be found—
Is QUIC really Quicker?
QUIC is a relatively new data transport protocol developed by Google, and currently in line to become the default transport for the upcoming HTTP standard. Because of this, it behooves every network engineer to understand a little about this protocol, how it operates, and what impact it will have on the network. We did record a History of Networking episode on QUIC, if you want some background.
In a recent Communications of the ACM article, a group of researchers (Kakhki et al.) used a modified implementation of QUIC to measure its performance under different network conditions, directly comparing it to TCP’s performance under the same conditions. Since the current implementations of QUIC use the same congestion control as TCP—Cubic—the only differences in performance should be code tuning in estimating the round-trip time (RTT) for congestion control, QUIC’s ability to form a session in a single RTT, and QUIC’s ability to carry multiple streams in a single connection. The researchers asked two questions in this paper: how does QUIC interact with TCP flows on the same network, and does QUIC perform better than TCP in all situations, or only some?
To answer the first question, the authors tried running QUIC and TCP over the same network in different configurations, including single QUIC and TCP sessions, a single QUIC session with multiple TCP sessions, etc. In each case, they discovered that QUIC consumed about 50% of the bandwidth; if there were multiple TCP sessions, they would be starved for bandwidth when running in parallel with the QUIC session. For network folk, this means an application implemented using QUIC could well cause performance issues for other applications on the network—something to be aware of. This might mean it is best, if possible, to push QUIC-based applications into a separate virtual or physical topology with strict bandwidth controls if it causes other applications to perform poorly.
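A back-of-the-envelope calculation shows how quickly this starves TCP. The sketch assumes, as a simplification of the result above, that a single QUIC session holds about half the link no matter how many TCP sessions compete with it. The function name and the flat 0.5 figure are illustrative and not measurements from the paper.

```python
def per_flow_share(tcp_flows, quic_share=0.5):
    """Return (fair share per flow, actual share per TCP flow) of link bandwidth."""
    fair = 1 / (tcp_flows + 1)                 # every flow treated equally
    tcp_each = (1 - quic_share) / tcp_flows    # TCP flows split what QUIC leaves
    return fair, tcp_each


for n in (1, 2, 4):
    fair, tcp_each = per_flow_share(n)
    print(f"{n} TCP flows: fair share {fair:.1%}, each TCP flow gets {tcp_each:.1%}")
```

With four TCP sessions, a fair split would give every flow 20% of the link. Under this assumption each TCP flow gets 12.5% while the QUIC session keeps 50%.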
Does QUIC’s ability to consume more bandwidth mean applications developed on top of it will perform better? According to the research in this paper, the answer is how many balloons fit in a bag? In other words, it all depends. QUIC does perform better when its multi-stream capability comes into play and the network is stable—for instance, when transferring variably sized objects (files) across a network with stable jitter and delay. In situations with high jitter or delay, however, TCP consistently outperforms QUIC.
TCP outperforming QUIC is a bit of a surprise in any situation; how is this possible? The researchers used information from their additional instrumentation to discover QUIC does not tolerate out-of-order packet delivery very well because of its fast packet retransmission implementation. Presumably, it should be possible to modify these parameters somewhat to make QUIC perform better.
This would still leave the second problem the researchers found with QUIC’s performance—a large difference between its performance on desktop and mobile platforms. The difference between these two comes down to where QUIC is implemented. Desktop devices (and/or servers) often have smart NICs which implement TCP in the ASIC to speed packet processing up. QUIC, because it runs in user space, only runs on the main processor (it seems hard to see how a user space application could run on a NIC—it would probably require a specialized card of some type, but I’ll have to think about this more). The result is that QUIC’s performance depends heavily on the speed of the processor. Since mobile devices have much slower processors, QUIC performs much more slowly on mobile devices.
QUIC is an interesting new transport protocol—one everyone involved in designing or operating networks is eventually going to encounter. This paper gives good insight into the “soul” of this new protocol.
The Hedge 38: Evan Knox and Personal Marketing
Personal branding and marketing are two key topics that surface from time to time, but very few people talk about how to actually do these things. For this episode of the Hedge, Evan Knox from Caffeine Marketing joins us to talk about the importance of personal marketing and branding, and some tips and tricks network engineers can follow to improve their personal brand.
To Route or Not?
When you are building a data center fabric, should you run a control plane all the way to the host? This is a question I encounter more often as operators deploy eVPN-based spine-and-leaf fabrics in their data centers (for those who are actually deploying scale-out spine-and-leaf—I see a lot of people deploying hybrid sorts of networks designed as “mini-hierarchical” designs and just calling them spine-and-leaf fabrics, but this is probably a topic for another day). Three reasons are generally given for deploying the control plane all the way to the hosts attached to the fabric: faster down detection, load sharing, and traffic engineering. Let’s consider each of these in turn.
Faster Down Detection. There’s no simple way for ToR switches to determine when the connection to a host has failed, whether the host is single or dual-homed. Somehow the set of routes reachable through the host must be related to the interface state, or some underlying fast hello state (such as BFD), so that if a link fails the ToR knows to pull the correct set of routes from the routing table. It’s simpler to just let the host itself advertise the correct reachability information; when the link fails, the routing session will fail, and the correct routes will automatically be withdrawn.
Load Sharing. While this only applies to hosts with two connections into the fabric (dual-homed hosts), this is still an important use case. If a dual-homed host only has two default routes to work from, the host is blind to network conditions, and can only load share equally across the available paths. Equal load sharing, however, may not be ideal in all situations. If the host is running routing, it is possible to inject more intelligence into the load sharing between the upstream links.
Traffic Engineering. Or traffic shaping, steering, etc. In some cases, traffic engineering requires injecting a label or outer header onto the packet as it enters the fabric. In others, more specific routes might be sent along one path and not another to draw specific kinds of traffic through a more optimal route in the fabric. This kind of traffic engineering is only possible if the control plane is running on the host.
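To make the load sharing case concrete, the sketch below shows unequal sharing across two uplinks. It assumes the host has somehow learned that one uplink should carry three times the traffic of the other, for example from routing information. The uplink names, weights, and flow tuples are hypothetical, and hashing the flow is just one common way to keep every packet of a flow on the same path.

```python
import hashlib
from collections import Counter


def pick_uplink(flow, weights):
    """Map a flow tuple to an uplink in proportion to the uplink weights."""
    digest = hashlib.sha256(repr(flow).encode()).digest()
    point = int.from_bytes(digest[:8], "big") % sum(weights.values())
    for uplink, weight in sorted(weights.items()):
        if point < weight:
            return uplink
        point -= weight


# Hypothetical weights: uplink-a has three times the usable capacity of uplink-b.
weights = {"uplink-a": 3, "uplink-b": 1}

# Hypothetical flows: (source, destination, protocol, source port, destination port).
flows = [("10.0.0.5", "192.0.2.10", 6, 49152 + i, 443) for i in range(4000)]

usage = Counter(pick_uplink(flow, weights) for flow in flows)
share_a = usage["uplink-a"] / len(flows)
print(f"uplink-a carries {share_a:.0%} of flows")
```

With only two default routes and no other information, the host has no basis for choosing anything other than equal weights.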
All these reasons are well and good, but they all assume something that should be of great interest to the network designer: which control plane are we talking about?
Most DC fabric designs I see today assume there is a single control plane running on the fabric—generally this single control plane is BGP, and it’s being used both to provide basic IP connectivity through the fabric (the infrastructure underlay control plane) and to provide tunneled overlay reachability (the infrastructure overlay control plane—generally eVPN).
This entangling of the infrastructure underlay and overlay has always seemed, to me, to be less than ideal. When I worked on large-scale transit provider networks in my more youthful days, we intentionally designed networks that separated customer routes from infrastructure routes. This created two separate failure and security domains in the network, as well as dividing the telemetry data in ways that allowed faster troubleshooting of common problems.
The same principles should apply in a DC fabric—after all, the workloads are essentially customers of the fabric, while the basic underlay connectivity counts as infrastructure. The simplest way to adopt this sort of division of labor is the same way large-scale transit providers did (and do)—use two different routing protocols for the underlay and overlay. For instance, IS-IS or RIFT for the underlay and eVPN using BGP for the overlay.
If you move to two layers of control plane, the question above becomes a bit more nuanced—should the overlay control plane run on the hosts? Should the underlay control plane run on the hosts?
For faster down detection—for those hosts that need faster down detection, BFD tied to IGP neighbor state can remove the correct nexthop from the local routing table at a ToR, causing the correct reachable destinations to be withdrawn. Alternatively, the host can run an instance of the overlay control plane, which allows it to advertise and withdraw “customer routes” directly. In neither case is the underlay control plane required to run on the host.
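The first option can be sketched as a small table on the ToR that ties each destination to the next hops it was learned through. The class and method names are hypothetical, and a real implementation would be driven by BFD and IGP events rather than direct method calls.

```python
class TorRib:
    """Toy routing table: each prefix maps to the next hops it is reachable through."""

    def __init__(self):
        self.nexthops = {}  # prefix -> set of next hop addresses

    def learn(self, prefix, nexthop):
        self.nexthops.setdefault(prefix, set()).add(nexthop)

    def bfd_down(self, nexthop):
        """Remove a failed next hop; return prefixes that are no longer reachable."""
        withdrawn = []
        for prefix in sorted(self.nexthops):
            hops = self.nexthops[prefix]
            hops.discard(nexthop)
            if not hops:
                withdrawn.append(prefix)
        for prefix in withdrawn:
            del self.nexthops[prefix]
        return withdrawn


rib = TorRib()
rib.learn("198.51.100.0/24", "10.0.0.1")  # reachable through one next hop
rib.learn("203.0.113.0/24", "10.0.0.1")   # reachable through two next hops
rib.learn("203.0.113.0/24", "10.0.0.2")

withdrawn = rib.bfd_down("10.0.0.1")
print(withdrawn)  # only the prefix with no remaining next hop
```

Only the destination that has lost its last next hop is withdrawn. The other stays reachable through the surviving path, and the host never has to run the underlay protocol.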
For load sharing and traffic engineering—if something like SRm6, or even other more traditional forms of traffic engineering, is used, the information needed will be carried in the overlay rather than the underlay—so the underlay routing protocol does not need to run on the host.
On the other side of the coin, not running the underlay protocol on the host can help the overall network security posture. Assume a public facing host connected to the fabric is somehow pwned… If the host is running the underlay protocol, it’s pretty simple to DoS the entire fabric to take it down, or to inject incorrect routing information. If the overlay is configured correctly, however, only the virtual topology which the host has access to can be impacted by an attack—and if microsegmentation is deployed, that damage can be minimized as well.
From a complexity perspective, running the underlay control plane on the host dramatically increases the amount of state the host must maintain; there is no effective filter you can run to reduce state on the host without destroying some of the advantages gained by running the underlay control plane there. On the other hand, the ToR can be configured to filter routing information the host receives, controlling the amount of state the host needs to manage.
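The filtering option might look something like the sketch below, where the ToR passes a host only a default route plus whatever falls inside one permitted aggregate. The prefixes, the policy, and the function name are assumptions made for illustration. On a real ToR this would be expressed as a routing policy rather than Python.

```python
import ipaddress


def routes_for_host(fabric_routes, permit):
    """Keep the default route plus any route inside the permitted aggregate."""
    permit_net = ipaddress.ip_network(permit)
    default = ipaddress.ip_network("0.0.0.0/0")
    kept = []
    for route in fabric_routes:
        net = ipaddress.ip_network(route)
        if net == default or net.subnet_of(permit_net):
            kept.append(route)
    return kept


# Hypothetical routes known to the ToR.
fabric_routes = [
    "0.0.0.0/0",
    "10.1.0.0/24",
    "10.1.1.0/24",
    "10.2.0.0/24",
    "192.0.2.0/24",
]

host_routes = routes_for_host(fabric_routes, permit="10.1.0.0/16")
print(host_routes)
```

The host ends up holding three routes instead of five. In a large fabric the same policy keeps host state roughly constant as the fabric grows.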
Control plane on the host or not? This is one of those questions where properly modularized and layered network design can make a big difference in what the right answer should be.
The Hedge 37: Stephane Bortzmeyer and DNS Privacy
In this episode of the Hedge, Stephane Bortzmeyer joins Alvaro Retana and Russ White to discuss draft-ietf-dprive-rfc7626-bis, which “describes the privacy issues associated with the use of the DNS by Internet users.” Not many network engineers think about the privacy implications of DNS, an important part of the infrastructure we all rely on to make the Internet work.
