The Hedge 35: Peter Jones and Single Pair Ethernet

When you think of new Ethernet standards, you probably think about faster and optical. There is, however, an entire world of buildings out there with older copper cabling, particularly in the industrial realm, that could see dramatic improvements in productivity if their control and monitoring systems could be moved to IP. In these cases, what is needed is an Ethernet standard that runs over a single copper pair, and yet offers enough speed to support industrial use cases. Peter Jones joins Jeremy Filliben and Russ White to discuss single pair Ethernet.
Understanding DC Fabric Complexity

When I think of complexity, I mostly consider transport protocols and control planes—probably because I have largely worked in these areas from the very beginning of my career in network engineering. Complexity, however, is present in every layer of the networking stack, all the way down to the physical. I recently ran across an interesting paper on complexity in another part of the network I had not really thought about before: the physical plant of a data center fabric.
Quitting Certifications: When?

At what point in your career do you stop working towards new certifications?
Daniel Dibb’s recent post on his blog is, I think, an excellent starting point, but I wanted to add a few additional thoughts to the answer he gives there.
Daniel’s first question is how do you learn? Certifications often represent a body of knowledge people who have a lot of experience believe is important, so they often represent a good guided path to holistically approaching a new body of knowledge. In the professional learning world this would be called a ready-made mental map. There is a counterargument here—certifications are often created by vendors as a marketing tool, rather than as something purely designed for the betterment of the community, or the dissemination of knowledge. This doesn’t mean, however, that certifications are “evil.” It just means you need to evaluate each certification on its own merits.
The Hedge 33: Balazs Varga and DETNET

Balazs Varge joins Alvaro Retana and Russ White on this episode of the Hedge to discuss the working going on in the IETF around deterministic networking. This work is important for applications requiring networks providing low latency and loss.
The Resilience Problem

This quote set me to thinking about how efficiency and resilience might interact, or trade off against one another, in networks. The most obvious extreme cases are two routers connected via a single long-haul link and the highly parallel data center fabrics we build today. Obviously adding a second long-haul link would improve resilience—but at what cost in terms of efficiency? Its also obvious highly meshed data center fabrics have plenty of resilience—and yet they still sometimes fail. Why?
The Hedge 32: Overcommunication

Michael Natkin, over at Glowforge, writes: “That’s a funny thing about our minds. In the absence of information, they fill in the gaps and make up all sorts of plausible things, without the owners of said minds even realizing it is happening.” The answer, he says, is to overcommunicate. Michael joins Eyvonne Sharpe, Tom Ammon, and Russ White on this episode of the Hedge to discuss what it means to overcommunicate.
Reflections on Intent

No, not that kind. 🙂
BGP security is a vexed topic—people have been working in this area for over twenty years with some effect, but we continuously find new problems to address. Today I am looking at a paper called BGP Communities: Can of Worms, which analyses some of the security problems caused by current BGP community usage in the ‘net. The point I want to think about here, though, is not the problem discussed in the paper, but rather some of the larger problems facing security in routing.
The Hedge 31: Network Operator Groups

Many engineers have heard about the wide variety of Network Operator Group (NOG) meetings, from smaller regional organizations through larger multinational ones. What is the value of attending a NOG? How can you convince your business leadership of this value? In this episode of the Hedge Vincent Celindro and Edward McNair join Russ White to consider these questions.
Learning from Failure at Scale

One of the difficulties for the average network operator trying to understand their failure rates and reasons is they just don’t have enough devices, or enough incidents, to make informed observations. If you have a couple of dozen switches, it is often hard to understand how often software defects take a device down versus human error (Mean Time Between Mistakes, or MTBM). As networks become larger, however, more information becomes available, and more interesting observations can be made. A recent paper written in conjunction with Facebook uses information from Facebook’s data center fabrics to make some observations about the rate and severity of different kinds of failures—needless to say, the results are fairly interesting.

