I’ve been on a bit of a writer’s break after finishing the CCST book, but it’s time to rekindle my “thousand words a day” habit. As always, one part of this is thinking about how I write—is there anything I need to change? Tools, perhaps, or style?
Network engineering is not “going away.” Network engineering is not less important than it was yesterday, last year, or even a decade ago.
But there still seems to be a gap somewhere. There are fewer folks interested than we need. We need more folks who want to work as full-time network engineers, and more folks with network engineering skills diffused within the larger IT community. More of both is the right answer if we’re going to continue building large-scale systems that work. The real lack of enthusiasm for learning network engineering is hurting all of IT, not just network engineering.
How do we bridge this gap? We’re engineers. We solve problems. This seems to be a problem we might be able to solve
Is network engineering still cool?
It certainly doesn’t seem like it, does it? College admissions seem to be down in the network engineering programs I know of, and networking certifications seem to be down, too. Maybe we’ve just passed the top of the curve, and computer networking skills are just going the way of coopering. Let’s see if we can sort out the nature of this malaise and possible solutions. Fair warning—this is going to take more than one post.
As we close out 2023, some random observations about engineering, culture, and life.
Network engineering needs help. I am hearing, from all over the place, that network engineering is “not cool.” There is a dearth of students entering the pipeline. College programs are struggling, and many organizations are struggling with a lack of engineering talent—in fact, I would guess the most common reason for companies to move to “the cloud” is because they cannot find anyone who knows how to build an operate a network any longer.
A few weeks ago, Daniel posted a piece about using different underlay and overlay protocols in a data center fabric. He says:
There is nothing wrong with running BGP in the overlay but I oppose to the argument of it being simpler.
One of the major problems we often face in network engineering—and engineering more broadly—is confusing that which is simple with that which has lower complexity. Simpler things are not always less complex. Let me give you a few examples, all of which are going to be controversial.
From the question pile: Route servers (as opposed to route reflectors) don’t change anything about a BGP route when re-advertising it to a peer, whether iBGP or eBGP. Why don’t route servers cause routing loops (or other problems) in a BGP network?
Route servers are often used by Internet Exchange Points (IXPs) to distribute routes between connected BGP speakers. BGP route servers
- Don’t change anything about a received BGP route when advertising the route to its peers (other BGP speakers)
- Don’t install routes received through BGP into the local routing table
Shouldn’t using route servers in a network—pontentially, at least—cause routing loops or other BGP routing issues?
While RFC9199 (are we really in the 9000’s?) is targeted at large-scale DNS deployments–specifically root zone operators–so it might seem the average operator won’t find a lot of value here.
This is, however, far from the truth. Every lesson we’ve learned in deploying large-scale DNS root servers applies to any other large-scale user-facing service. Internally deployed DNS recursive servers are an obvious instance, but the lessons here might well apply to a scheduling, banking, or any other multi-user application accessed from a lot of places by a lot of different users. There are some unique points in DNS, such as the relatively slower pace of database synchronization across nodes, but the network-side lessons can still be useful for a lot of applications.
Have you ever taught a kid to ride a bike? Kids always begin the process by shifting their focus from the handlebars to the pedals, trying to feel out how to keep the right amount of pressure on each pedal, control the handlebars, and keep moving … so they can stay balanced. During this initial learning phase, the kid will keep their eyes down, looking at the pedals, the handlebars, and . . . the ground.
After some time of riding, though, managing the pedals and handlebars are embedded in “muscle memory,” allowing them to get their head up and focus on where they’re going rather than on the mechanical process of riding. After a lot of experience, bike riders can start doing wheelies, or jumps, or off-road riding that goes far beyond basic balance.
Network engineer—any kind of engineering, really—is the same way.
How do you balance loyalty to yourself and loyalty to the company you work for?
This might seem like an odd question, but it’s an important component of work/life balance many of us just don’t think about any longer because, as Pete Davis says in Dedicated, we live in a world of infinite browsing. We’re afraid of sticking to one thing because it might reduce our future options. If we dedicate ourselves to something bigger than ourselves, then we might lose control of our direction. In particular, we should not dedicate ourselves to any single company, especially for too long.
My video on BGP convergence elicited a lot of . . . feedback, mainly concerning the difference between convergence in a data center fabric and convergence in the DFZ. Let’s begin here—BGP hunt and the impact of the MRAI are very real in the DFZ. Withdrawing a route can take several minutes.
What about the much more controlled environment of a data center fabric?
Several folks pointed out that the MRAI is often set to 0 in DC fabrics (and many implementations by default). Further, almost all implementations will use an MRAI of 0 for the first received update, holding the second and subsequent advertisements by the MRAI. Several folks also pointed out that all the paths through a DC fabric are the same length, so the second part of the equation is also very small.
These are good points—how do they impact BGP convergence? Let’s use the network below, a small slice of a five-stage butterfly fabric, to think it through. Assume every router is in a different AS, so all the peering sessions are eBGP.