Posts Tagged ‘engineering skills’

Overvaluing Experience

“Sure, great candidate, so long as you just look at the paper. They don’t have any experience.”

I wonder how many times I’ve heard this in my networking career—I wonder how many times this has been said about me, in fact, after I’ve walked out of an interview room. We all know the tale of the paper tigers. And we all know how hard it is to land a position without experience, and how hard it is to get experience without landing a job (I have a friend in just this position right now, in fact). But let me tell you a story…

I don’t fish any longer, but I used to fish quite a bit with my Grandfather. Now, like most Grandfathers, mine was not ordinary. He was, in fact, a County Agent, working for the US Forestry Service. This meant he spent his time blasting ponds, helping farmers figure out how to increase yield on their fields, and growing all sorts of odd new types of things on his small plot of land. He also had mules (I’ll tell you about the mules some time later, I’m certain), and an old Forestry Green pickup truck.

Anyway, to return to fishing… He was absolutely no fun to fish with. He would sit down in the chair, cast in, and catch his limit before you could get your first fish on the line. I spent years trying to figure this out. All anyone in my family would tell me was he was a really, really experienced fisherman. I never quite believed this. He would say, “cast in just over there,” and you’d have a hook. Cast anyplace else, and you’d sit there for hours, waiting. I know a lot of experienced fishermen, but his “experience” was something else. In fact, if you ever go out on a lake with a professional or semi-professional fisher-person, you’re going to feel the same way. There’s not just “luck,” and there’s not just “you can cast faster and farther than I can.”

Finally, one day I broke down and asked him directly about his fishing abilities. As it turns out, my Grandfather either knew the lay of the land under every lake in the area because he was there before the lake was dammed, or he had actually had a hand in blasting and damming it. In other words, he knew the bottom, the currents, the structure, and all the rest. In the same way, a modern fisher-person will spend hours looking over a map, running around a lake looking at the water temperature in various places, and recording sonar charts to figure out the structure that lies on the bottom of the lake.

Of course, experience matters in fishing. But so does knowledge. And, come to that, so does theory. It’s fine if you have experience fishing, but if you don’t know the lake, then you’re not going to get anything on your hook. It’s fine if you know the structure of the lake, but if you don’t understand the way fish act, then you’re still not going to get anything on your hook.

The truth is that it takes all three—experience, knowledge, and theory—to hook a fish. And what’s true of hooking a fish is also true of building a network, or troubleshooting a network, or just about anything else in life. As W. Edwards Deming said—

Experience by itself teaches nothing…Without theory, experience has no meaning. Without theory, one has no questions to ask. Hence without theory there is no learning.

Learn theory, and ask about theory. All the experience in the world isn’t going to teach you anything unless you have a framework from within which to ask questions.

New Ways of Thinking

Rule 11 definitely applies to most new technology that’s being hyped (and overhyped) in the networking world. But while some things stay the same, others actually do change. From one of my readers—

Much of the current “trends” in networking are largely just new marketing-speak on old concepts, but some (I’ll propose) are actually new, or require new ways of thinking—which is which, or for a simpler version: how (really) should I change my thinking to reflect the new-networking-order?

This question rebounds through the networking industry today—how, really, do I need to change my thinking to cope with the new networking order? There are, on the face of it, three options available. Let me begin with a story from a prior career to set the stage.

A long time ago, in a galaxy far away, I worked on airfield electronics and communication systems. Things like RADAR systems, wind speed measurement systems, TACANs, VORs, crypto hardware, MUX’s, inverse MUX’s, and even telephone switches. There was a point when I saw something interesting happening where I lived and spent my time. The TACAN and VOR, for instance, were replaced by new gear. Instead of half splitting, measuring things, and replacing individual components, the new gear required me to sit at a terminal and select from a menu system. The menu system drove diagnostics which, in turn, told me which part to replace. I could walk over to the supply point, grab the right part, replace the part, and then do a turn-around, sending the part to the depot for repair.

I thought, at first, this transferred all the work to the depot folks. “I’d really rather be in the depot,” I thought, “taking in all these broken parts and doing the hard troubleshooting and repair.” Then I visited one—it wasn’t quite what I thought it would be. In reality, there was a test bench with a small computer (for the time, at least), and a set of test harnesses. The depot person would grab a part out of the box, plug it into the test harness, and select some menu items that would tell them what part needed to be replaced. They would pull the larger unit apart, replace the part, tossing the old one in the garbage, repackage the now repaired unit, and prepare it for shipment back to some other Air Force base.

As it turned out, all the actual work of troubleshooting and repair had been moved to the designers, leaving essentially nothing but manual labor for the actual tech on site and for the person in the depot. Over my time in airfield electronics, I saw this same pattern repeat—the TACAN, VOR, storm detection RADAR, wind speed, runway visual range, etc.—one by one they fell to the automation monster, as it were.

This is the lesson I originally tried to apply to the networking field and devops, when I first encountered them. The problem is the lesson is only partially true. There are some places where the automation monster is, in fact, taking over. Hyperconvergence seems to be one such place. If it’s true that 80% or more of the enterprise data centers can be replaced with a few racks of hyperconverged equipment, then 80% of the network engineers out there focusing on enterprise scale data center design are destined for some bad news of one sort or another.

There are others, though, where this paradigm just doesn’t fit. So, to wit, what is a network engineer to do? What skills do I need to learn, and how do I cope? There is a part of the story above I didn’t tell you. There were a certain number of engineers who, on seeing the changes taking place, stopped focusing on troubleshooting the parts, and started focusing on troubleshooting the whole. Instead of trying to figure out which part of a landing system was broken, they learned how the landing system worked so they could figure out why it wasn’t working right even when the manual didn’t tell them the right answer. In the real world, there are always things the designer doesn’t think about. The engineer who asks “why” can always solve the problem, even if they don’t know the details of which resistor is soldered in where.

With all this in mind, let’s return to the three options:

First, you can keep learning things the way we’ve always learned them. You could learn each new product as it comes out, figuring out the configuration interface the vendor gives you, learning the operation of each protocol, etc. This is useful, of course, because vendors always seem to be coming out with new products, and standards bodies always seem to be coming out with new protocols. These bits of information are really needed in day to day life right now, hype or not. In other words, you can move into the depot in my example.

Second, you could opt out, in a sense, and move into managing relationships from the strength of technical knowledge, rather than managing technology directly. You could move all your company’s work to “the cloud,” for instance, and let someone else take care of racking and stacking, while you take care of measuring performance and managing contracts.

Third, you could opt up, in a sense, and move into asking “why,” rather than “what.” In this case, you learn why things work the way they do, so you can learn to see the patterns across multiple technologies, and hence learn how to be a designer. It doesn’t matter if you’re a designer working for a cloud company, or a designer working for an enterprise, or a designer working for a vendor; no matter how automated the world gets, there will still be a need for designers (“meaning makers,” to put it in philosophical terms).

There is one point to make in terms of the options here versus the options I had in electronics when this transition took place: there is a point, in the real world, where there are enough airfields. There’s a time when we’re saturated with advanced airfield landing systems. I don’t know that we’re anywhere near that point for networks. Maybe, some day, we will be, but I don’t think we’re near there yet.

Reflecting on these choices, I don’t think one is “right,” and another “wrong.” It all depends on your bent of mind, what you enjoy, where you are in life, and what’s important to you. Some people just like to dig around in the CLI and API. Others just like to manage contracts and measure performance. Others like to think through how things work and why they work that way.

The skills you’re actually going to need to survive are going to depend on which of these options you choose. This blog, ‘net Work, is all about the third option (if you’ve not figured that out before now, now you know). My direction has been to learn how things work and why, rather than focusing on keeping up with the command line. I won’t say another option is wrong, but if you want to understand the path I’ve taken, then you should stick around and read my posts. I’m still learning how to explain, and I’m still actually learning how to blend knowing the what, the how, and the why in the right measures.

In a future post, I’ll try to give a more definitive answer to the original question, perhaps; this will have to do for now.

Reaction: Should routing react to the data plane?

Over at Packet Pushers, there’s an interesting post asking why we don’t use actual user traffic to detect network failures, and hence to drive routing protocol convergence; in other words, asking why routing doesn’t react to the data plane.

I have been looking at convergence from a user perspective, in that the real aim of convergence is to provide a stable network for the users traversing the network, or more specifically the user traffic traversing the network. As such, I found myself asking this question: “What is the minimum diameter (or radius) of a network so that the ‘loss’ of traffic from a TCP/UDP ‘stream’ seen locally would indicate a network outage FASTER than a routing update?”

This is, indeed, an interesting question, and one that’s highly relevant in our current software defined/driven world. So why not? Let me give you two lines of thinking that might be used to answer this question.

First, let’s consider the larger problem of fast convergence. Anyone who’s spent time in any of my books, or sat through any of my presentations, should know the four steps to convergence—but just in case, let’s cover them again, using a slide from my forthcoming LiveLesson on IS-IS:

[Slide: Convergence Steps]

There are four steps—detect, report, calculate, and install. The primary point the original article makes is that we might be able to detect a failure faster by seeing traffic flows stop than we can through some other form of detection in the control plane. But is this true? Let’s try to build such a system in our “imagination space” (think of your brain as just another VM maybe) and see what we can figure out.
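One way to keep the four steps in view is to treat convergence time as a simple sum, so that faster detection only helps in proportion to detection's share of the total. The sketch below is illustrative only: the class name and the millisecond figures are made-up assumptions, not measurements from any protocol or product.

```python
from dataclasses import dataclass


@dataclass
class ConvergenceBudget:
    """Time spent in each of the four convergence steps, in milliseconds."""
    detect_ms: float
    report_ms: float
    calculate_ms: float
    install_ms: float

    def total_ms(self) -> float:
        # Convergence finishes only after all four steps have run in sequence.
        return self.detect_ms + self.report_ms + self.calculate_ms + self.install_ms

    def detect_share(self) -> float:
        # Fraction of the total convergence time spent on detection alone.
        return self.detect_ms / self.total_ms()


# Hypothetical numbers, chosen only to show the arithmetic.
baseline = ConvergenceBudget(detect_ms=150, report_ms=50, calculate_ms=100, install_ms=200)
faster_detect = ConvergenceBudget(detect_ms=50, report_ms=50, calculate_ms=100, install_ms=200)
```

With these assumed figures, cutting detection to a third of its original time shortens total convergence from 500 ms to 400 ms, because the other three steps are untouched.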

Since event driven mechanisms are (almost) always faster than polling driven mechanisms, let’s construct this system in an event driven way. Let’s say we build a router that keeps track of flows and, when it sees a set of flows from a particular host or destination stop working, takes the route to that destination out of its local table, and then notifies any local routing processes that the destination is down. This is similar to the way a router treats an interface, only at a flow level.
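To make the thought experiment concrete, here is a minimal sketch of such an event-driven tracker. Everything in it is hypothetical: `FlowTracker`, its method names, and the callback are invented for illustration and do not correspond to any real router software.

```python
from collections import defaultdict


class FlowTracker:
    """Naive tracker: a destination is declared down when its last flow stops."""

    def __init__(self, on_destination_down):
        self._flows = defaultdict(set)      # destination -> set of active flow ids
        self._on_down = on_destination_down  # called with the destination address

    def flow_started(self, destination, flow_id):
        self._flows[destination].add(flow_id)

    def flow_stopped(self, destination, flow_id):
        flows = self._flows.get(destination)
        if not flows or flow_id not in flows:
            return  # unknown flow; nothing to do
        flows.discard(flow_id)
        if not flows:
            # The last flow to this destination is gone: fire the event.
            del self._flows[destination]
            self._on_down(destination)


# Stand-in for "remove the route and notify the routing processes".
withdrawn = []
tracker = FlowTracker(on_destination_down=withdrawn.append)
tracker.flow_started("2001:db8:0:1::1", "flow-a")
tracker.flow_started("2001:db8:0:1::1", "flow-b")
tracker.flow_stopped("2001:db8:0:1::1", "flow-a")
tracker.flow_stopped("2001:db8:0:1::1", "flow-b")
```

Note that the callback fires no matter why the flows stopped; the tracker has no way to tell a failure from a quiet host.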

But this idea creates two more questions.

First, how do I know all the flows to this device aren’t supposed to be stopped for some reason? It might seem suspicious, to a router, that every flow being transmitted to a single host would stop at the same time, but it might also mean something as simple as the host processes finishing all their jobs. It could mean all the traffic going to this host has suddenly switched to another path for some reason, so the route is still valid, it’s just no longer used.

How can I tell the difference between these different situations? Let’s say I start monitoring the state of each flow, rather than just the existence, so I can see all the TCP FIN’s, and say, “oh, all these flows are ending, so the host really isn’t going off line, it’s just done working for the moment.” But now, rather than just monitoring flows, I’m actually monitoring the state of those flows. And even with this solution, I still have some more problems to address. For instance, what if all the TCP sessions end just as the host actually crashes? This might seem unlikely, but in a network that’s large enough, all sorts of odd things are going to happen. It’s better to consider the corner cases before they happen, rather than at 2am when you’re trying to resolve a problem caused by one.
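A rough sketch of what monitoring flow state, rather than flow existence, might look like follows. The names (`FlowEnd`, `classify_destination`) and the three verdict strings are assumptions made up for this example, and the logic is deliberately simplistic.

```python
from enum import Enum


class FlowEnd(Enum):
    FIN = "fin"          # TCP session closed normally
    TIMEOUT = "timeout"  # packets simply stopped arriving


def classify_destination(flow_ends):
    """Guess why every flow to a destination stopped.

    flow_ends: list of FlowEnd values, one per flow that ended.
    Returns "idle", "suspect", or "unknown".
    """
    if not flow_ends:
        return "unknown"
    if all(end is FlowEnd.FIN for end in flow_ends):
        # Every session closed cleanly, so the host probably finished its work.
        # Corner case this cannot see: the host crashing right after the last FIN.
        return "idle"
    return "suspect"
```

Even this small function needs per-flow TCP state from the data plane, which is exactly the deeper coupling between the two systems under discussion.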

In terms of the complexity model, the control plane and the data plane are two different systems, and there is an interaction surface between these two systems. In this proposal, we’re deepening the interaction surface, which means we’re increasing complexity. The tradeoff might be (though it will rarely be) faster convergence, but at the cost of systems that must interact more deeply, and hence become more like one system than two.

Second, how do I know which route to remove? IP networks hide information in order to scale; there’s almost no way to scale to something like the Internet without aggregating information someplace. To put this in other terms, the Internet already doesn’t converge. How well would it converge if we were keeping track of the state of each host, rather than the state of each subnet? Probably not too well, I’m thinking. I can’t tell the network to stop sending traffic to 2001:db8:0:1::1 if the only route I have in the local table is to 2001:db8:0:1::/64; withdrawing that route would cut off every other host on the subnet, so I’d probably cause more problems than I’m solving.
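The aggregation problem can be shown with a small longest-prefix-match sketch using Python's standard `ipaddress` module. The `longest_match` helper and the one-entry table are hypothetical; a real forwarding table is far larger and is not implemented as a list scan.

```python
import ipaddress


def longest_match(destination, routes):
    """Return the most specific route covering destination, or None."""
    address = ipaddress.ip_address(destination)
    candidates = [route for route in routes if address in route]
    return max(candidates, key=lambda route: route.prefixlen, default=None)


# Hypothetical local table: only the aggregate is known, not individual hosts.
table = [ipaddress.ip_network("2001:db8:0:1::/64")]

failed_host = "2001:db8:0:1::1"
healthy_host = "2001:db8:0:1::2"

route_for_failed = longest_match(failed_host, table)
route_for_healthy = longest_match(healthy_host, table)
```

Both lookups return the same /64, so removing the route that covers the failed host also removes the only route to its healthy neighbor.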

In terms of the complexity model, adding per host state into the network would actually be adding complexity to the state side of the triangle. I’m not only adding to the amount of state—there are more hosts than subnets—but also to the speed at which the state changes—as hosts will change state more often than subnets will.

Using the complexity model here helps me to see where I’m adding complexity, which is why you should care about understanding complexity as a network designer. In fact, this is why I chose to write a reply to my friends over at Packet Pushers—because this is such a good example of how understanding the complexity tradeoffs can help you analyze a particular question and come to a solid conclusion.

In the end, then, I’d judge this “not the best idea in the world.” I can see where it might be useful in a small range of cases, but it probably isn’t generalizable to the larger networking world.

The Design Mindset (3)

So you’ve spent time asking what, observing the network as a system, and considering what has actually been done in the past. And you’ve spent time asking why, trying to figure out the purpose (or lack of purpose) behind the configuration and design choices made in the past. You’ve followed the design mindset to this point, so now you can jump in and make like a wrecking ball (or a bull in a china shop), changing things so they’re better, and the new requirements you have can fit right in. Right?


As an example, I want to take you back to another part of a story I told here about my early days in the networking world. Before losing the war over Banyan Vines, I actually encountered an obstacle that should have been telling—but I was too much of a noob at the time to recognize it for the warning it really was. At the time, I had written a short paper comparing Vines to Netware; the paper was, perhaps, ten pages long, and I thought it did a pretty good job of comparing the two network operating systems. Heck, I’d even put together a page showing how Vines was a better fit with the (then mandated) OSI model (don’t ask how you mandate a model, it’s an even longer story).

I proudly ran off 20 or 30 copies of my paper, and passed them around to various folks. At a follow-up meeting, one of the more experienced folks said: “This is a great paper, if you’re just trying to justify your choice. If you’re actually trying to compare the two solutions, however, it’s not so great.” Ouch.

The problem is this is precisely what we do most of the time. Just like the “what versus why” in the first two steps, we tend to have solutions we’re comfortable with, we read about in that rag in the airport lounge, we heard about at some conference, or we really want to learn because it’s new and shiny (or it helps us study for a certification—I once worked on a network where the admins changed routing protocols because they wanted to study for the CCIE). We all worry about our skill set going stale, and we’ve all thought, at one time or another, that if we don’t deploy this new technology, we won’t have the skill set we need to find that next job. The fear of being left out drives far too many of our decisions.

How do you prevent yourself from going down one of these side alleys and making a bad decision? Even worse, how do you prevent incurring technical debt by reaching too far in the other direction—not being bold about getting rid of the old clutter that’s ossified into your network over the years and replacing it with newer ideas?

I’m going to suggest, as always, that we return to the complexity model, carefully asking about each side of the triangle to find the answers we need to decide what we should do. Specifically, ask—

  • What state (the amount of state and the speed at which it changes) will this new technology add to the network overall, and to already existing systems in the network?
  • What is being optimized for (here you need to go back to the business drivers), and why?
  • What new interaction surfaces am I creating, or which interaction surfaces am I making deeper? Where am I increasing the dependencies between two existing systems, and where am I adding new ones? Make certain you look for leaky abstractions here, as all abstractions leak in some way.

Remembering this simple rule of thumb might help: If you’ve not found the tradeoffs, you’ve not looked hard enough.

Cultivate questions

Imagine that you’re sitting in a room interviewing a potential candidate for a position on your team. It’s not too hard to imagine, right, because it happens all the time. You know the next question I’m going to ask: what questions will you ask this candidate? I know a lot of people who have “set questions” they use to evaluate a candidate, such as “what is the OSPF type four for,” or “why do some states in the BGP peering session not have corresponding packets?” Since I’ve worked on certifications in the past (like the CCDE), I understand the value of these sorts of questions. They pinpoint the set and scope of the candidate’s knowledge, and they’re easy to grade. But is easy to grade what we should really be after?

Let me expand the scope a little: isn’t this the way we see our own careers? The engineer with the most bits of knowledge stuffed away when they die wins? I probably need to make a sign that says that, actually, just to highlight the humor of such a thought.

The problem is it simply isn’t a good way to measure an engineer, including the engineer reading this post (you). For one thing, as Ethan so eloquently pointed out this week—

The future of IT is not compatible with a network that waits for a human to make a change in accordance with a complex process that takes weeks. And thus it is that the future of networking becomes important. Yes, we grumpy old network engineers know how to build networks in a reliable, predictable way. But that presumes a reliable, predictable demand from business that just isn’t so in many cases.

The question becomes: how do we cultivate this culture among network engineers? It’s nice enough to say, but what do I do? I’m going to make a simple suggestion. Perhaps, in fact, it’s too simple. But it’s worth a try.

Instead of cultivating knowledge, cultivate questions.

Let’s take my current series on securing BGP as an example. In part two of the series, from last week, I pointed out that it’s a long slog through the world of security for BGP. You have to ask a lot of questions, beginning with one that doesn’t even seem to make sense: what can I actually secure? Cultivating question asking is important because it helps us to actually feel our way around the problem at hand, understanding it better, and finding new ways to solve it.

Okay, so given that networks must change now, that the path to changing networks is changing engineers, and that we want to encourage engineers to ask more questions, what do we do?

First, we need to rethink our certifications around cultivating questions. I think we did a pretty good job with the CCDE here, but the concept of asking if the candidate understands the right question to ask at any given phase of the process is an important skill to measure. I haven’t taken a CCIE lab since 1997, but I remember my proctor asking me if I knew what I was looking for at various times—he was trying to make certain I knew what questions to ask.

Second, we need to start thinking in models, rather than in technologies. I’ve written a lot about this; there’s an entire chapter on models in The Art of Network Architecture, and more on models in Navigating Network Complexity, but we really need to start thinking about why rather than how more often. Why do you think I talk about this stuff so often? It’s not because I don’t know the inner guts of IS-IS (I have an upcoming video series on this being published by Cisco Press), but because I think the ability to turn models and networks into questions is more important than knowing the guts of any particular protocol.

Third, we need to follow Ethan’s lead and start thinking about a broader set of skills and technology.

Finally, maybe—just maybe—we need to start setting up interviews so we can find out if the candidate knows the right questions, rather than focusing on the esoteric game, and whether or not they know all the right answers.

Memorize — or Think?

I have several friends with either photographic, or near photographic, memories. For instance, I work with someone (on the philosophical side of my life) who is just astounding in this respect. If you walk into his office and ask about some concept you’ve just run across, no matter how esoteric, he can give you a rundown of every major book in the field covering the topic, important articles from several journals, and even book chapters that are germane and important. I’ve actually had him point me to the text of a footnote when I asked about a specific concept.

It seems, to me, that the networking industry often focuses on this sort of thing. Quick, can you name the types of line cards available for the Plutobarb CNX1000, how many of each you can put in the chassis, what the backplane speed is, and what the command is to configure OSPF type 3 filters on Tuesdays between three and four o’clock? When we hit this sort of question, and can’t answer it, people look at us like we’re silly or something.


I know, because I’ve been there. I’ve had people ask me the strangest questions in interviews, such as how many spine and leaf boxes it would take to support a specific number of edge ports given a specific set of boxes (sorry folks, I can’t work it out in my head, I need a calculator and hopefully a white board), how many subnets are there in a /22 (these I can calculate in my head for v4, and I’m trying to get to the point of being able to do it in v6), how to configure a specific feature on three different boxes, and a few other odds and ends. In fact, the majority of the interviews I’ve been involved in, across my career, have involved at least one person asking me these sorts of questions.
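For what it's worth, both of those calculations are the kind of thing a few lines of code settle faster than mental arithmetic. The sketch below rests on assumptions: the subnet question is read as "how many /24s fit in a /22", and the fabric is a plain two-tier leaf and spine design in which every leaf has exactly one uplink to every spine. The function names and port counts are invented for illustration.

```python
import ipaddress
import math


def subnet_count(prefix_len, subnet_len):
    """Number of subnets of length subnet_len inside one prefix of length prefix_len."""
    if subnet_len < prefix_len:
        raise ValueError("subnet must be at least as long as the parent prefix")
    return 2 ** (subnet_len - prefix_len)


def leaf_spine_size(edge_ports, ports_per_leaf, uplinks_per_leaf, ports_per_spine):
    """Size a two-tier fabric where each leaf has one uplink to every spine."""
    downlinks = ports_per_leaf - uplinks_per_leaf
    leaves = math.ceil(edge_ports / downlinks)
    spines = uplinks_per_leaf
    if leaves > ports_per_spine:
        raise ValueError("spine does not have enough ports for this many leaves")
    return leaves, spines


# Cross-check the subnet arithmetic against the standard library.
parent = ipaddress.ip_network("10.0.0.0/22")
slash_24s = list(parent.subnets(new_prefix=24))

# Hypothetical hardware: 48-port leaves with 4 uplinks, 32-port spines.
leaves, spines = leaf_spine_size(edge_ports=1000, ports_per_leaf=48,
                                 uplinks_per_leaf=4, ports_per_spine=32)
```

The same `subnet_count` call covers IPv6, since only the difference in prefix lengths matters.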

Most of the time, the answer I really want to blurt out is, “I don’t care.” Most of the time, I don’t. I did, one time, challenge the interviewer to an esoteric match, though. I was asked about something odd to which I didn’t know the answer, so I answered, “I’ll make a deal with you — for every esoteric question you ask, I get to ask one back. If I stump you more than you stump me, I pass the interview. Deal?”

I think I understand why we do this — one of the first temptations of the teacher is to ask questions that are easy to grade, rather than questions that actually cover the required material.

But I also think we need to think more about what we’re trying to build as an engineering community. Should we memorize more, or think more? I know, to some degree, that this is a false dichotomy; you can’t think without something to think about. Memorization is a critical skill. Even as a PhD student in a philosophy program I need to memorize things (in fact, lots of things). If I don’t memorize the author’s line of argument in a text we’re studying, for instance, I’m pretty useless in class discussion.

There needs to be a balance here, though. The question is — how do we reach the balance? What does that balance look like? Of course, the balance isn’t going to be the same for everyone, and every position. Sometimes you just have to develop a habit of actions that will serve you well in times of crisis, for instance. But other times you don’t. How do you know? I have some suggestions here, but feel free to add more in the comments. My suggestions are…

First, ask why do I care? If the job is in a NOC, and the candidate is going to be troubleshooting OSPF adjacency problems at 2AM, they probably need to know the OSPF adjacency process, including things like what multicast addresses OSPF uses for particular reasons, the stages of the process, etc., very well. So first, know what they need to know.
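As a sketch of the sort of thing that NOC candidate would need at their fingertips, the OSPF neighbor states and multicast groups fit in a few lines. The state names and group addresses come from the OSPFv2 specification (RFC 2328); the integer values only encode ordering, and the hint table and `hint_for` helper are hypothetical additions with deliberately incomplete coverage.

```python
from enum import IntEnum


class NeighborState(IntEnum):
    """OSPF neighbor states, in the order RFC 2328 lists them."""
    DOWN = 0
    ATTEMPT = 1   # only used on NBMA networks
    INIT = 2
    TWO_WAY = 3
    EXSTART = 4
    EXCHANGE = 5
    LOADING = 6
    FULL = 7


# Well-known OSPFv2 multicast groups.
ALL_SPF_ROUTERS = "224.0.0.5"
ALL_D_ROUTERS = "224.0.0.6"

# Hypothetical first-guess hints, keyed by the state a neighbor is stuck in.
STUCK_HINTS = {
    NeighborState.INIT: "hellos arrive but do not list us; check for one-way reachability",
    NeighborState.TWO_WAY: "normal between two non-designated routers on a broadcast segment",
    NeighborState.EXSTART: "database description exchange is failing; an MTU mismatch is a common cause",
}


def hint_for(state):
    """Return a starting point for troubleshooting, not a diagnosis."""
    return STUCK_HINTS.get(state, "no canned hint; work through the adjacency process")
```

Whether a candidate should recite this table or simply know where to look it up is the question of what the job actually requires.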

Second, ask how should I ask this? If you’re asking about troubleshooting OSPF adjacencies, do you ask what commands you would use to troubleshoot an OSPF adjacency problem? Or do you ask what the stages of an OSPF adjacency are (in detail)? Or do you set up a broken adjacency and ask the candidate to mock up fixing it? Each of these tries to get to the same information, but in different ways. Which one makes the most sense for your environment? And which one asks more for thinking skills rather than memorization skills?

Let me try to encapsulate these two into a simpler form, though.

What would you allow someone to search for during an interview?

Command line information? Protocol operation details? Protocol operation theory? Or nothing?

Intellectual virtue and the engineer

On the 15th of January in 2009, Captain “Sully” Sullenberger glided an Airbus A320 into the Hudson River just after takeoff from LaGuardia airport in New York City. Both engines failed due to multiple bird strikes, so the ditching was undertaken with no power, in a highly populated area. Captain Sullenberger could have attempted to land on one of several large highways, but all of these tend to have heavy traffic patterns; he could not make it to any airport with the power he had remaining, so he ditched the plane in the river. Out of the 155 people on board, only one needed overnight hospitalization.

There are a number of interesting things about this story, but there is one crucial point that applies directly to life at large, and engineering in detail. Here’s a simple question that exposes the issue at hand—

Do you think the Captain had time to read the manual while the plane was gliding along in the air after losing both engines? Or do you think he just knew what to do?

Way back in the mists of time, a man named Aristotle struggled over the concept of ethics. Not only was he trying to figure out where ethics come from (normative ethics), he was also trying to figure out how to transfer those normative ethics to individual people (aretaic ethics). Aristotle was, above all, a practical man; he wanted to know how any given person could live the “good life,” which he defined as “in accordance with the normative ethics that produce the greatest happiness for an individual.”

What does Aristotle have to do with Captain Sullenberger gliding an airplane into a clean ditch in 2009? Let’s go back to the question just above — do you think he had time to read the manual before ditching that airplane? I’m pretty certain we all know the answer to this question: he didn’t need to read the manual, because he knew what to do. Which leads to the next question: how did he know what to do? As it turns out Captain Sullenberger was a glide instructor, so he not only knew what to do in theory, he’d actually practiced it before. And this, I think, is a crucial point.

What Aristotle posited was that action and belief are not independent “things,” as we often believe, but rather interconnected. As we do things we change our beliefs. As we believe things, we change our actions. The two run in tandem like a Möbius strip, each one supporting the other. This theory of aretaic ethics is called the virtue ethic.

What does any of this have to do with engineering?

Have you ever met an engineer who can quickly assess a network design, seemingly at a glance, tell you where the problems will be, and propose a set of solutions? Have you ever met an engineer who can sit calmly through a huge network outage, looking at the various outputs and figuring out what is wrong, apparently oblivious to the storms surrounding them?

What you’ve witnessed is the virtue ethic in operation. Knowledge builds on experience, and experience builds knowledge. It takes time to build the two, but neither one can exist without the other. To put it in another context, in the shooting world there is a saying: when you’re under pressure, you don’t shoot the way you think, you shoot the way you’ve practiced.

We can apply this concept to our lives in many ways as engineers. To be the engineer who can take in a network “all at once,” and “grok” it, you have to both gain knowledge and practice. The knowledge I’m talking about, by the way, isn’t about command lines; it’s about understanding how networks work at a systemic, integral level. It’s about having a set of mental models you can put around any situation to make sense of it quickly and efficiently (the OODA loop, for instance). It’s about understanding complexity, and all the rest. The virtue ethic says we need to experience and know; that these two things need to reinforce one another.

But how can we learn? Learning isn’t just about reading a few things here and there, or watching the occasional webinar. Learning is, itself, a learned skill — subject to the virtue ethic. You can learn how to learn (a subject for another entire set of posts, and at least part of the point of every other post here).

In fact, the concept of virtue is closely tied to something else that’s always close to the surface of my thinking: culture. One possible definition of a group’s culture is the intertwined surfaces of knowledge and action built up through repetition over time — the virtue ethic.

What defines culture for a group defines culture for you, as well. What are the habits you’re building today? Are you learning to learn? Are you intentionally building a set of practiced skills that will carry you through when there’s no time to read the manual?

What’s your culture? After all, culture eats technology for breakfast.

And you didn’t think Aristotle could teach you anything.