For any field of study, there are some mental habits that will make you an expert over time. Whether you are an infrastructure architect, a network designer, or a network reliability engineer, what habits of mind mark out expertise among those who build and operate networks?
Experts involve the user
Experts don’t just listen to the user, they involve the user. This means taking the time to teach the developer or application owner how their applications interact with the network, showing them how their applications either simplify or complicate the network, and the impact of these decisions on the overall network.
Experts think about data
Experts think about data rather than applications. What does the data look like? How does the business use the data? Where does the data need to be, when does it need to be there, how often does it need to go, and what is the cost of moving it? What might be in the data that can be harmful? How can I protect the data while at rest and in flight?
Experts think in modules, surfaces, and protocols
Devices and configurations can, and should, change over time. The way a problem is broken up into modules and the interaction surfaces (interfaces) between those modules can be permanent. Choosing the wrong protocol means reaching for a different protocol to solve every new problem, leading to accretion of complexity, ossification, and ultimately brittleness. Break the problem up right the first time, choose the protocols carefully, and let the devices and configurations follow.
Choosing devices first is like selecting the hammer you’re going to use to build a house, and then selecting the design and materials used in the house based on what you can use the hammer for.
Experts think about tradeoffs
State, optimization, and surface form an ironclad tradeoff. If you increase state, you increase complexity while also increasing optimization. If you increase surfaces through abstraction, you are both increasing and decreasing state, which has an impact on both complexity and optimization. All nontrivial abstractions leak. Every time you move data you are facing the speed of serialization, queueing, and light, and hence you are dealing with the choice between consistency, availability, and partition tolerance.
If you haven’t found the tradeoffs, you haven’t looked hard enough.
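To put rough numbers on the speed of serialization, queueing, and light, here is a minimal sketch in Python. Every name and figure in it is an assumption made for illustration: a single link, same-sized packets in the queue, and light moving through fiber at roughly two thirds of its speed in a vacuum.

```python
# Rough one-way delay for a single packet across one link.
# All numbers here are illustrative assumptions, not measurements.

# Light travels at roughly two thirds of c in optical fiber (approximation).
FIBER_SPEED_M_PER_S = 2.0e8


def serialization_delay_s(packet_bytes: int, link_bps: float) -> float:
    """Time to clock every bit of the packet onto the wire."""
    return packet_bytes * 8 / link_bps


def queueing_delay_s(packets_ahead: int, packet_bytes: int, link_bps: float) -> float:
    """Time spent waiting behind same-sized packets already in the queue."""
    return packets_ahead * serialization_delay_s(packet_bytes, link_bps)


def propagation_delay_s(distance_m: float) -> float:
    """Time for the signal to cross the distance; faster links do not reduce it."""
    return distance_m / FIBER_SPEED_M_PER_S


def one_way_delay_s(packet_bytes: int, link_bps: float,
                    packets_ahead: int, distance_m: float) -> float:
    """Sum of the three delay components for one packet on one link."""
    return (serialization_delay_s(packet_bytes, link_bps)
            + queueing_delay_s(packets_ahead, packet_bytes, link_bps)
            + propagation_delay_s(distance_m))


if __name__ == "__main__":
    # A 1500 byte packet, a 1 Gb/s link, 10 packets queued ahead, 4,000 km of fiber.
    delay = one_way_delay_s(1500, 1e9, 10, 4_000_000)
    print(f"one-way delay: {delay * 1000:.3f} ms")
```

With these assumed numbers the total is about 20.1 ms, and 20 ms of that is propagation. Buying a faster link shrinks the serialization and queueing terms, but the distance term stays, which is why the choice between consistency and availability never quite goes away.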
Experts focus on the essence
Every problem has an essential core: something you are trying to solve, and a reason for solving it. Experts know how to distinguish between the essential and the nonessential. Experts think about what they are not designing, and what they are not trying to accomplish, as well as what they are. This doesn’t mean the rest isn’t there; it just means it’s not quite in focus all the time.
Experts are mentally stimulated to simulate
Labs are great—but moving beyond the lab and thinking about how the system works as a whole is better. Experts mentally simulate how the data moves, how the network converges, how attackers might try to break in, and other things besides.
Experts look around
Interior designers go to famous spaces to see how others have designed before them. Building designers walk through cities and famous buildings to see how others have designed before them. The more you know about how others have designed, the more you know about the history of networks, the more of an expert you will be.
Experts reshape the problem space
Experts are unafraid to think about the problem in a different way, to say “no,” and to try solutions that have not been tried before. Best common practice is a place to start, not a final arbiter of all that is good and true. Experts do not fall for the “is/ought” fallacy.
Experts treat problems as opportunities
Whether the problem is a mistake or a failure, or even a little bit of both, every problem is an opportunity to learn how the system works, and how networks work in general.
Simplification is a constant theme, not only here and in my talks, but across the network engineering world right now. But what does this mean practically? Looking at a complex network, how do you begin simplifying?
The first option is to abstract, abstract again, and abstract some more. But before diving into deep abstraction, remember that abstraction is both a good and a bad thing. Abstraction can reduce the amount of state in a network, and reduce the speed at which that state changes. Abstraction can cover a multitude of sins in the legacy part of the network, but abstractions also leak. In fact, all nontrivial abstractions leak. Following this logic through: all nontrivial abstractions leak; the more nontrivial the abstraction, the more it will leak; the more complexity an abstraction is covering, the less trivial the abstraction will be. Hence: the more complexity you are covering with an abstraction, the more it will leak.
Abstraction, then, is only one part of the solution. You must not only abstract, but you must also simplify the underlying bits of the system you are covering with the abstraction. This is a point we often miss.
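Route aggregation is one familiar instance of this (my choice of example, not the only one). The sketch below uses Python's standard ipaddress module; the prefixes and the helper names summarize and hidden_failures are hypothetical, chosen only to show state being removed and a leak appearing in its place.

```python
import ipaddress


def summarize(prefixes):
    """Collapse adjacent prefixes into the fewest covering networks."""
    networks = [ipaddress.ip_network(p) for p in prefixes]
    return list(ipaddress.collapse_addresses(networks))


def hidden_failures(aggregates, failed_prefixes):
    """Return failed prefixes that an aggregate still claims to reach.

    This is the leak: the aggregate keeps attracting traffic for a
    destination that is no longer there. Assumes every prefix is the
    same IP version, since subnet_of() rejects mixed versions.
    """
    failed = [ipaddress.ip_network(p) for p in failed_prefixes]
    return [f for f in failed if any(f.subnet_of(a) for a in aggregates)]


if __name__ == "__main__":
    # Four hypothetical /24s behind one aggregation point.
    specifics = ["10.1.0.0/24", "10.1.1.0/24", "10.1.2.0/24", "10.1.3.0/24"]
    aggregates = summarize(specifics)
    print("advertised:", [str(a) for a in aggregates])

    # One specific goes away, but the aggregate does not change.
    leaked = hidden_failures(aggregates, ["10.1.2.0/24"])
    print("still attracting traffic for:", [str(n) for n in leaked])
```

Four routes become one, and the rest of the network no longer sees those four come and go. That is the reduction in state and in the speed of state change. The cost is that the rest of the network also cannot see when one of the four fails.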
Which returns us to our original question. The first answer to the question is this: minimize.
Minimize the number of technologies you are using. Of course, minimization is not so … simple … because it is a series of tradeoffs. You can minimize the number of protocols you are using to build the network, or you can minimize the number of things you are using each protocol for. This is why you layer things, which helps you understand how and where to modularize, focusing different components on different purposes, and then thinking about how those components interact. Ultimately, what you want is precisely the number of modules required to do the job to a specific level of efficiency, and not one module more (or less).
Minimize the kinds of “things” you are using. Try to use one data center topology, one campus topology, one regional topology, etc. Try to use one kind of device (whether virtual or physical) in each “role.” Try to reduce the number of “roles” in the network.
Think of everything, from protocols to “places,” as “modules,” and then try to reduce the number of modules. Modules should be chosen for repeatability, functional division, and optimal abstraction.
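One practical way to start is simply to count. The sketch below assumes a hypothetical inventory of (device, role, model) entries; the device names, roles, and models are made up for illustration. It reports how many kinds of device fill each role, so you can see where consolidation might be possible.

```python
from collections import defaultdict

# Hypothetical inventory: (device name, role, hardware or software model).
INVENTORY = [
    ("leaf-01", "dc-leaf", "model-a"),
    ("leaf-02", "dc-leaf", "model-a"),
    ("leaf-03", "dc-leaf", "model-b"),
    ("spine-01", "dc-spine", "model-c"),
    ("spine-02", "dc-spine", "model-c"),
    ("edge-01", "wan-edge", "model-d"),
    ("edge-02", "wan-edge", "model-e"),
]


def kinds_per_role(inventory):
    """Map each role to the set of distinct models filling it."""
    kinds = defaultdict(set)
    for _name, role, model in inventory:
        kinds[role].add(model)
    return dict(kinds)


def roles_with_multiple_kinds(inventory):
    """Roles filled by more than one model: candidates for consolidation."""
    return sorted(
        role
        for role, models in kinds_per_role(inventory).items()
        if len(models) > 1
    )


if __name__ == "__main__":
    for role, models in sorted(kinds_per_role(INVENTORY).items()):
        print(f"{role}: {len(models)} kind(s) -> {sorted(models)}")
    print("consider consolidating:", roles_with_multiple_kinds(INVENTORY))
```

The same count works for topologies per place, or for protocols per purpose. The number by itself does not tell you what to remove, but it does show where the tradeoffs need a closer look.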
The second answer to the original question is: architecture should move slowly, components quickly.
The architecture is not the network, nor even the combination of all the modules.
Think of a building. Every building has bathrooms (I assume). All those bathrooms have sinks (I assume). The sinks need to fit the style of the building. The number of sinks needs to match the needs of the building overall. But the sinks can change rapidly, and in response to the changing architecture of the building, while the building, its purpose, and its style change much more slowly. Architecture should change slowly, components more rapidly.
This is another reason to create modules: each module can change as needed, but the architecture of the overall system needs to change more slowly and intentionally. Thinking in systemic terms helps differentiate between the architecture and the components. Each component should fit within the overall architecture, and each component should play a role in shaping the architecture. Does the organization you support rely on deep internal communication across a wide geographic area? Or does it rely on lots of smaller external communications across a narrow geographic area? The style of communication in your organization makes a huge difference in the way the network is built, just like a school or hospital has different needs in terms of sinks than a shopping mall.
So these are at least two rules for simplification you can start applying in practical ways. First, modularize: choose modules carefully, and reduce the number of kinds of modules. Second, think about what needs to change quickly and what needs to change slowly.
Throwing abstraction at the problem does not, ultimately, solve it. Abstraction must be combined with a lot of thinking about what you are abstracting and why.