In the early 1990s, several large network equipment vendors, including Cisco, ADTRAN, and Cabletron, sold equipment targeted at large-scale dial banks. This equipment was often tied to banks of modems on one side, and to servers of some sort on the other, creating a dial bank. Dial banks grew in scale and sophistication on the back of Bulletin Board Systems (BBSs), run by dedicated system operators (SYSOPs) for the support of individual communities. At one time, there were tens of thousands, potentially even hundreds of thousands, of these systems in the United States alone, serving large communities of users.
In Systemantics: How Systems Really Work and How They Fail, John Gall says:
A complex system that works is invariably found to have evolved from a simple system that worked. A complex system designed from scratch never works and cannot be patched up to make it work. You have to start over with a working simple system.
In the software development world, this is called Gall’s Law (even though Gall himself never calls it a law) and is applied to organizations and software systems. How does this apply to network design and engineering? The best place to begin in answering this question is to understand what, precisely, Gall is arguing for; there is more here than what is visible on the surface.
What does a simple system mean? It is, first of all, an argument for underspecification. This runs counter to the way we instinctively want to design systems. We want to begin by discovering all the requirements (problems to be solved and constraints), and then move into an orderly discussion of all the possible solutions and sets of solutions, and then into an orderly discussion of an overall architecture, then into a nice UML chart showing all the interaction surfaces and how they work, and … finally … into building or buying the individual components.
This is beautiful on paper, but it does not often work in real life. What Gall is arguing for is building a small, simple system first that only solves some subset of the problems. Once that is done, add onto the core system until you get to a solution that solves the problem set. The initial system, then, needs to be underspecified. The specification for the initial system does not need to be “complete;” it just needs to be “good enough to get started.”
If this sounds something like agile development, that’s because it is something like agile development.
This is also the kind of thinking that has been discussed on the History of Networking series (listen to this episode on the origins of DNS with Paul Mockapetris as an example). There are a number of positive aspects to this way of building systems. First, you solve problems in small enough chunks to see real progress. Second, as you solve each problem (or part of the problem), you are creating a usable system that can be deployed and tested and solves a specific problem. Third, you are more likely to “naturally” modularize a system if you build it in pieces. Once some smaller piece is in production, it is almost always going to be easier to build another small piece than to try to add new functionality to the existing piece and deploy the result.
How can this be applied to network design and operations?
The most obvious answer is to build the network in chunks, starting with the simple things first. For instance, if you are testing a new network design, focus on building just a campus or data center fabric, rather than trying to replace the entire existing network with a new one. This use of modularization can be extended to use cases beyond topologies within the network, however. You could allow multiple overlays to co-exist, each one solving a specific problem, in the data center.
This latter example (multiple overlays), however, shows how and where this kind of strategy can go wrong. In building multiple overlays you might be tempted to build multiple kinds of overlays by using different kinds of control planes, or different kinds of transport protocols. This kind of ad-hoc building can fit well within the agile mindset but can result in a system that is flat-out unmaintainable. I have been in two-day meetings where the agenda was just to go over every network management related application currently deployed in the network. A printed copy of the spreadsheet, one tool per line, came out to tens of pages. This is agile gone wildly wrong, driving unnecessary complexity.
Another problem with this kind of development model, particularly in network engineering, is that it is easy to ignore lateral interaction surfaces, particularly among modules that do not seem to interact. For instance, IS-IS and BGP are both control planes, and hence seem to fit at the same “layer” in the design. Since they are lateral modules, each one providing different kinds of reachability information, it is easy to forget they also interact with one another.
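One concrete place IS-IS and BGP commonly touch is next-hop resolution: a BGP route is only usable if its next hop can be reached through some other route, which is often one learned from the IGP. The sketch below is a deliberately simplified illustration of that dependency, not a model of any real implementation. The function names, prefixes, and addresses are all hypothetical, and real routers resolve next hops against the full routing table rather than a single list of prefixes.

```python
import ipaddress


def resolve_next_hop(next_hop, igp_prefixes):
    """Return the most specific IGP prefix covering next_hop, or None."""
    address = ipaddress.ip_address(next_hop)
    covering = [
        network
        for network in map(ipaddress.ip_network, igp_prefixes)
        if address in network
    ]
    if not covering:
        return None
    return str(max(covering, key=lambda network: network.prefixlen))


def usable_bgp_routes(bgp_routes, igp_prefixes):
    """Keep only the BGP routes whose next hop the IGP can reach."""
    return {
        prefix: next_hop
        for prefix, next_hop in bgp_routes.items()
        if resolve_next_hop(next_hop, igp_prefixes) is not None
    }


# Hypothetical data: loopbacks learned through IS-IS, prefixes learned through BGP.
isis_prefixes = ["10.0.0.1/32", "10.0.0.2/32"]
bgp_routes = {
    "203.0.113.0/24": "10.0.0.1",
    "198.51.100.0/24": "10.0.0.2",
}

before = usable_bgp_routes(bgp_routes, isis_prefixes)
# Simulate IS-IS losing the route to 10.0.0.2; nothing in BGP itself changed.
after = usable_bgp_routes(bgp_routes, ["10.0.0.1/32"])
print(sorted(before))
print(sorted(after))
```

In this toy model, a change that is entirely inside IS-IS removes a BGP route from service, which is the kind of lateral interaction that is easy to miss when each module is built and tested on its own.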
Gall’s Law, like all laws in the engineering world, can be a good rule of thumb, so long as you keep up a system-level view of the network, and maintain discipline around a basic set of rules (such as “don’t use different kinds of overlays, even if you use multiple overlays”).
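A rule like “don’t use different kinds of overlays” is easier to keep if something checks it. As a minimal sketch, assuming you keep an inventory of overlays with their control plane and encapsulation, a check might look like the following. The inventory format, field names, and overlay names are all hypothetical, made up for illustration.

```python
from collections import defaultdict


def overlay_rule_violations(overlays):
    """Return one line per overlay kind when more than one kind is in use.

    A "kind" here is the (control plane, encapsulation) pair. An empty
    list means every overlay in the inventory is of the same kind.
    """
    kinds = defaultdict(list)
    for overlay in overlays:
        key = (overlay["control_plane"], overlay["encapsulation"])
        kinds[key].append(overlay["name"])
    if len(kinds) <= 1:
        return []
    return sorted(
        f"{control_plane}/{encapsulation}: {', '.join(sorted(names))}"
        for (control_plane, encapsulation), names in kinds.items()
    )


# Hypothetical inventory: two overlays of one kind, plus one odd one out.
inventory = [
    {"name": "tenant-a", "control_plane": "bgp-evpn", "encapsulation": "vxlan"},
    {"name": "tenant-b", "control_plane": "bgp-evpn", "encapsulation": "vxlan"},
    {"name": "storage", "control_plane": "static", "encapsulation": "gre"},
]

for line in overlay_rule_violations(inventory):
    print(line)
```

Multiple overlays of the same kind pass the check; the report only appears once a second kind shows up, which is the point at which the system-level view needs a deliberate decision rather than an ad-hoc one.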
Today’s enterprises make use of applications and services that live on-premises, in public clouds and even in edge data centers. IT has entered the multicloud era and connecting the modern enterprise isn’t easy. —aganesan
Open source code is vital to software development at most organizations, but that doesn’t mean that enterprises have figured out how to use open source without inadvertently introducing vulnerabilities into their code. —Curtis Franklin, Jr
World Password Day is a day in which companies around the world post blogs with advice, sometimes questionable, the obligatory XKCD comic and talk about the importance of Multi-Factor Authentication (MFA). At Juniper Networks, we thought instead we’d have “the talk” about the foundation of password protection as a foundation of security itself. —Trevor Pott
Last May, Europe imposed new data privacy guidelines that carry the hopes of hundreds of millions of people around the world — including in the United States — to rein in abuses by big tech companies. Almost a year later, it’s apparent that the new rules have a significant loophole: The designated lead regulator — the tiny nation of Ireland — has yet to bring an enforcement action against a big tech firm. —Nicholas Vinocur
For those of you interested in the world of network disaggregation, the LiveLesson Dinesh Dutt and I recorded back in January is up on Safari Books Online as a “rough cut.” I’m not entirely certain when the official release will be available, but the rough cut versions are usually pretty good anyway. The one humorous mistake I see on the current page is the topic is listed as “travel.” Well, I do travel a lot, but I’ve never made a video on travel.
Danger Storm Kevin MacLeod (incompetech.com)
Licensed under Creative Commons: By Attribution 3.0 License
It was quite difficult to prepare a tub full of bath water at many points in recent history (and it probably still is in many parts of the world). First, there was the water itself: if you do not have plumbing, then the water must be manually transported, one bucket at a time, from a stream, well, or pump, to the tub. The result, of course, would be someone who was sweaty enough to need the forthcoming bath. Then there is the warming of the water. Short of building a fire under the tub itself, how can you heat enough water quickly enough to make the eventual bathing experience pleasant? According to legend, this resulted in the entire household using the same tub of water to bathe. The last to bathe was always the smallest, the baby. By then, the water would be murky with dirt, which means the child could not be seen in the tub. When the water was thrown out, then, no-one could tell if the baby was still in there.
But it doesn’t take a dirty tub of water to throw the baby out with the bath. All it really takes is an unwillingness to learn from the lessons of others because, somehow, you have convinced yourself that your circumstances are so different there is nothing to learn. Take, for instance, the constant refrain, “you are not Google.”
I should hope not.
But this phrase, or something similar, is often used to say something like this: you don’t have the problems of any of the hyperscalers, so you should not look to their solutions to find solutions for your problems. An entertaining read on this from a recent blog:
Software engineers go crazy for the most ridiculous things. We like to think that we’re hyper-rational, but when we have to choose a technology, we end up in a kind of frenzy — bouncing from one person’s Hacker News comment to another’s blog post until, in a stupor, we float helplessly toward the brightest light and lay prone in front of it, oblivious to what we were looking for in the first place. —Oz Nova
There is a lot of truth here: you should never choose a system or solution because it solves someone else’s problem. Do not deploy Kafka if you do not need the scale Kafka represents. Maybe you don’t need four links between every pair of routers “just to be certain you have enough redundancy.”
On the other hand, there is a real danger here of throwing the baby out with the bathwater: the water is murky with product and project claims, so the temptation is to just abandon the entire mess. To see where the problem is here, let’s look at another large scale system we don’t think about very much any longer: the NASA space program from the mid-1960s. One of the great things the folks at NASA have always liked to talk about is all the things that came out of the space program. Remember Tang? Or maybe not. It really wasn’t developed for the space program, and it’s mostly sugar and water, but it was used in some of the first space missions, and hence became associated with hanging out in space.
There are a number of other inventions, however, that really did come directly out of research into solving problems folks hanging out in space would have, such as the space pen, freeze-dried ice cream, exercise machines, scratch-resistant eyeglass lenses, cameras on phones, battery powered tools, infrared thermometers, and many others.
Since you are not going to space any time soon, you refuse to use any of these technologies, right?
Do not be silly. Of course you still use these technologies. Because you are smart enough not to throw the baby out with the bathwater, right?
You should apply the same level of care to the solutions Google, Amazon, LinkedIn, Microsoft, and the other hyperscalers have developed. Not everything is going to fit in your environment, of course. On the other hand, some things might fit. And regardless of whether any particular technology fits or not, you can still learn something about how systems work by considering how they are building things to scale to their needs. You can adopt operational processes that make sense based on what they have learned. You can pick out technologies and ways of thinking that make sense.
No, you’re (probably) not Google. On the other hand, we are all building complex networks. The more we can learn from those around us, the better what we build will be. Don’t throw the baby out with the bathwater.