Innovation Myths
Innovation has gained a sort-of mystical aura in our world. Move fast and break stuff. We recognize and lionize innovators in just about every way possible. The result is a general attitude of innovate or die—if you cannot innovate, then you will not progress in your career or life. Maybe it’s time to take a step back and bust some of the innovation myths created by this near idolization of innovation.
You can’t innovate where you are. Reality: innovation is not tied to a particular place and time. “But I work for an enterprise that only uses vendor gear… Maybe if I worked for a vendor, or was deeply involved in open source…” Innovation isn’t just about building new products! You can innovate by designing a simpler network that meets business needs, or by working with your vendor on testing a potential new product. Ninety percent of innovation is just paying attention to problems, along with a sense of what is “too complex,” or where things might be easier.
You don’t work in open source or open standards? That’s not your company’s problem, that’s your problem. Get involved. It’s not just about protocols, anyway. What about certifications, training, and the many other areas of life in information technology? Just because you’re in IT doesn’t mean you have to only invent new technologies.
Innovation must be pursued—it doesn’t “just happen.” We often tell ourselves stories about innovation that imply it “is the kind of thing we can accomplish with a structured, linear process.” The truth is the process of innovation is unpredictable and messy. Why, then, do we tell innovation stories that sound so purposeful and linear?
Innovation just happens. Either the inspiration just strikes, or it doesn’t, right? You’re just walking along one day and some really innovative idea just jumps out at you. You’re struck by lightning, as it were. This is the opposite of the previous myth, and just as wrong in the other direction.
Innovation requires patience. According to Keith’s Law, any externally obvious improvement in a product is really the result of a large number of smaller changes hidden within the abstraction of the system itself. Innovation is a series of discoveries over months and even years. Innovations are gradual, incremental, and collective—over time.
Innovation often involves combining existing components. If you don’t know what’s already in the field (and usefully adjacent fields), you won’t be able to innovate. Innovation, then, requires a lot of knowledge across a number of subject areas. You have to work to learn to innovate—you can’t fake this.
Innovation often involves a group of people, rather than lone actors. We often emphasize lone actors, but they rarely work alone. To innovate, you have to intentionally embed yourself in a community with a history of innovation, or build such a community yourself.
Innovation must take place in an environment where failure is seen as a good thing (at least you were trying) rather than a bad one.
Innovative ideas don’t need to be sold. Really? Then let’s look at Quibi, which “failed after only 7 months of operation and after having received $2 billion in backing from big industry players.” The idea might have been good, but it didn’t catch on. The idea that you can “build a better mousetrap” and “the world will beat a path to your door” just isn’t true, and it never has been.
The bottom line is… innovation does require a lot of hard work. You have to prepare your mind, learn to look for problems that can be solved in novel ways, be inquisitive enough to ask why and whether there is a better way, stubborn enough to keep trying, and confident enough to sell your innovation to others. But you can innovate where you are—to believe otherwise is a myth.
The Senior Trap
How do you become a “senior engineer?” It’s a question I’m asked quite often, actually, and one that deserves a better answer than the one I usually give. Charity recently answered the question in a roundabout way in a post discussing the “trap of the premature senior.” She’s responding to an email from someone who is considering leaving a job where they have worked themselves into a senior role. Her advice?
Quit!
This might seem to be counter-intuitive, but it’s true. I really wanted to emphasize this one line—
Exactly! Knowing the CLI for one vendor’s gear, or even two vendors’ gear, is not nearly the same as understanding how BGP actually works. Quoting the layers in the OSI model is just not the same thing as being able to directly apply the RINA model to a real problem happening right now. You’re not going to gain an understanding of “the whole ball of wax” by staying in one place, or doing one thing, for the rest of your life.
If I have one piece of advice other than my standard two, which are read a lot (no, really, A LOT!!!!) and learn the fundamentals, it has to be do something else.
Charity says this is best done by changing jobs—but this is a lot harder in the networking world than it is in the coding world. There just aren’t as many network engineering jobs as there are coding jobs. So what can you do?
First, do make it a point to try to work for both vendors and operators throughout your career. These are different worlds—seriously. Second, even if you stay at the same company for a long time, try to move around within that company. For instance, I was at Cisco for sixteen years. During that time, I was in tech support, escalation, engineering, and finally sales (yes, sales). Since then, I’ve worked in a team primarily focused on research at an operator, in engineering at a different vendor, then in an operationally oriented team at a provider, then marketing, and now (technically) software product management. I’ve moved around a bit, to say the least, even though I’ve not been at a lot of different companies.
Even if you can’t move around a lot like this for whatever reason, always take advantage of opportunities to NOT be the smartest person in the room. Get involved in the IETF. Get involved in open source projects. Run a small conference. Teach at a local college. I know it’s easy to say “but this stuff doesn’t apply to the network I’m actually working on.” Yes, you’re right. And yet—that’s the point, isn’t it? You don’t expand your knowledge by only learning things that apply directly to the problem you need to solve right now.
Of course, if you’re not really interested in becoming a truly great network engineer, then you can just stay “senior” in a single place. But I’m guessing that if you’re reading this blog, you’re interested in becoming a truly great network engineer.
Pay attention to the difference between understanding things and just being familiar with them. The path to being great is always hard, it always involves learning, and it always involves a little risk.
Technologies that Didn’t: Asynchronous Transfer Mode
One of the common myths of the networking world is that there were no “real” networks before the early days of packet-based networks. As myths go, this is not even a very good one; the world had very large-scale voice and data networks long before distributed routing, before packet-based switching, and before any of the packet protocols such as IP. In the mid-1980s, I participated in replacing a large-scale voice and data network, including hundreds of inverse multiplexers, that tied a personnel system together. In the same time frame, I also installed hundreds of terminal emulation cards in Zenith Z100 and Z150 systems to allow these computers to connect to mainframes and newer minicomputers on the campus.
All of these systems ran over circuit-switched networks, which simply means the two endpoints would set up a circuit over which data would travel before any data was actually sent. Packet-switched networks were seen as more efficient at the time because of the complexity of setting up these circuits, along with the massive waste of bandwidth caused by circuits that were always overprovisioned and underused.
The problem, at that time, with packet-based networks was the sheer overhead of switching packets. While frames of data could be switched in hardware, packets could not. Each packet could be a different length, and each packet carried an actual destination address, rather than some sort of circuit identifier—a tag. Packet switching, however, was quickly becoming the “go to” technology solution for a lot of problems because of its efficient use of network resources, and simplicity of operation.
Asynchronous Transfer Mode, or ATM, was widely seen as a compromise technology that would provide the best of both circuit and packet switching in a single technology. Data would enter the network in the form of either packets or circuits, and would then be broken up into fixed-size cells, which would be switched based on a fixed label-based header. This would allow hardware to switch the cells in a way that is like circuit switching, while retaining many of the advantages of a packet-switched network. In fact, ATM allowed both circuit- and packet-switched paths to be used in the same network.
With all this goodness under one technical roof, why didn’t ATM take off? The charts from the usual prognosticators showed markets that were forever “up and to the right.”
The main culprit in the demise of ATM turned out to be the size of the cell. In order to support a good mix of voice and data traffic, the cell size was set to 53 octets: 48 octets of payload behind a 5-octet header. A 48-octet packet, then, would exactly fill a single cell’s payload. Larger packets, in theory, could be broken into multiple cells and carried over the network with some level of efficiency as well. The promise of the future was ATM to the desktop, which would solve the cell-size overhead problem, since applications would generate streams pre-divided into correctly sized packets to use the fixed cell size efficiently.
The reality, however, was far different. The small cell size, combined with the overhead of carrying a network layer header, the ATM header, and the lower layer data link header, made ATM massively inefficient. Some providers at the time had research showing that while they were filling upwards of 80% of any given link’s bandwidth, the goodput, the amount of useful data being transmitted over the link, was less than 40% of the available bandwidth. On top of this, there were problems with out-of-order cells and reassembly, causing entire packets’ worth of data, spread across multiple cells, to be discarded. The end was clearly near when articles appeared in popular telecommunications journals comparing ATM to shredding the physical mail, attaching small headers to each resulting chad, and reassembling the original letters at the recipient’s end of the postal system. The same benefits were touted—being able to pack mail trucks more tightly, being able to carry a wider array of goods over a single service, and so on.
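The cell tax described above is easy to quantify. Here is a minimal Python sketch of the arithmetic; it deliberately ignores AAL framing and the network and data link headers also mentioned above, all of which push real-world efficiency even lower:

```python
import math

CELL_SIZE = 53  # octets per ATM cell
PAYLOAD = 48    # payload octets per cell; the other 5 are the cell header

def atm_efficiency(packet_len: int) -> float:
    """Fraction of link bandwidth carrying useful data (goodput) when a
    packet of packet_len octets is segmented into fixed-size cells."""
    cells = math.ceil(packet_len / PAYLOAD)
    return packet_len / (cells * CELL_SIZE)

# A packet that exactly fills one cell's payload still pays the header tax:
print(f"{atm_efficiency(48):.2%}")   # 48/53, about 90.57%

# One octet more and a second, nearly empty cell is needed:
print(f"{atm_efficiency(49):.2%}")   # 49/106, about 46.23%
```

Even a full-size 1500-octet packet needs 32 cells and tops out around 88% efficiency before the other headers are counted, which makes the sub-40% goodput figures reported by providers easy to believe.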
In the end, ATM to the desktop never materialized, and the inefficiencies of ATM on long-haul links doomed the technology to extinction.
Lessons learned? First, do not count on supporting “a little inefficiency” while the ecosystem catches up to a big new idea. Either the system has immediate, measurable benefits, or it does not. If it does not, it is doomed from the first day of deployment. Second, do not try to solve all the problems in the world at once. Build simple, use it, and then build it better over time. While we all hate being beta testers, sometimes real-world beta testing is the only way to know what the world really wants or needs. Third, up-and-to-the-right charts are easy to justify and draw. They are beautiful and impressive on glossy magazine pages, and in flashy presentations. But they should always be considered carefully. The underlying technology, and how it matches real-world needs, are more important than any amount of forecasting and hype.
Technical Debt (or Is Future Proofing Even a Good Idea?)
What, really, is “technical debt?” It’s tempting to say “anything legacy,” but then why do we need a new phrase to describe “legacy stuff?” Even the prejudice against legacy stuff isn’t all that rational when you think about it. Something that’s old might also just be well-tested, or well-worn but still serviceable. Let’s try another tack.
Technical debt, in the software world, is what accumulates when a team works on a piece of software for long periods of time by only adding features, never refactoring or reorganizing the code to meet current conditions. The general idea is that as new features are added on top of the old, two things happen. First, the old code becomes a sort of opaque box that no one understands. Second, the code being added on top increasingly relies on public behavior that might be subject to unintended consequences or leaky abstractions.
To resolve this problem in the software world, software is “refactored.” In refactoring, every use of a public API is examined, including what information is being drawn out, or what the expected inputs and outputs are. The old code is then “discarded,” in a sense, and a new underlying function written that meets the requirements discovered in the existing codebase. The refactoring process allows the calling functions and called functions, the clients and the servers, to “move together.” As the overall understanding of the system changes, the system itself can change with that understanding. This brings the implementation closer to the understanding of the current engineering team.
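As a sketch of what this looks like in practice (the function and its pricing rules here are invented for illustration), the legacy version below grew by accretion, one feature bolted onto the next, while the refactored version preserves the externally observable behavior, recovered by examining the callers, in a structure the current team can actually read:

```python
def shipping_cost_legacy(weight, express, member, promo):
    # Grew by accretion: each condition was added when a feature shipped,
    # with no reorganization of what came before.
    cost = weight * 2
    if express:
        cost = cost + 10
        if member:
            cost = cost - 5
    if member and not express:
        cost = cost * 0.9
    if promo:
        cost = cost - 2 if cost > 2 else 0
    return cost

def shipping_cost(weight, express=False, member=False, promo=False):
    """Same inputs, same outputs, restated as the rules callers actually
    rely on: a base rate, an express surcharge (reduced for members), a
    member discount on standard shipping, and a promo rebate that never
    drives the cost negative."""
    cost = weight * 2
    if express:
        cost += 10 - (5 if member else 0)
    elif member:
        cost *= 0.9
    if promo:
        cost = max(cost - 2, 0)
    return cost
```

The point of the exercise is not that the second version is shorter; it is that the rules are now stated in the current team’s terms, so the next change lands in an implementation that matches the team’s understanding.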
So technical debt is really the mismatch between the understanding of the current engineering team imposed onto the implementation of a prior engineering team with a different understanding of the requirements (even if the older engineering team is the same people at a different time).
How can we apply this to networking? In the networking world, we actively work against refactoring by future proofing.
Future proofing: building a network that will never need to be replaced. Or, perhaps, never needing to go to your manager and say “I need to buy new hardware” because the hardware you bought last week no longer has the functionality you need. This sounds great in theory, but the problem with theory is it often does not hold up when the glove hits the nose (everyone has a plan until they get punched in the face). The problem with future proofing is you can’t see the punch that’s coming until the future actually throws it.
If something is future proofed, it either cannot be refactored, or there will be massive resistance to the refactoring process.
Maybe it’s time we tried to change our view of things so we can refactor networks. What would refactoring a network look like? Maybe examining all the configurations within a particular module, figuring out what they do, and then trying to figure out what application or requirement led to that particular bit of configuration. Or, one step higher, looking at every protocol or “feature” (whatever a feature might be), figuring out what purpose it might serve, and then intentionally setting about finding some other way, perhaps a simpler way, to provide that same service.
One key point in this process is to stop looking at the network as a set of appliances and start seeing it as a system made up of hardware and software. Mentally disaggregating the software and hardware can allow you to see what can be changed and what cannot, and inject some flexibility into the network refactoring process.
When you refactor a bit of code, what you typically end up with is a simpler piece of software that more closely matches the engineering team’s understanding of current requirements and conditions. Aren’t simplicity and coherence goals for operational networks, too?
If so, when was the last time you refactored your network?
The Dangers of Flying Pigs (RFC1925, rule 3)
There are many times in networking history, and in the day-to-day operation of a network, when an engineer has been asked to do what seems to be impossible. Maybe installing a circuit faster than a speeding bullet or flying over tall buildings to make it to a remote site faster than any known form of conveyance short of a transporter beam (which, contrary to what you might see in the movies, has not yet been invented).
One particular impossible assignment in the early days of network engineering was the common request to replicate the creation of the works of Shakespeare making use of the infinite number of monkeys (obviously) connected to the Internet. The creation of appropriate groups of monkeys, the herding of these groups, and the management of their output were once considered a nearly impossible task, similar to finding a token dropped on the floor or lost in the ether.
This problem proved so intractable that the IETF finally created an entire suite of management tools for the infinite monkeys used in these experiments, described in RFC2795. This RFC defines the Infinite Monkey Protocol Suite (IMPS), which runs on top of the Internet Protocol, the Infinite Threshold Accounting Gadget (I-TAG), and the KEEPER specification, which provides a series of messages to manage the infinite monkeys. The experiment raised a number of questions about its construction, such as whether the compilation of works should take place on a letter-by-letter or word-by-word basis. Ultimately, the problem was apparently solved through the creation of infinite monkey simulators.
For those situations, such as assembling and managing an infinite number of monkeys gathered for testing, when a network engineer is asked to perform something which is apparently impossible, the first thing that is required is a lot of hot, caffeinated beverage. And there is no better way to make such beverages than through a hypertext-controlled hot beverage device. This device is so important, in fact, that the IETF described the interface and protocols for it fairly early, in RFC2324. While having a hypertext control interface to such devices is important, sometimes the making of caffeinated beverages should be automated; an interface which can be used for automation is described in RFC2325. If the engineer prefers some form of caffeine other than coffee, the procedures in RFC7168 should be followed.
Another common problem posed to network engineers is to make pigs fly. While it has often been reported that pigs cannot, in fact, fly, those who report this are apparently not well acquainted with engineers who have been given large amounts of a hot, caffeinated beverage. In fact, that which is probable, and yet impossible, is often more likely to occur than that which is possible, and yet improbable, once a network engineer has been given enough of this kind of beverage.
There is a danger, however, in attempting to perform the impossible, no matter how good the intentions or plan. As RFC1925 states in rule 3: “With sufficient thrust, pigs fly just fine. However, this is not necessarily a good idea. It is hard to be sure where they are going to land, and it could be dangerous sitting under them as they fly overhead.” Network engineers plus hot caffeinated beverages may just achieve the impossible.
Or you might end up with a pig on your head. It’s hard to tell, so be careful what you ask for.
Strong Reactions and Complexity
In the realm of network design—especially in the realm of security—we often react so strongly against a perceived threat, or so quickly to solve a perceived problem, that we fail to look for the tradeoffs. If you haven’t found the tradeoffs, you haven’t looked hard enough—or, as Dr. Little says, you have to ask what is gained and what is lost, rather than just what is gained. This failure to look at both sides often results in untold amounts of technical debt and complexity being dumped into network designs (and application implementations), causing outages and failures long after these decisions are made.
A 2018 paper on DDoS attacks, A First Joint Look at DoS Attacks and BGP Blackholing in the Wild, provides a good example of a mitigation that causes more damage than the attack itself. Most networks are configured to allow the operator to quickly trigger a remote triggered black hole (RTBH) using BGP. Most often, a community attached to a BGP route points the next hop to a local discard route on each eBGP speaker. If used on the route advertising the destination of the attack—the service under attack—the result is that the DDoS attack traffic no longer has a destination to flow to. If used on the route advertising the source of the DDoS attack traffic, the result is that the DDoS traffic will not pass any reverse-path forwarding policies at the edge of the AS, and hence will be dropped. Since most DDoS attacks are reflected, blocking the source traffic still prevents access to some service, generally DNS or something similar.
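The destination-based trigger can be sketched in a few lines of Python. The community value and discard next hop below are assumptions chosen for illustration (operators pick their own conventions, and RFC7999 defines a well-known BLACKHOLE community); this models the effect of an import policy on an eBGP speaker, not any particular vendor’s configuration:

```python
BLACKHOLE_COMMUNITY = "65000:666"  # assumed operator convention (cf. RFC7999)
DISCARD_NEXT_HOP = "192.0.2.1"     # assumed: statically routed to a discard interface

def apply_rtbh_policy(route: dict) -> dict:
    """If the blackhole community is attached to the route, rewrite its
    next hop to an address every edge router discards locally."""
    if BLACKHOLE_COMMUNITY in route.get("communities", []):
        return {**route, "next_hop": DISCARD_NEXT_HOP}
    return route

# The host under attack, announced with the blackhole community attached:
victim = {"prefix": "203.0.113.10/32",
          "communities": [BLACKHOLE_COMMUNITY],
          "next_hop": "198.51.100.1"}
print(apply_rtbh_policy(victim)["next_hop"])  # 192.0.2.1
```

Note that the rewritten route drops all traffic toward the victim, attack and legitimate alike, which is exactly why the RTBH finishes the attacker’s job for it.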
In either case, then, stopping the DDoS through an RTBH causes damage to services rather than just the attacker. Because of this, remote triggered black holes should really only be used in the most extreme cases, where no other DDoS mitigation strategy will work.
The authors of the Joint Look use publicly available information to answer several questions. First, against what scale of DDoS attacks are RTBHs used? Second, how long after an attack begins is the RTBH triggered? Third, for how long is the RTBH left in place after the attack has been mitigated?
The answer to the first question should be—the RTBH is only used against the largest-scale attacks. The answer to the second question should be—the RTBH should be put in place very quickly after the attack is detected. The answer to the third question should be—the RTBH should be taken down as soon as the attack has stopped. The researchers found that RTBHs were most often used to mitigate the smallest of DDoS attacks, and almost never to mitigate larger ones. The authors also found that RTBHs were often left in place for hours after a DDoS attack had been mitigated. Both of these imply that current use of RTBH to mitigate DDoS attacks is counterproductive.
How many more design patterns do we follow that are simply counterproductive in the same way? This is not a matter of “following the data,” but rather one of really thinking through what it is you are trying to accomplish, and then how to accomplish that goal with the simplest set of tools available. Think through what it would mean to remove what you have put in, whether you really need to add another layer or protocol, how to minimize configuration, etc.
If you want your network to be less complex, examine the tradeoffs realistically.
Random Thoughts on IoT
Let’s play the analogy game. The Internet of Things (IoT) is probably going to end up being like … a box of chocolates, because you never do know what you are going to get? A big bowl of spaghetti with a serious lack of meatballs? Whatever it is, the IoT should have network folks worried about security. There is, of course, the problem of IoT devices being attached to random places on the network, exfiltrating personal data back to a cloud server you don’t know anything about. Some of these devices might be rogue, of course, such as a Raspberry Pi attached to some random place in the network. Others might be more conventional, such as those new exercise machines the company just brought into the gym that are sending personal information in the clear to an outside service.
While there is research into how to tell the difference between IoT and “larger” devices, the reality is spoofing and blurred lines will likely make such classification difficult. What do you do with a virtual machine that looks like a Raspberry Pi running on a corporate laptop for completely legitimate reasons? Or what about the Raspberry Pi-like device that can run a fully operational Windows stack, including “background noise” applications that make it look like a normal compute platform? These problems are, unfortunately, not easy to solve.
To make matters worse, there are no standards by which to judge the security of an IoT device. Even if the device manufacturer—think about the new gym equipment here—has the best intentions towards security, there is almost no way to determine whether a particular device is designed and built with good security. The result is that IoT devices are often infected and used as part of a botnet for DDoS or other attacks.
What are our options here from a network perspective? The most common answer is segmentation—and segmentation is, in fact, a good start on solving the problem of IoT. But we are going to need a lot more than segmentation to avert certain disaster in our networks. Once these devices are segmented off, what do we do with the traffic? Do we just allow it all (“hey, that’s an IoT device, so let it send whatever it wants to… after all, it’s been segmented off the main network anyway”)? Do we try to manage and control what information is being exfiltrated from our networks? Is machine learning going to step in to solve these problems? Can it, really?
To put it another way—the attack surface we’re facing here is huge, and the smallest mistake can have very bad ramifications in individual lives. Take, for instance, the problem of data and IoT devices in abusive relationships. Relationships are dynamic; how is your company going to know when an employee is in an abusive relationship, and thus when certain kinds of access should be shut off? There is so much information here it seems almost impossible to manage it all.
It looks, to me, like the future is going to be a bit rough and tumble as we learn to navigate this new realm. Vendors will have lots of good ideas (look at Mist’s capabilities in tracking down the location of rogue devices, for instance), but in the end it’s going to be the operational front line that has to figure out how to manage and deploy networks with a broad blend of ultimately untrustworthy IoT devices and more traditional devices.
Now would be the time to start learning about security, privacy, and IoT if you haven’t started already.
