If you haven’t found the tradeoffs, you haven’t looked hard enough. Something I say rather often—as Eyvonne would say, a “Russism.” Fair enough, and it’s easy enough to say “if you haven’t found the tradeoffs, you haven’t looked hard enough,” but what does it mean, exactly? How do you apply this to the everyday world of designing, deploying, operating, and troubleshooting networks?

Humans tend to extremes in their thoughts. In many cases, we end up considering everything a zero-sum game, where any gain on the part of someone else means an immediate and opposite loss on my part. In others, we end up thinking we are going to get a free lunch. The reality is there is no such thing as a free lunch, and while there are situations that are a zero-sum game, not all situations are. What we need is a way to “cut the middle” to realistically appraise each situation and realistically decide what the tradeoffs might be.

This is where the state/optimization/surface (SOS) model comes into play. You’ll find this model described in several of my books alongside some thoughts on complexity theory (see the second chapter here, for instance, or here), but I don’t spend a lot of time discussing how to apply this concept. The answer lies in the intersection between looking for tradeoffs and the SOS model.

TL;DR version: the SOS model tells you where you should look for tradeoffs.

Take the time-worn example of route aggregation, which improves the operation of a network by reducing the “blast radius” of changes in reachability. Combining aggregation with summarization (as is almost always the intent), it reduces the “blast radius” for changes in the network topology as well. The way aggregation and summarization reduce the “blast radius” is simple: if you define a failure domain as the set of devices which must somehow react to a change in the network (the correct way to define a failure domain, by the way), then aggregation and summarization reduce the failure domain by hiding changes in one part of the network from devices in some other part of the network.
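To make the failure domain idea concrete, here is a minimal sketch in Python. The prefixes, the `advertised` helper, and the simplified rule for when an aggregate is advertised are all assumptions made for illustration; real routing protocols carry far more state. The only point is that a change inside the aggregate never becomes visible outside it, so remote devices have nothing to react to.

```python
import ipaddress

def advertised(components, aggregate=None):
    """Return the set of prefixes a border router advertises outward.

    Hypothetical model: with no aggregate, every component is advertised.
    With an aggregate, components inside it are hidden and only the
    aggregate is advertised while at least one component is reachable.
    """
    nets = {ipaddress.ip_network(c) for c in components}
    if aggregate is None:
        return nets
    agg = ipaddress.ip_network(aggregate)
    inside = {n for n in nets if n.subnet_of(agg)}
    outside = nets - inside
    return outside | ({agg} if inside else set())

# Illustrative prefixes only.
before = ["10.1.0.0/24", "10.1.1.0/24", "10.1.2.0/24", "10.1.3.0/24"]
after = [p for p in before if p != "10.1.2.0/24"]  # one subnet becomes unreachable

# Without aggregation, remote routers see the change and must react.
changed_without = advertised(before) != advertised(after)
# With aggregation, the change is hidden behind 10.1.0.0/22.
changed_with = advertised(before, "10.1.0.0/22") != advertised(after, "10.1.0.0/22")

print(changed_without, changed_with)  # True False
```

In this toy model the remote routers are inside the failure domain only when no aggregate is configured.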

Note: the depth of the failure domain is relevant, as well, but not often discussed; this is related to the depth of an interaction surface, but since this is merely a blog post . . .

According to SOS, route aggregation (and topology summarization) is a form of abstraction, which means it is a way of controlling state. If we control state, we should see a corresponding tradeoff in interaction surfaces, and a corresponding tradeoff in some form of optimization. Given these two pointers, we can search for the tradeoffs. Let’s start with interaction surfaces.

Observe that aggregation is normally configured manually; this is an interaction surface. The human-to-device interaction surface now needs to account for the additional work of designing, configuring, maintaining, and troubleshooting around aggregation—these things add complexity to the network. Further, the routing protocol must also be designed to support aggregation and summarization, so the design of the protocol must also be more complex. This added complexity is often going to come in the form of . . . additional interaction surfaces, such as the not-so-stubby external conversion to a standard external in OSPF, or something similar.

Now let’s consider optimization. Controlling failure domains allows you to build larger, more stable networks—this is an increase in optimization. At the same time, aggregation removes information from the control plane, which can cause some traffic to take a suboptimal path (if you want examples of this, look at the books referenced above). Traffic taking a suboptimal path is a decrease in optimization. Finally, building larger networks means you are also building a more complex network—so we can see the increase in complexity here, as well.
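The suboptimal path case can be sketched the same way. Everything here is hypothetical: the two exit routers `A` and `B`, the costs, and the `best_exit` and `true_cost` helpers are invented for illustration and do not model any particular protocol. The sketch assumes each exit advertises a single cost for the aggregate, so the remote router can no longer tell which exit is actually closer to a given subnet.

```python
import ipaddress

N = ipaddress.ip_network

# (prefix, exit router, cost as seen by the remote router); illustrative values.
full_table = [
    (N("10.1.0.0/24"), "A", 10), (N("10.1.0.0/24"), "B", 30),
    (N("10.1.2.0/24"), "A", 30), (N("10.1.2.0/24"), "B", 10),
]
# After aggregation, each exit advertises only the /22 with a single cost.
aggregated_table = [
    (N("10.1.0.0/22"), "A", 10), (N("10.1.0.0/22"), "B", 20),
]

def best_exit(table, destination):
    """Pick the exit by longest-prefix match, then lowest cost."""
    dest = ipaddress.ip_address(destination)
    matches = [row for row in table if dest in row[0]]
    _, exit_router, _ = min(matches, key=lambda r: (-r[0].prefixlen, r[2]))
    return exit_router

def true_cost(exit_router, destination):
    """Cost actually incurred, read from the un-aggregated table."""
    dest = ipaddress.ip_address(destination)
    return next(c for n, e, c in full_table if e == exit_router and dest in n)

dest = "10.1.2.5"
optimal = best_exit(full_table, dest)        # chosen with full information
chosen = best_exit(aggregated_table, dest)   # chosen with the aggregate only
stretch = true_cost(chosen, dest) - true_cost(optimal, dest)

print(optimal, chosen, stretch)  # B A 20
```

The stretch of 20 is the optimization given up in exchange for the smaller failure domain; whether that trade is worth making depends on the network.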

Experience is often useful in helping you have more specific places to look for these sorts of things, of course. If you understand the underlying problems and solutions (hint, hint), you will know where to look more quickly. If you understand common implementations and the weak points of each of those implementations, you will be able to quickly pinpoint an implementation’s weak points. History might not repeat itself, but it certainly rhymes.

I have spent many years building networks, protocols, and software. I have never found a situation where the SOS model, combined with a solid knowledge of the underlying problems and solutions (or perhaps technologies and implementations used to solve these problems), has led me astray in being able to quickly find the tradeoffs so I could see, and then analyze, them.

There was a man I saw last week in the Salvador Dali museum, a middle aged tourist in a Nike t-shirt, who acted as if he was doing a scavenger hunt speed-run of the absurd artistic labyrinth designed by the famed artist. His phone camera permanently on, he rushed from framed painting to hand-carved sculpture to meticulously-made mechanical inventions, tapping away at the button to capture the blurry images of ornate creations. —Ben Domenech

All of this fiber activity is going to mean a shortfall of industry resources of all kinds. I’ve already witnessed construction delays in projects this year due to resource shortages and I fear delays will increase in 2020 and beyond. —Doug Dawson

When I walked out the door on my last day as Google’s Head of International Relations, I couldn’t help but think of my first day at the company. I had exchanged a wood-paneled office, a suit and tie, and the job of wrestling California’s bureaucracy as Governor Schwarzenegger’s deputy chief of staff for a laptop, jeans, and a promise that I’d be making the world better and more equal, under the simple but powerful guidance “Don’t be evil.” —Ross LaJeunesse

Poorly secured mail servers can be a malicious actor’s best friend — they can enable social engineering, phishing, fraud, and the spread of malware, not to mention that mail servers allowing open relay create the perfect conditions for the spoofing of sender addresses and the sending of spam. —Adli Wahid

Organizations’ pursuit of increased workplace collaboration has led managers to transform traditional office spaces into ‘open’, transparency-enhancing architectures with fewer walls, doors and other spatial boundaries, yet there is scant direct empirical research on how human interaction patterns change as a result of these architectural changes. —Ethan S. Bernstein and Stephen Turban

TCP congestion control algorithms have continued to evolve for more than 30 years. Much of their success is rooted in the fact that they are loss-based, whereby they use packet loss as the congestion signal. For example, Linux’s default TCP algorithm, Cubic, reduces its congestion window by 30% when encountering packet loss. —Yi Cao

Many of those who work in the corporate world are constantly peppered with questions about their “career progression.” The Internet is saturated with articles providing tips and tricks on how to develop a never-fail game plan for professional development. —Casey Chalk

Anyone searching for a primer on how to spot clever phishing links need look no further than those targeting customers of Apple, whose brand by many measures remains among the most-targeted. Past stories here have examined how scammers working with organized gangs try to phish iCloud credentials from Apple customers who have a mobile device that is lost or stolen. —Krebs on Security

The bad news is that the ecosystem of the underlying ad tech industry has not changed and still does not respect user privacy. A new report, called Out of Control: How Consumers Are Exploited by the Online Advertising Industry, published today by the Norwegian Consumer Council (NCC), looks at the hidden side of the data economy and its findings are alarming. —Christoph Schmon

Workers who did not show potential employers their pay history had double-digit jumps in their wages and were able to bargain better wages than workers who revealed their past pay, according to a study circulated Monday by the National Bureau of Economic Research. —Andrew Keshner

Network engineers do not need to become full-time coders to succeed—but some coding skills are really useful. In this episode of the Hedge, David Barroso (you can find David’s github repositories here), Phill Simmonds, and Russ White discuss which programming skills are useful for network engineers.


Raise your hand if you think moving to platform as a service or infrastructure as a service is all about saving money. Raise it if you think moving to “the cloud” is all about increasing business agility and flexibility.

Put your hand down. You’re wrong.

Before going any further, let me clarify things a bit. You’ll notice I did not say software as a service above—for good reason. Move email to the cloud? Why not? Word processing? Sure, word processing is (relatively) a commodity service (though I’m always amazed at the number of people who say “word processor x stinks,” opting to learn complex command sets to “solve the problem,” without first consulting a user manual to see if they can customize “word processor x” to meet their needs).

What about supporting business-specific, or business-critical, applications? You know, the ones you’ve hired in-house developers to create and curate?

Will you save money by moving these applications to a platform as a service? There is, of course, some efficiency to be gained. It is cheaper for a large-scale manufacturer of potato chips to make a bag of chips than for you to cook them in your own home. They have access to specialized slicers, fryers, chemists, and even special potatoes (with more starch than the ones you can buy in a grocery store). Does this necessarily mean that buying potato chips in a bag is always cheaper? In other words, does the manufacturer pass all these savings on to you, the consumer? To ask the question is to know the answer.

And once you’ve turned making all your potato chips over to the professionals, getting rid of the equipment needed to make them, and letting the skill of making good potato chips atrophy, what is going to happen to the price? Yep, thought so.

This is not to say cost is not a factor. Rather, the cost of supporting customized applications on the cloud or local infrastructure needs to be evaluated on a case-by-case basis—either might be cheaper than the other, and the cost of both will change over time.

Does using the cloud afford you more business flexibility? Sometimes, yes. And sometimes, no. Again, the flexibility benefit normally comes from “business agnostic” kinds of flexibility. The kind of flexibility you need to run your business efficiently may, or may not, be the same as what the majority of other businesses need. Moving your business to another cloud provider is not always as simple as it initially seems.

The cost and flexibility benefit come from relatively customer-agnostic parts of the business models. To that extent, you rely more on them than they rely on you. Yes, you can vote with your feet if the mickey is taken, but if we’re honest, this kind of supply is almost as inelastic as your old IT service deal. There are few realistic options for supply at scale, and the act of reversing out of a big contract, selecting a new supplier, and making the operational switch can bleed any foreseeable benefits out of a change—something all parties in the procurement process know too well.

So… saving money is sometimes a real reason to outsource things. In some situations, flexibility or agility is going to be a factor. But… there is a third factor I have not mentioned yet—probably the most important, but almost never discussed. Risk aversion.

Let’s be honest. For the last twenty years we network engineers have specialized in building extremely complex systems and formulating the excuses required when things don’t go right. We’ve specialized in saying “yes” to every requirement (or even wish) because we think that by saying “yes” we will become indispensable. Rather than building platforms on which the business can operate, we’ve built artisanal, complex pets that must be handled carefully lest they turn into beasts that devour time and money. You know, like the person who tries to replicate store-bought chips by purchasing expensive fryers and potatoes, and ends up just making a mess out of the kitchen?

If you want to fully understand your infrastructure, and the real risk of complexity, you need to ask about risk, money, and flexibility—all three. When designing a network, or modifying things to deploy a new service onto an existing network, you need to think about risk as well as cost and flexibility.

How do you manage risk? Sarah Clarke, in the article I quoted above, gives us a few places to start (which I’ve modified to fit the network engineering world). First, ask the question about risk. Don’t just ask “how much money is this going to cost or save,” ask “what risk is being averted or managed here?” You can’t ever think the problem through if you don’t ever ask the question. Second, ask about how you are going to assess the solution against risk, money, and flexibility. How will you know if moving in a particular direction worked? Third, build out clear demarcation points. This is both about the modules within the system as well as responsibilities.

Finally, have an escalation plan. Know what you are going to do when things go wrong, and when you are going to do it. Think about how you can back out of a situation entirely. What are the alternatives? What does it take to get there? You can’t really “unmake” decisions, but you can come to a point where you realize you need to make a different decision. Know what that point is, and at least have the information on hand to know what decision you should make when you get there.

But first, ask the question. Risk aversion drives many more decisions than you might think.

Maintaining a bespoke codebase, training new team members on it, handling operational issues, and adding features is expensive. For many (most?) teams, the cost of rolling your own deployment orchestration system, DSL, or Javascript framework will grow less and less acceptable over time. Save that overhead for the most important things, things that define and differentiate your business. —Forrest Brazeal

The 5G story is everywhere in the American press these days, and not just the American press. You can barely turn around to scratch some needy body part without encountering another article about the wireless telecommunications technology. But the stovepiping in this coverage—the narrowing of the questions asked or answered—is acute. —Adam Garfinkle

First observed in 2009, Slow Drip attacks hit the world stage in a dramatic fashion in early-2014, wreaking havoc on the important middle-level infrastructure of the DNS, particularly on ISPs. Japanese service provider QTNet described the disruption not just of caching resolvers, but of load balancers too. —Renée Burton

A system is more than its central processor, and perhaps at no time in history has this ever been true than right now. Except, perhaps, in the future spanning out beyond the next decade until CMOS technologies finally reach their limits. Looking ahead, all computing will be hybrid, using a mix of CPUs, GPUs, FPGAs, and other forms of ASICs that run or accelerate certain functions in applications. —Timothy Prickett Morgan

Late last year saw the re-emergence of a nasty phishing tactic that allows the attacker to gain full access to a user’s data stored in the cloud without actually stealing the account password. The phishing lure starts with a link that leads to the real login page for a cloud email and/or file storage service. Anyone who takes the bait will inadvertently forward a digital token to the attackers that gives them indefinite access to the victim’s email, files and contacts — even after the victim has changed their password. —Brian Krebs

But what if, instead of focusing on Big Tech’s sins of commission, we paid equal attention to its sins of omission—the failures, the busts, the promises unfulfilled? The past year has offered several lurid examples. WeWork, the office-sharing company that claimed it would reinvent the workplace, imploded on the brink of a public offering. —Derek Thompson

In the past half decade, a tremendous amount of effort has been put into securing Internet communications. TLS has evolved to version 1.3 and various parts of the Web platform have been conditioned to require a secure context. Let’s Encrypt was established to lower the barrier to getting a certificate, and work continues to make secure communication easy to deploy, easy to use, and eventually the only option. —Mark Nottingham

There has never been a more critical time when experienced infosec professionals are needed. From targeted intrusions, ransomware outbreaks, and relentless cyber-crime attacks, every industry is racing to build infosec muscle. It is said that it takes 10,000 hours to make an expert. —John Lambert

When acquiring big-ticket cybersecurity solutions, especially those that have hardware attached, buyers must remember that these solutions require a lot of coordination and advanced skills to utilize them correctly. Deploying a sophisticated cybersecurity solution doesn’t take place in a matter of days. You must build out advanced use cases, baseline the technology in your environment, then update and configure it to the risks your business is most likely to face. It’s a process that takes several weeks or even months. —Chris Schueler

Unfortunately, email is unprepared for today’s threats, because it was designed nearly 40 years ago when its eventual global reach and security challenges were unimaginable. Decades of work by the email industry has largely contained spam, but phishing and email-based malware remain enormous threats, with email involved in over 90% of all cyberattacks, according to various estimates. —Seth Blank

Roughly speaking it’s due to an observation which I’m going to call Dawson’s first law of computing: O(n^2) is the sweet spot of badly scaling algorithms: fast enough to make it into production, but slow enough to make things fall down once it gets there. —Bruce Dawson