Russ’ Rules of Network Design

We have the twelve truths of networking, and possibly Akin’s Laws, but is there a set of rules for network design? I couldn’t find one, so I decided to create one, containing 18 laws I’ve listed below.

Russ’ Rules of Network Design

  1. If you haven’t found the tradeoffs, you haven’t looked hard enough.
  2. Design is an iterative process. You probably need one more iteration than you’ve done to get it right.
  3. A design isn’t finished when everything needed is added, it’s finished when everything possible is taken away.
  4. Good design isn’t making it work, it’s making it fail gracefully.
  5. Effective, elegant, efficient. All other orders are incorrect.
  6. Don’t fix blame; fix problems.
  7. Local and global optimization are mutually exclusive.
  8. Reducing state always reduces optimization someplace.
  9. Reducing state always creates interaction surfaces; shallow and narrow interaction surfaces are better than deep and broad ones.
  10. The easiest place to improve or screw up a design is at the interaction surfaces.
  11. The optimum is almost always in the middle someplace; eschew extremes.
  12. Sometimes its just better to start over.
  13. There are a handful of right solutions; there is an infinite array of wrong ones.
  14. You are not immensely smarter than anyone else in networking.
  15. A bad design with a good presentation is doomed eventually; a good design with a bad presentation is doomed immediately.
  16. You can only know your part of the system and a little bit about the parts around your part. The rest is rumor and pop psychology.
  17. To most questions the correct initial answer should be “how many balloons fit in a bag?”
  18. Virtual environments still have hard physical limits.

You can find a handy printable version here.

Hedge 99

Two things have been top of mind for those who watch the ‘net and global Internet policy—the increasing number of widespread outages, and the logical and physical centralization of the ‘net. How do these things relate to one another? Alban Kwan joins us to discuss the relationship between centralization and widespread outages. You can read Alban’s article on the topic here.

download

Hedge 098: DRIP with Stuart Card

Drones are becoming—and in many cases have already become—an everyday part of our lives. Drones are used in warfare, delivery services, photography, and recreation. One of the problems facing the world of drones, however, is the strong tie-in between the controller and the drone; this proprietary link limits innovation and reduces the information available to public officials to manage traffic, and even to protect the privacy of drone operators. The DRIP working group is building protocols designed to standardize the drone-to-controller interface, advancing the state of the art in drones and opening up the field for innovation. Stuart Card joins Alvaro Retana and Russ White to discuss DRIP.

download

Marketing Wins

Off-topic post for today …

In the battle between marketing and security, marketing always wins. This topic came to mind after reading an article on using email aliases to control your email—

For example, if you sign up for a lot of email newsletters, consider doing so with an alias. That way, you can quickly filter the incoming messages sent to that alias—these are probably low-priority, so you can have your provider automatically apply specific labels, mark them as read, or delete them immediately.

One of the most basic things you can do to increase your security against phishing attacks is to have two email addresses, one you give to financial institutions and another one you give to “everyone else.” It would be nice to have a third for newsletters and marketing, but this won’t work in the real world. Why?

Because it’s very rare to find a company that will keep two email addresses on file for you, one for “business” and another for “marketing.” To give specific examples—my mortgage company sends me both marketing messages in the form of a “newsletter” as well as information about mortgage activity. They only keep one email address on file, though, so they both go to a single email address.

A second example—even worse in my opinion—is PayPal. Whenever you buy something using PayPal, the vendor gets the email address associated with the account. That’s fine—they need to send me updates on the progress of the item I ordered, etc. But they also use this email address to send me newsletters … and PayPal sends any information about account activity to the same email address.

Because of the way these things are structured, I cannot separate information about my account from newsletters, phishing attacks, etc. Since modern Phishing campaigns are using AI to create the most realistic emails possible, and most folks can’t spot a Phish anyway, you’d think banks and financial companies would want to give their users the largest selection of tools to fight against scams.

But they don’t. Why?

Because—if your financial information is mingled with a marketing newsletter, you’ll open the email to see what’s inside … you’ll pay attention. Why spend money helping your users not pay attention to your marketing materials by separating them from “the important stuff?”

When it comes to marketing versus security, marketing always wins. Somehow, we in IT need to do better than this.

Hedge 97: Low Context DevOps

Language is deeply contextual—one of my favorite sayings from the theological world is if you take the text out of its context, you are just left with the con. What does context have to do with development and operations, though? Can there be low and high context situations in the daily life of building and running systems? Thomas Limoncelli joins Tom Ammon and Russ White to discuss the idea of low context devops, and the larger issue of context in managing projects and teams, on this episode of the Hedge.

download

It always takes longer than you think

Everyone is aware that it always takes longer to find a problem in a network than it should. Moving through the troubleshooting process often feels like swimming in molasses—you’re pulling hard, and progress is being made, but never fast enough or far enough to get the application back up and running before that crucial deadline. The “swimming in molasses effect” doesn’t end when the problem is found out, either—repairing the problem requires juggling a thousand variables, most of which are unknown, combined with the wit and sagacity of a soothsayer to work with vendors, code releases, and unintended consequences.

It’s enough to make a network engineer want to find a mountain top and assume an all-knowing pose—even if they don’t know anything at all.
The problem of taking longer, though, applies in every area of computer networking. It takes too long for the packet to get there, it takes to long for the routing protocol to converge, it takes too long to support a new application or server. It takes so long to create and validate a network design change that the hardware, software and processes created are obsolete before they are used.

Why does it always take too long? A short story often told to me by my Grandfather—a farmer—might help.
One morning a farmer got early in the morning, determined to throw some hay down to the horses in the stable. While getting dressed, he noticed one of the buttons on his shirt was loose. “No time for that now,” he thought, “I’ll deal with it later.” Getting out to the barn, he climbed up the ladder to the loft, and picked up a pitchfork. When he drove the fork into the hay, the handle broke.

He sighed, took the broken pieces down the ladder, and headed over to his shed to replace the handle—but when he got there, he realized he didn’t have a new handle that would fit. Sighing again, he took the broken pieces to his old trusty truck and headed into town—arriving before the hardware store opened. “Well, I’m already here, might as well get some coffee,” he thought, so he headed to the diner. After a bit, he headed to the store to buy a handle—but just as he walked out past the door, the loose button caught on the handle, popping off.

It took a few minutes to search for the lost button, but he found it and headed over to the cleaners to have it sewn back on “real fast.” Well, he couldn’t wander around town in his undershirt, so he just stepped next door to the barber’s, where there were a few friendly games of checkers already in progress. He played a couple of games, then the barber came out to remind him that he needed a haircut (a thing barbers tend to do all the time for some reason), so he decided to have it done. “Might was well not waste the time in town now I’m here,” he thought.

The haircut finished, he went back to get his shirt, and realized it was just about lunch. Back to the diner again. Once he was done, he jumped in his truck and headed back to the farm. And then he realized—the horses were hungry, the hay hadn’t been pitched, and … his pitchfork was broken.

And this is why it always takes longer than it should to get anything done with a network. You take the call and listen to the customer talk about what the application is doing, which takes a half an hour. You then think about what might be wrong, perhaps kicking a few routers “just for good measure” before you start troubleshooting in earnest. You look for a piece of information you need to understand the problem, only to find the telemetry system doesn’t collect that data “yet”—so you either open a ticket (a process that takes a half an hour), or you “fix it now” (which takes several hours). Once you have that information, you form a theory, so you telnet into a network device to check on a few things… only to discover the device you’re looking at has the wrong version of code… This requires a maintenance window to fix, so you put in a request…

Once you even figure out what the problem is, you encounter a series of hurdles lined up in front of you. The code needs to be upgraded, but you have to contact the vendor to make certain the new code supports all the “stuff” you need. The configuration has to be changed, but you have to determine if the change will impact all the other applications on the network. You have to juggle a seemingly infinite number of unintended consequences in a complex maze of software and hardware and people and processes.

And you wonder, the entire time, why you just didn’t learn to code and become a backend developer, or perhaps a mountain-top guru.

So the next time you think it’s taking to long to fix the problem, or design a new addition to the network, or for the vendor to create that perfect bit of code, remember the farmer, and the button that left the horses hungry.

Hedge 96: Mark Nottingham and the Future of Standardization

It often seems like the IETF is losing steam—building standards, particularly as large cloud-scale companies a reducing their participation in standards bodies and deploying whatever works for them. Given these changes, what is the future of standards bodies like the IETF? Mark Nottingham joins Tom Ammon and Russ White in a broad-ranging discussion around this topic.

download