Mean Time to Innocence is not Enough

A long time ago, I supported a wind speed detection system consisting of an impeller, a small electric generator, a 12 gauge cable running a few miles, and a voltmeter. The entire thing was calibrated through a resistive bridge–attach an electric motor to the generator, run it at a series of fixed speed, and adjust the resistive bridge until the voltmeter, marked in knots of wind speed, read correctly.

The primary problem in this system was the several miles of 12 gauge cable. It was often damaged, requiring us to dig the cable up (shovel ready jobs!), strip the cable back, splice the correct pairs together, seal it all in a plastic container filled with goo, and bury it all again. There was one instance, however, when we could not get the wind speed system adjusted correctly, no matter how we tried to tune the resistive bridge. We pulled things apart and determined there must be a problem in one of the (many) splices in the several miles of cable.

At first, we ran a Time Domain Reflectometer (TDR) across the cable to see if we could find the problem. The TDR turned up a couple of hot spots, so we dug those points up … and found there were no splices there. Hmmm … So we called in a specialized cable team. They ran the same TDR tests, dug up the same places, and then did some further testing and found … the cable was innocent.

This set up an argument, running all the way to the base commander level, between our team and the cable team. Who’s fault was this mess? Our inability to measure the wind speed at one end of the runway was impacting flight operations, so this had to be fixed. But rather than fixing the problem, we were spending our time arguing about who’s fault the problem was, and who should fix it.

When I read this line in a recent CAIDA research paper–

“Measurement is political, and often adversarial.”

It rang very true. In Internet terms, speed, congestion, and even usage are often political and adversarial. Just like the wind speed system, two teams were measuring the same thing to prove the problem wasn’t their’s–rather than to figure out what the problem is and how to fix it.

In other words, our goal is too often Mean Time to Innocence (MTTI), rather than Mean Time to Repair (MTTR).

MTTI is not enough. We need to work with our application counterparts to find and fix problems, rather than against them. Measurement should not be adversarial, it should be cooperative.

We need to learn to fix the problem, not the blame.

This is a cultural issue, but it also impacts the way we do telemetry. For instance, in the case of the wind speed indicator, the problem was ultimately a connection that “worked,” but with high capacive reactance such that some kinds of signals were attenuated while others were not. None of us were testing the cable using the right kind of signal, so we all just sat around arguing about who’s problem it was rather than solving the problem.

When a user brings a problem to you, resist the urge to go prove yourself–or your system–innocent. Even if your system isn’t the problem, your system can provide information that can help solve the problem. Treat problems as opportunities to help rather than as opportunies to swish your superhero cape and prove your expertise.

Hedge 153: Security Perceptions and Multicloud Roundtable

Tom, Eyvonne, and Russ hang out at the hedge on this episode. The topics of discussion include our perception of security—does the way IT professionals treat security and privacy helpful for those who aren’t involved in the IT world? Do we discourage users from taking security seriously by making it so complex and hard to use? Our second topic is whether multicloud is being oversold for the average network operator.

download

On Building a Personal Brand

How do you balance loyalty to yourself and loyalty to the company you work for?

This might seem like an odd question, but it’s an important component of work/life balance many of us just don’t think about any longer because, as Pete Davis says in Dedicated, we live in a world of infinite browsing. We’re afraid of sticking to one thing because it might reduce our future options. If we dedicate ourselves to something bigger than ourselves, then we might lose control of our direction. In particular, we should not dedicate ourselves to any single company, especially for too long. As a recent (excellent!) blog post over at the ACM says:

Loyalty is generally a good trait, but extreme loyalty to the organization or mission may cause you to stay in the same job for too long.

The idea that we should control our own destiny, never getting lost in anything larger than ourselves, is ubitiquos like water is to a fish. We don’t question it. We don’t argue. It is just true. We assume there are three people who are going to look after “me:” me, myself, and I.

I get it. Honestly, I do. I’ve been there more times than I want to think about. I was the scapegoat in an argument between people far above my pay grade early in my career, causing much angst and pain. I’ve been laid off,—I cared about a company that simply didn’t care about me. Most recently, the family I’d dedicated more than twenty years of my life to ended through a divorce.

I can see why you might ask yourself hard questions about dedicating yourself to anything or anyone.

The problem, as Pete Davis points out, is that the human person was not designed for the kind of digital nomad life represented by the phrase “live for yourself.” We can try to substitute an online community. We can try to replace community with a string of novel experiences. But the truth is it will eventually catch up with you. When you’re young it’s hard to see how it will ever catch up with you, but it will.

Returning to the top—the author of the ACM article advises balancing between dedicating yourself to a company and dedicating yourself to your career. This is wise advice, but it leaves me wondering “how?” Let me lay out some thoughts here. They may not be all of the answer, but they will, I hope, point in the right direction.

First, resist seeing these two choices as orthogonal. They might be at odds in some companies—there are publishers who want your content to build their brand, and they specifically work at preventing you from building your brand. There are companies that explicitly want to own “your whole professional life.” They don’t want you blogging, going to conferences to speak, etc. Avoid these companies.

Instead, find companies that understand your personal brand is an asset to the company. Having a lot of people with strong personal brands in a company makes the company stronger, not weaker. People with strong brands will form communities around themselves. This community is a pool of people from which to recruit top-flight talent. This community allows them to collect new ideas that can be directly applied to problems in the organization. People with strong personal brands will have greater influence when they walk into a room to meet with a customer, a supplier, or just about anyone else. A company full of people with strong personal brands is stronger than one where everyone is faceless, consumed by/hiding behind the company logo.

Second, learn to manage your time effectively. I understand it’s possible to spend so much time building your brand that you don’t get your job done. As an individual, you need to be sensitive to this and learn how to manage your time effectively.

Third, seek out the win/win. Don’t think of every situation through the lens of “it’s either my brand or my employer’s.” There may be times when you cannot do something because, while it would help your brand immensely, it would harm your company’s. There may be times where you need to have a delicate discussion with your manager because you’ve been asked to do something that would be great for the company but would harm your brand. There is almost always a win/win, you just have to find it.

Fourth, seek out a community that’s not attached to work and dedicate yourself to it. Find something larger than yourself. A community that’s not tied to work will be your lifeline when things go wrong.

Finally, expect to get hurt. I know I have (an old saying in my community—never trust a man who doesn’t walk with a limp). You can be the nicest, humblest person in the world. Someone is still going to take advantage of you. In fact, the nicer and humbler you are—the more you care, the more likely it is people are going to take advantage of you. I am amazed at how much people seem to enjoy hurting one another when they believe there won’t be any consequences.

But … if you expect your life to be perfect, you were born in the wrong world. Build up the mental reserves to deal with this. Build a community that will help carry you through. There is nothing better than sitting down and sharing your hurt over a cup of coffee with a good friend (except I don’t drink coffee).

I get it—the world has moved into a YOLO/FOMO phase. If you don’t “grab it,” and right now! you risk missing something really important. We pile up alternative possibilities in our minds, wondering what might have happened if we’d chosen otherwise. We have deep angst over our personal brand, overthinking the concept to the point of diminishing returns.

The solution, though, is not to draw into yourself, to become self-centered. The solution is to find the balance, seek the win/win, dedicate yourself to something bigger than yourself, and find the right way to build your personal brand.

Quality is (too often) the missing ingredient

Software Eats the World?

I’m told software is going to eat the world very soon now. Everything already is, or will be, software based. To some folks, this sounds completely wonderful, but—leaving aside the privacy issues—I still see an elephant in the room with this vision of the future.

Quality.

Let me give you some recent examples.

First, ceiling fans. Modern ceiling fans, in case you didn’t know, don’t rely on the wall switch and pull chains. Instead, they rely on remote controls. This is brilliant—you can dim the light, change the speed of the fan, etc., from a remote control. No unsightly chains hanging from the ceiling.

Well, it’s brilliant so long as it works. I’ve replaced three of the four ceiling fans in my house. Two of the remote controls have somehow attached themselves to two of the three fans. It’s impossible to control one of the fans without also controlling the other. They sometimes get into this entertaining mode where turning one fan off turns the other one on.

For the third one—the one hanging from a 13-foot ceiling—the remote control sometimes operates one of the other fans, and sometimes the fan its supposed to operate. Most of the time it doesn’t seem to do much of anything.

The fan manufacturer—a large, well-known company—mentions this situation in their instructions and points to a FAQ that doesn’t exist. Searching around online I found instructions for solving this problem that involve unwiring the fans and repeating a set of steps 12 times for each fan to correct the situation. These instructions, needless to say, don’t work.

There is no way to reset the remote, nor the connection between the remote and the fan. There is no way to manually select some dip switch so the remote has a specific fan it talks to. Just some mystical software that’s supposed to work (but doesn’t) and no real instructions on how to resolve the problem. The result will be a multi-hour wait on a customer support line, spending hours of my time to sort the problem out, and the joy of climbing (tall) ladders to unwire and wire ceiling fans in four different rooms.

Thinking through possible problems and building software interfaces that take those situations into account … might be a bit more important than we think they are if software is really going to eat the world.

Second, the retailer’s web site—a large retailer with thousands of physical stores across the United States. Twice I’ve ordered from this site, asking to have the item held in the local store so I can pick it up. The site won’t let you order the item for store pickup unless they have it in stock.

The first time they called me to say they couldn’t find the item I ordered, but they found a “newer model” that was a lot less expensive. It was a lot less expensive because it wasn’t the same item. They never did find the item I originally ordered.

The second time they called me to say they couldn’t find the item I ordered. I asked if they could just ship the item to my house when it’s back in stock. “I’m sorry, our system doesn’t allow us to do that …” Several hours later, they called back to tell me they found it, but they cannot reinstate my order—I must place a new order.

Again, software quality strikes … what should be a simple process just isn’t. There will always be mismatches between the state in software and the state in the real world—but design the system so it’s possible to adapt when this happens, rather than shutting down the process and starting over.

Third, I own a car that has all the “bells and whistles,” including an adaptive cruise control system. There are certain situations, however, where this adaptive control does the wrong thing, producing potentially dangerous results. There is no way to set the car to use the non-adaptive cruise control permanently (I called and waited on the phone for several hours to discover this). You can set the non-adaptive cruise control on a per-use basis by going through set of menus to change the settings … while driving.

Software quality anyone?

Software eats the world might be someone’s ultimate dream—but I suspect that software quality will always be the fly in the ointment. People are not perfect (even in crowds); software is created by people; hence software will always suffer from quality problems.

Maybe a little humility about our ability to make things as complex as we might like because “we can always have software do that bit” would be a good thing—even in the networking world.

The Grass is Always Greener

This last week I was talking to someone at a small startup that intends to eliminate all the complex routing from campus networks. In the past, when reading blog posts about Kubernetes, I’ve read about how it was designed to eliminate routing protocols because “routing protocols are so complex.”

Color me skeptical.

There are two reasons for complexity in a design. The first is you’re solving a hard problem. The second is you’ve made bad design choices in the past, and you’re pasting complexity on top to solve some perceived problem (whether perceived or real).

The problem with all this talk about building something that’s “less complex” is people tend to see complexity of the first kind and think, “we can get rid of that complexity if we start over.” Failing to understand the past before building the future is a recipe for repeated failures of the same kind. Building a network without a distributed routing protocol hasn’t been tried before either, right? Well, yes, it has … We either forget how it turned out, or we say “well, that’s not the same thing I’m talking about here” (just like “real socialism hasn’t ever been tried”).

Even worse, they think they get rid of second and third kinds of complexity by starting over, or getting the humans out of the decision-making loop, or focusing on the data. Our modern penchant for relying “the data,” without ever thinking about the source of the data or how the data has been shaped and interpreted, is truly breathtaking.

They look over the horizon, see an unspoiled field, and think “the grass really is greener on the other side.”

Get rid of all those complex dynamic routing protocols … get rid of all those humans making decisions, so the decisions are “data driven” … and everything will be so much better.

Adding complexity to solve hard real-world problems is just the way things are, and they will always be, so the first reason for complexity will always be with us. People make mistakes, don’t see into the future perfectly, or just don’t have a perfect understanding of the system (technical debt), so the second kind of complexity will always be with us. You can’t “fix” people—God save us from those who think they can. The grass isn’t always greener—it just always looks that way.

What’s the practical upshot? Networks are always going to be complex. It’s just the nature of the problem being solved.

We add complexity because we fail to ask the right questions, we don’t understand the system, or we fail to do good design. The solution isn’t to seek out a greener field “out there,” but rather to make the field we currently live in greener by asking the right questions and reducing complexity through good design. Sometimes you might even need to start over with a new network … but when you start thinking about starting over with a newly designed set of protocols because the old ones are “too complex,” you need to ask how those old ones got that way, and how you’re going to stop the new ones from getting to the same place.

The grass is always greener because you looking at it through green-colored lenses just as the new grass is in its full flush, and before the weeds have had a chance to take over.

Learn how old things worked before you fall for some new “modern wonder” that’s going to solve every problem. The complexity in old things will show you where you can expect to find complexity grow up in new things.

Illusory Correlation and Security

Fear sells. Fear of missing out, fear of being an imposter, fear of crime, fear of injury, fear of sickness … we can all think of times when people we know (or worse, a people in the throes of madness of crowds) have made really bad decisions because they were afraid of something. Bruce Schneier has documented this a number of times. For instance: “it’s smart politics to exaggerate terrorist threats”  and “fear makes people deferential, docile, and distrustful, and both politicians and marketers have learned to take advantage of this.” Here is a paper comparing the risk of death in a bathtub to death because of a terrorist attack—bathtubs win.

But while fear sells, the desire to appear unafraid also sells—and it conditions people’s behavior much more than we might think. For instance, we often say of surveillance “if you have done nothing wrong, you have nothing to hide”—a bit of meaningless bravado. What does this latter attitude—“I don’t have anything to worry about”—cause in terms of security?

Several attempts at researching this phenomenon have come to the same conclusion: average users will often intentionally not use things they see someone they perceive as paranoid using. According to this body of research, people will not use password managers because using one is perceived as being paranoid in some way. Theoretically, this effect is caused by illusory correlation, where people associate an action with a kind of person (only bad/scared people would want to carry a weapon). Since we don’t want to be the kind of person we associate with that action, we avoid the action—even though it might make sense.

This is just the flip side of fear sells, of course. Just like we overestimate the possibility of a terrorist attack impacting our lives in a direct, personal way, we also underestimate the possibility of more mundane things, like drowning in a tub, because we either think can control it, or because we don’t think we’ll be targeted in that way, or because we want to signal to the world that we “aren’t one of those people.”

Even knowing this is true, however, how can we counter this? How can we convince people to learn to assess risks rationally, rather than emotionally? How can we convince people that the perception of control should not impact your assessment of personal security or safety?

Simplifying design and use of the systems we build would be one—perhaps not-so-obvious—step we can take. The more security is just “automatic,” the more users will become accustomed to deploying security in their everyday lives. Another thing we might be able to do is stop trying to scare people into using these technologies.

In the meantime, just be aware that if you’re an engineer, your use of a technology “as an example” to others can backfire, causing people to not want to use those technologies.

Is it really the best just because its the most common?

I cannot count the number of times I’ve heard someone ask these two questions—

  • What are other people doing?
  • What is the best common practice?

While these questions have always bothered me, I could never really put my finger on why. I ran across a journal article recently that helped me understand a bit better. The root of the problem is this—what does best common mean, and how can following the best common produce a set of actions you can be confident will solve your problem?

Bellman and Oorschot say best common practice can mean this is widely implemented. The thinking seems to run something like this: the crowd’s collective wisdom will probably be better than my thinking… more sets of eyes will make for wiser or better decisions. Anyone who has studied the madness of crowds will immediately recognize the folly of this kind of state. Just because a lot of people agree it’s a good idea to jump off a cliff does not mean it is, in fact, a good idea to jump off a cliff.

Perhaps it means something closer to this is no worse than our competitors. If that’s the meaning, though, it’s a pretty cynical result. It’s saying, “I don’t mind condemning myself to mediocrity so long as I see everyone else doing the same thing.” It doesn’t sound like much of a way to grow a business.

The authors do provide their definition—

For a given desired outcome, a “best practice” is a means intended to achieve that outcome, and that is considered to be at least as “good” as the best of other broadly considered means to achieve that same outcome.

The thinking seems to run something like this—it’s likely that everyone else has tried many different ways of doing this; that they have all settled on doing this, this way, means all those other methods are probably not as good as this one for some reason.

Does this work? There’s no way to tell without further investigation. How many of the other folks doing “this” spent serious time trying alternatives, and how many just decided the cheapest way was the best no matter how poor the result might be? In fact, how can we know what the results of doing things “this way” have in all those other networks? Where would we find this kind of information?

 

In the end, I can’t ever make much sense out of the question, “what is everyone else doing?” Discovering what everyone else is doing might help me eliminate possibilities (that didn’t work for them, so I certainly don’t want to try it), or it might help me understand the positive and negative attributes of a given solution. Still, I don’t understand why “common” should infer “best.”

The best solution for this situation is simply going to be the best solution. Feel free to draw on many sources, but don’t let other people determine what you should be doing.