Mean Time to Innocence is not Enough

A long time ago, I supported a wind speed detection system consisting of an impeller, a small electric generator, a 12 gauge cable running a few miles, and a voltmeter. The entire thing was calibrated through a resistive bridge–attach an electric motor to the generator, run it at a series of fixed speed, and adjust the resistive bridge until the voltmeter, marked in knots of wind speed, read correctly.

The primary problem in this system was the several miles of 12 gauge cable. It was often damaged, requiring us to dig the cable up (shovel ready jobs!), strip the cable back, splice the correct pairs together, seal it all in a plastic container filled with goo, and bury it all again. There was one instance, however, when we could not get the wind speed system adjusted correctly, no matter how we tried to tune the resistive bridge. We pulled things apart and determined there must be a problem in one of the (many) splices in the several miles of cable.

At first, we ran a Time Domain Reflectometer (TDR) across the cable to see if we could find the problem. The TDR turned up a couple of hot spots, so we dug those points up … and found there were no splices there. Hmmm … So we called in a specialized cable team. They ran the same TDR tests, dug up the same places, and then did some further testing and found … the cable was innocent.

This set up an argument, running all the way to the base commander level, between our team and the cable team. Who’s fault was this mess? Our inability to measure the wind speed at one end of the runway was impacting flight operations, so this had to be fixed. But rather than fixing the problem, we were spending our time arguing about who’s fault the problem was, and who should fix it.

When I read this line in a recent CAIDA research paper–

“Measurement is political, and often adversarial.”

It rang very true. In Internet terms, speed, congestion, and even usage are often political and adversarial. Just like the wind speed system, two teams were measuring the same thing to prove the problem wasn’t their’s–rather than to figure out what the problem is and how to fix it.

In other words, our goal is too often Mean Time to Innocence (MTTI), rather than Mean Time to Repair (MTTR).

MTTI is not enough. We need to work with our application counterparts to find and fix problems, rather than against them. Measurement should not be adversarial, it should be cooperative.

We need to learn to fix the problem, not the blame.

This is a cultural issue, but it also impacts the way we do telemetry. For instance, in the case of the wind speed indicator, the problem was ultimately a connection that “worked,” but with high capacive reactance such that some kinds of signals were attenuated while others were not. None of us were testing the cable using the right kind of signal, so we all just sat around arguing about who’s problem it was rather than solving the problem.

When a user brings a problem to you, resist the urge to go prove yourself–or your system–innocent. Even if your system isn’t the problem, your system can provide information that can help solve the problem. Treat problems as opportunities to help rather than as opportunies to swish your superhero cape and prove your expertise.

Posted in CULTURE, SKILLS

1 Comment

Phil on 14 November 2022 at 1:56 pm

This was the spirit of the early TAC at Cisco. Fix the problem, not the blame. Unfortunately, resource constrained organizations will literally punish folks that work beyond the boundaries. The inspired people will depart, leaving only those content to establish MTTI.

You get what you reward.

Related

1 Comment