Whither Cyber-Insurance?

Note: I’m off in the weeds a little this week thinking about cyber-insurance because of a paper that landed in one of my various feeds—while this isn’t something we often think about as network operators, it does impact the overall security of the systems we build.

When you go to the doctor for a yearly checkup, do you think about health or insurance? You probably think about health, but the practice of going to the doctor for regular checkups began because of large life insurance companies in the United States. These companies began using statistical methods to measure risk, that is, to build actuarial tables they could use to set premiums properly. Originally, life insurance companies relied on the “hunches” of their salesmen, combined with some checking by people in the “back office,” to determine the correct premium. Over time, they developed networks of informers in local communities, such as doctors, lawyers, and even local politicians, who could describe the life of anyone in their area, providing the information the company needed to set premiums correctly.

Over time, however, statistical methods came into play, particularly relying on an initial visit with a doctor. The information these insurance companies gathered also gave them insight into what habits increased or decreased longevity—they decided they should use this information to help shape people’s lives so they would live longer, rather than just using it to discover the correct premiums. To gather more information, and to help people live better lives, life insurance companies started encouraging yearly doctor visits, even setting up non-profit organizations to support the doctors who gave these examinations. Thus were born the yearly doctor’s visit, the credit rating agencies, and a host of other things we take for granted in modern life.

You can read about the early history of life insurance and its impact on society in How Our Days Became Numbered.

What does any of this have to do with networks? Only this—we are in much the same position in the cyber-insurance market right now as the life insurance market in the late 1800s through the mid-1900s—insurance agents interview a company and make a “hunch bet” on how much to charge the company for cyber-insurance. Will cyber-insurance ever mature to the same point as life insurance? According to a recent research paper, the answer is “probably not.”  Why not?

First, legal restrictions will not allow a solution such as the one imposed by payment processors. Second, there does not seem to be a lot of leverage in cyber-insurance premiums. The cost of increasing security is generally much higher than any possible premium discount, making it cheaper for companies just to pay the additional premium than to improve their security posture. Third, there is no real evidence tying the use of specific products to reductions in security breaches. Instead, network and data security tend to be tied to practices rather than products, making it harder for an insurer to specify precisely what a company can and should do to improve its posture.

Finally, the largest problem is measurement. What does it look like for a company to “go to the doctor” regularly? Does this mean regular penetration tests? Standardizing penetration tests is difficult, and it can be far too easy to game a pentest without improving the overall security posture. Like medical care in the “early days,” there is no way to know whether you have gathered enough information about the population to understand what kinds of things improve “health”—and there is no way to compel reporting (much less accurate reporting), nor is there any way to compel insurance companies to share the information they have about cyber incidents.

Will cyber-insurance exist as a “separate thing” in the future? The authors largely answer in the negative. The pressures of a “race to the bottom,” providing maximal coverage at minimal cost (which they attribute to the structure of the cyber-insurance market), combined with a lack of regulatory clarity and inaccurate measurements, will probably end up causing cyber-insurance to “fold into” other kinds of insurance.

Whether this is a positive or negative result is a matter of conjecture—the legacy of yearly doctor’s visits and public health campaigns is not universally “good,” after all.

Ironies of Automation

In 1983 I was just joining the US Air Force, and still deeply involved in electronics (rather than computers). I had written a few programs in BASIC and assembler on a CoCo II with a tape drive, and at least some of the electronics I worked on still used vacuum tube triodes, plate oscillators, and operational amplifiers. This was a magical time, though—a time when “things” were being automated. In fact, one of the reasons I left electronics was that the automation wave left my job “flat.” Instead of looking into the VOR shelter to trace through a signal path using a VOM (remember the safety L!) and oscilloscope, I could sit at a terminal, select a few menu items, grab the right part off the depot shelf, replace, and go home.

Maybe the newer way of doing things was better. On the other hand, maybe not.

What brings all this to mind is a paper from 1983 titled The Ironies of Automation. Because of our arrogant belief that we can remake the world through disruption (was the barbarian disruption of Rome in 455 the good sort of disruption, or the bad sort?), we often seem to think we can learn nothing from the past. Reality check: the past is prelude.

What can the past teach us about automation? This is as good a place to start as any other:

There are two general categories of task left for an operator in an automated system. He may be expected to monitor that the automatic system is operating correctly, and if it is not he may be expected to call a more experienced operator or to take-over himself. We will discuss the ironies of manual take-over first, as the points made also have implications for monitoring. To take over and stabilize the process requires manual control skills, to diagnose the fault as a basis for shut down or recovery requires cognitive skills.

This is the first of the ironies of automation Lisanne Bainbridge discusses—and this is the irony I’d like to explore. The irony she is articulating is this: the less you work on a system, the less likely you are to be able to control that system efficiently. Once a system is automated, however, you will not work on the system on a regular basis, but you will be required to take control of the system when the automated controller fails in some way. Ironically, in situations where the automated controller fails, the amount of control required to make things right again will be greater than in normal operation.

In the case of machine operation, it turns out that the human operator is required to control the machine in just the situations where the least amount of experience is available. This is analogous to the automated warehouse in which automated systems are used to stack and sort material. When the automated systems break down, there is absolutely no way for the humans involved to figure out why things are stacked the way they are, nor how to sort things out to get things running again.

This seems intuitive. When I’m running the mill through manual control, after I’ve been running it for a while (I’m out of practice right now), I can “sense” when I’m feeding too fast, meaning I need to slow down to prevent chatter from ruining the piece, or worse—a crash resulting in broken bits of bit flying all over the place.

How does this apply to network operations? On the one hand, it seems like once we automate all the things we will lose the skills of using the CLI to do needed things very quickly. I always say “I can look that command up,” but if I were back in TAC, troubleshooting a common set of problems every day, I wouldn’t want to spend time looking things up—I’d want to have the right commands memorized to solve the problem quickly so I can move to the next case.

This seems to argue against automation entirely, doesn’t it? Perhaps. Or perhaps it just means we need to look at the knowledge we need (and want) in a slightly different way (along with the monitoring systems we use to obtain that knowledge).

Humans think fast and slow. We either react based on “muscle memory,” or we must think through a situation, dig up the information we need, and weigh out the right path forward. When you are pulling a piece of stainless through a bit and the head starts to chatter, you don’t want to spend time assessing the situation and deciding what to do—you want to react.

But if you are working on an automated machine, and the bit starts to chatter, you might want to react differently. You might want to stop the process entirely and think through how to adjust the automated sequence to prevent the bit from chattering the next time through. In manual control, each work piece is important because each one is individually built. In the automated sequence, the work piece itself is subsumed within the process.

It isn’t that you know “less” in the automated process, it’s that you know different things. In the manual process, you can feel the steel under the blade, the tension and torque, and rely on your muscle memory to react when it’s needed. In the automated process, you need to know more about the actual qualities of the bit and metal under the bit, the mount, and the mill itself. You have to have more of an immediate sense of how things work if you are doing it manually, but you have to have more of a sense of the theory behind why things work the way they do if the process is automated.

A couple of thoughts in this area, then. First, when we are automating things, we need to be very careful to assume there is no “fast thinking” when things ultimately do fail (it’s not if, it’s when). We need to think through what information we are collecting, and how that information is being presented (if you read the original paper, the author spends a great deal of time discussing how to present information to the operator to overcome the ironies she illuminates) so we take maximum advantage of the “slow path” in the human brain, and stop relying on the “fast path” so much. Second, as we move towards an automated world, we need to start learning, and teaching, more about why and less about how, so we can prepare the “slow path” to be more effective—because the slow path is the part of our thinking that’s going to get more of a workout.

The Hedge 23: The MOPS Working Group

The IETF works on many things beyond IP and routing—the Media Operations (MOPS) working group is gathering input on media-related operational issues and practices, including “proposed technologies related to the deployment, engineering, and operation of media streaming and manipulation protocols and procedures in the global Internet (inter-domain) and within-domain networking.” Leslie Daigle and Eric Vyncke, the co-chairs of the MOPS working group, join Alvaro Retana and Russ White to discuss the work they are doing.

Knowing How Things Work

Simon Weckhert recently hacked Google Maps into guiding drivers around a street through a rather simple mechanism: he placed 95 cellphones, all connected to Google Maps, in a little wagon and walked down the street with the wagon in tow. Maps saw this group of cell phones as a very congested street—95 cars cannot even physically fit into the street he was walking down—and guided other drivers around the area. The idea is novel, and the result rather funny, but it also illustrates a weakness in our “modern scientific mindset” that often bleeds over into network engineering.
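To make the weakness concrete, here is a minimal Python sketch of a naive congestion estimator. It is entirely hypothetical: the `Report` type, the 50 km/h free-flow speed, and the 30% jam threshold are assumptions for illustration, not how Google Maps actually works. The point is the hidden assumption that each reporting device is one car moving with traffic; a wagon full of phones satisfies every rule the estimator checks while quietly violating that assumption.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical naive congestion estimator. NOT Google's algorithm; it only
# illustrates inferring road state from device reports.

@dataclass
class Report:
    device_id: str
    segment: str
    speed_kmh: float

FREE_FLOW_KMH = 50.0  # assumed normal speed on the segment
JAM_RATIO = 0.3       # assumed: below 30% of free flow counts as a jam

def segment_status(reports):
    """Average the reported speeds per segment and classify each segment."""
    speeds = defaultdict(list)
    for r in reports:
        speeds[r.segment].append(r.speed_kmh)
    status = {}
    for segment, values in speeds.items():
        avg = sum(values) / len(values)
        status[segment] = "jammed" if avg < FREE_FLOW_KMH * JAM_RATIO else "clear"
    return status

# 95 phones in one wagon, all moving at walking pace (about 4 km/h),
# plus one real car driving normally on the same street.
reports = [Report(f"phone-{i}", "main-st", 4.0) for i in range(95)]
reports.append(Report("car-1", "main-st", 45.0))
print(segment_status(reports))  # {'main-st': 'jammed'}
```

The single real car barely moves the average; nothing in the data lets this estimator tell 95 phones in one wagon from 95 slow cars.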

The basic problem is this: we assume users will use things the way we intend them to. This never works out in the real world, because users are going to use wrenches as hammers, cell phones as if they were high-end cameras, and many other things in ways they were never intended. To make matters worse, users often “infer” the way something works, and adapt their actions to get what they want based on their inference. For instance, everyone who drives “reverse-engineers” the road in their head, thinking about what the maximum safe speed might be, etc. Social media users do the same thing when posting or reading through their timeline, causing people to create novel and interesting ideas about how these things work that have no bearing on reality.

As folks who work in the world of networks, we often “reverse-engineer” a vendor product in much the same way drivers “reverse-engineer” roads and social media users “reverse-engineer” the news feed—we observe how it works in some circumstances, we read some of the documentation, we infer how it must work based on the information we have, and then we design around how we think it works. Sometimes this is a result of abstraction—the vendor has saved us from learning all the “technical details” to make our lives easier. And sometimes abstraction does make our lives easier—but sometimes abstraction makes our lives harder.

I’m reminded of a time I was working with a cable team to bring a wind speed/direction system back up. The system in question relied on several miles of 12c12 cable across which a low voltage signal was driven off a generator attached to an impeller. The folks working on the cable could “see” power flowing on the meter after their repair, so why wouldn’t it work?

In some cases, then, our belief about how these things work is completely wrong, and we end up designing precisely the wrong thing, or doing precisely the wrong thing to bring a failed network back on-line.

Folks involved in networks face this on the other side of the equation, as well—we supply application developers and business users with a set of abstractions they don’t really need to understand. In using them, however, they develop “folk theories” about how a network works, coming to conclusions that are often counter-productive to what they are trying to get done. The person in the airline lounge who tells you to reboot your system to see if the WiFi will work doesn’t really understand what the problem is, they just know “this worked once before, so maybe it will work now.”

There is nothing wrong per se with this kind of “reverse-engineering”—we’re going to encounter it every time we abstract things, and abstracting things is necessary to scale. On the other hand, we’re supposed to be the “engineer in the middle”—the person who knows how to relate to the vendor and the user, bridging the gap between product and service. That’s how we add value.

There are some places, such as vendor-supplied gear, where we are dealing with an abstraction we simply cannot rip the lid off. There are many times when we cannot learn the “innards” because there are only 24 hours in a day, you cannot learn all that needs to be learned in the available timeframe, and there are times, as a human, that you need to back off and “do something else.” But… there are times when you really need to know what “lies beneath the abstraction”—how things really work.

I suspect the times when understanding “how it really works” would be helpful are very common—and that we would all live in a world with a little less vendor hype during the day, and a lot less panic during the night, if we put a little more priority on learning how networks work.

The Hedge 22: Challenges in Deploying IPv6 in the Enterprise

Most transit providers, content providers, and IXs have deployed IPv6—but many enterprise network operators have not. Ed Horley joins us at the Hedge for a wide-ranging conversation on the challenges of deploying IPv6 in enterprise networks, IPv6 penetration, and other intersecting topics. Ed cohosts the IPv6 Buzz podcast at Packet Pushers, blogs at howfunky.net, and writes at the IPv6 Center of Excellence. You can also find Ed on Twitter and LinkedIn.

Letting go of Clean Design

What is the best way to build a large-scale network—in two words? Ask ten networking folks (engineers, designers, or whatever else), and you’re likely to get the same answer from at least nine: clean abstractions. They might not say the word abstraction, of course; instead, they might say words like build things in modules, using summarization and aggregation to divide the modules up. Or they might say make the failure domain as small as you possibly can everywhere you can. Or they might say use hierarchical design. These answers are, however, variants of the single word: abstraction.
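Summarization is the most common form of this abstraction in routing. The Python sketch below, using the standard `ipaddress` module, models a hypothetical module whose four /24 subnets sit behind an assumed 10.1.0.0/22 aggregate; the `advertised` helper is illustrative only, not any vendor’s behavior. It shows why summarization bounds a failure domain: when one subnet inside the module fails, what the rest of the network sees does not change.

```python
import ipaddress

# Hypothetical module: four /24 subnets behind one aggregation point.
module_routes = {
    ipaddress.ip_network("10.1.0.0/24"),
    ipaddress.ip_network("10.1.1.0/24"),
    ipaddress.ip_network("10.1.2.0/24"),
    ipaddress.ip_network("10.1.3.0/24"),
}

SUMMARY = ipaddress.ip_network("10.1.0.0/22")  # assumed configured aggregate

def advertised(routes, summary=SUMMARY):
    """Return what the module advertises to the rest of the network.

    If any component route inside the summary is reachable, only the
    summary is advertised; the component routes are hidden.
    """
    inside = [r for r in routes if r.subnet_of(summary)]
    outside = [r for r in routes if not r.subnet_of(summary)]
    return sorted(([summary] if inside else []) + outside)

before = advertised(module_routes)
# One subnet fails inside the module...
after = advertised(module_routes - {ipaddress.ip_network("10.1.2.0/24")})
print(before == after)  # True: the failure is not visible outside the module
```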

This response came to mind when I was reading an article on clean code this last week (it’s amazing how often software architecture overlaps with network architecture):

Once we learn how to create abstractions, it is tempting to get high on that ability, and pull abstractions out of thin air whenever we see repetitive code. After a few years of coding, we see repetition everywhere — and abstracting is our new superpower. If someone tells us that abstraction is a virtue, we’ll eat it. And we’ll start judging other people for not worshipping “cleanliness”.

I have been teaching network design for many, many years. I co-authored my first book on network design, Advanced IP Network Design, with Don Slice and Alvaro Retana; it was published in 1999, and it typically takes about a year to write a book, so we probably started working on it in the middle of 1998. The entire object of that book was to teach hierarchical network design, which relies on modularization through aggregation and summarization to separate complexity from complexity (though I didn’t really use this wording until many years later) in order to break up failure domains.

It has been twenty-two years since Don, Alvaro, and I wrote that book—and hierarchical network design is still as relevant today as it was then. But in the last 22 years, I think I’ve learned just a little more about network design.

Among the things I’ve picked up in those 22 years is this one: if you haven’t found the tradeoffs, you haven’t looked hard enough. Or, put another way, there is no such thing as a free lunch. Abstraction is a superpower, and it can make your network a lot cleaner when you’re using it correctly (not using it to paper over complexity). But building the perfectly clean network can mean reducing the agility of the design to the point of fragility. For instance, in the article linked above, Dan Abramov notes changing requirements made his “clean revision” of the code much more complex—a classic sign of fragility.

Perhaps an example would be helpful here. If you understand how link state and distance-vector protocols work, think of RIP as a link state protocol with summarization (abstraction of topology) at every hop; you can probably quickly grasp what you have gained by summarizing at every hop—and what you have lost.
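For readers who want to check that intuition, here is a small Python sketch on a hypothetical four-router topology. `link_state` runs Dijkstra over the full link database, the way an OSPF or IS-IS router computes paths; `distance_vector` iterates a Bellman-Ford style exchange in which each router hears only its neighbors’ (destination, cost) vectors, which is the per-hop summarization described above. This is a teaching model under those assumptions, not RIP itself (no timers, no hop-count limit, no split horizon).

```python
import heapq

# Hypothetical four-router topology with symmetric link costs.
LINKS = {("A", "B"): 1, ("B", "C"): 1, ("A", "D"): 5, ("D", "C"): 1}
NODES = sorted({n for link in LINKS for n in link})

def neighbors(node):
    """Yield (neighbor, link cost) pairs for a router."""
    for (x, y), cost in LINKS.items():
        if x == node:
            yield y, cost
        elif y == node:
            yield x, cost

def link_state(source):
    """Dijkstra over the full topology: the router sees every link."""
    dist = {source: 0}
    path = {source: [source]}
    queue = [(0, source)]
    while queue:
        d, node = heapq.heappop(queue)
        if d > dist[node]:
            continue  # stale queue entry
        for nbr, cost in neighbors(node):
            nd = d + cost
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                path[nbr] = path[node] + [nbr]
                heapq.heappush(queue, (nd, nbr))
    return dist, path

def distance_vector():
    """Bellman-Ford style exchange: each router learns only its neighbors'
    cost vectors, i.e. topology summarized away at every hop."""
    table = {n: {n: (0, n)} for n in NODES}  # dest -> (cost, next hop)
    changed = True
    while changed:
        changed = False
        for node in NODES:
            for nbr, link_cost in neighbors(node):
                for dest, (cost, _) in table[nbr].items():
                    new = cost + link_cost
                    if new < table[node].get(dest, (float("inf"), None))[0]:
                        table[node][dest] = (new, nbr)
                        changed = True
    return table

dist, path = link_state("A")
dv = distance_vector()
print(dist["C"], path["C"])  # 2 ['A', 'B', 'C']
print(dv["A"]["C"])          # (2, 'B')
```

Both reach the same cost to C, but the link state router holds the entire path while the distance-vector router holds only a cost and a next hop. That is the gain (far less state to carry and compute over) and the loss (no view of the path, which is why real distance-vector protocols such as RIP rely on split horizon and a maximum hop count to cope with loops).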

You should still use abstraction to break up failure domains. You should still use abstraction to separate complexity from complexity. But you should use abstraction like you would any other tool. You should decide the best places and times to use abstraction after understanding the whole system.

For instance—a lot of people really insist on aggregating routing information in their data center fabric, especially in the underlay control plane. Why? The underlay is a constrained routing domain with known properties. Aggregation in this environment can cause routing black holes and unpredictable traffic flow behavior—both of which require added complexity to “work around.” If there is another solution available, it might be best to use it.
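Here is a minimal Python sketch of the black hole case, assuming a hypothetical two-spine, three-leaf fabric in which each spine advertises an assumed 10.1.0.0/16 aggregate toward the leaves; all names are illustrative. When the spine1 to leaf1 link fails, spine1 still has routes for the other leaves, so it keeps advertising the aggregate. Other leaves continue to spread traffic for 10.1.1.0/24 across both spines, and whatever lands on spine1 has nowhere to go. Without aggregation, spine1 simply withdraws 10.1.1.0/24 and traffic shifts to spine2.

```python
import ipaddress

# Hypothetical two-spine, three-leaf fabric; each leaf owns one /24.
LEAF_PREFIX = {
    "leaf1": ipaddress.ip_network("10.1.1.0/24"),
    "leaf2": ipaddress.ip_network("10.1.2.0/24"),
    "leaf3": ipaddress.ip_network("10.1.3.0/24"),
}
AGGREGATE = ipaddress.ip_network("10.1.0.0/16")  # assumed spine aggregate
SPINES = ["spine1", "spine2"]

def spine_routes(spine, failed_links):
    """Prefixes a spine can actually reach (its still-connected leaves)."""
    return {p for leaf, p in LEAF_PREFIX.items() if (spine, leaf) not in failed_links}

def spine_advertises(spine, failed_links, summarize):
    """What a spine advertises toward the leaves."""
    routes = spine_routes(spine, failed_links)
    if summarize and routes:
        return {AGGREGATE}  # component /24s hidden behind the aggregate
    return routes

def next_hops(dest, failed_links, summarize):
    """Spines a leaf would spread (ECMP) traffic for dest across."""
    addr = ipaddress.ip_address(dest)
    return [s for s in SPINES
            if any(addr in p for p in spine_advertises(s, failed_links, summarize))]

def black_holed(dest, failed_links, summarize):
    """Spines that attract traffic for dest but cannot deliver it."""
    addr = ipaddress.ip_address(dest)
    return [s for s in next_hops(dest, failed_links, summarize)
            if not any(addr in p for p in spine_routes(s, failed_links))]

failed = {("spine1", "leaf1")}  # the spine1 <-> leaf1 link goes down
print(black_holed("10.1.1.10", failed, summarize=True))   # ['spine1']
print(black_holed("10.1.1.10", failed, summarize=False))  # []
```

Working around this usually means adding machinery (more specific leaks, disaggregation on failure, and so on), which is the added complexity the paragraph above warns about.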

At the same time, I see a lot of people insisting BGP is the only option for data center underlays, or that it is the simplest option because you can use a single protocol for the underlay and overlay. This, in my opinion, is wrong, as well—because it does not properly separate two different parts of the network, each with their own purpose, into separate failure domains.

Rather than looking at a network and saying, “we can abstract here, so we should abstract here,” you should look at a network and say, “what are the modules here, and what purposes do they serve?” Once you know that, you can start thinking about when and where abstraction makes sense.

To paraphrase Dan, don’t be a clean network design zealot. Clean network design is not a goal. It’s a good guide when you don’t understand the network; such guides are often useful, but they are guides rather than rules.