Slicing and Dicing Flooding Domains (1)

This week two different folks have asked me about when and where I would split up a flooding domain (IS-IS) or area (OSPF); I figured a question asked twice in one week is worth a blog post, so here we are…

Before I start on the technical reasons, I’m going to say something that might surprise longtime readers: there is rarely any technical reason to split a single flooding domain into multiple flooding domains. That said, I’ll go through the technical reasons anyway.

There are really three things to think about when considering how a flooding domain is performing:

  • SPF run time
  • flooding frequency
  • LSDB size

Let’s look at the third issue first, the database size. This is theoretically an issue, but it’s really only an issue if you have a lot of nodes and routes. I can’t ever recall bumping up against this problem, but what if I did? I’d start by taking the transit links out of the database entirely; for instance, by configuring all the interfaces that face actual host devices as passive interfaces (which you should be doing anyway!), and configuring IS-IS to advertise just the passive interfaces. You can pull similar tricks in OSPF. Another trick here is to make certain point-to-point Ethernet links aren’t electing a DIS or DR; the pseudonode LSP or network LSA that results from the election just clogs the database up with meaningless information.
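
To get a feel for how much those two tricks can buy, here is a back-of-the-envelope sketch in Python. It is a counting model I made up for illustration, not a byte-accurate picture of any real LSDB: the `Topology` class, the `lsdb_estimate` function, and the sample numbers are all hypothetical, and the counting follows a simplified IS-IS-style assumption (one LSP per router, one pseudonode LSP per Ethernet link that elects a DIS, and both ends of a transit link advertising its prefix).

```python
from dataclasses import dataclass


@dataclass
class Topology:
    routers: int
    transit_links: int    # router-to-router point-to-point Ethernet links
    host_interfaces: int  # passive, host-facing interfaces


def lsdb_estimate(topo, advertise_transit=True, elect_on_p2p=True):
    """Rough count of LSDB entries under a simplified IS-IS-style model."""
    # One LSP per router, regardless of tuning.
    node_lsps = topo.routers
    # Assumption: a two-router Ethernet link left in broadcast mode elects
    # a DIS, which originates one pseudonode LSP for that link.
    pseudonode_lsps = topo.transit_links if elect_on_p2p else 0
    # Assumption: both ends of a transit link advertise the link's prefix.
    transit_prefixes = 2 * topo.transit_links if advertise_transit else 0
    prefixes = transit_prefixes + topo.host_interfaces
    return {"lsps": node_lsps + pseudonode_lsps, "prefixes": prefixes}


# Hypothetical network: 100 routers, 300 transit links, 400 host interfaces.
topo = Topology(routers=100, transit_links=300, host_interfaces=400)
print("untuned:", lsdb_estimate(topo))
print("tuned:  ", lsdb_estimate(topo, advertise_transit=False, elect_on_p2p=False))
```

In this made-up network the tuned database carries a quarter of the LSPs and well under half of the prefixes, which is why I would try these knobs long before reaching for a second flooding domain.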

The second issue, the flooding frequency, is more interesting. Before I split a flooding domain because there is “too much flooding,” I would want to look at several things to make certain I’m not doing a lot of work for nothing. Specifically, I would want to look at:

  • Why am I getting all these LSAs/LSPs? A lot of flooding means a lot of changes, which generally means instability someplace or another. I would either want to be able to justify the instability or stop it, rather than splitting a flooding domain to react to it. Techniques I would look at here include interface dampening (if it’s available) and roping off a flapping network behind a nailed-up redistributed route of some sort.
  • If the rate of flooding can only be controlled to some degree, or the flooding is legitimate, then I would want to look at how I can configure the network to control the flooding in a way that makes sense. Specifically, I’m going to look at using exponential backoff to manage bursts of flooding events while keeping my convergence time down as much as I can, and I’m going to consider my LSP generation intervals to make certain I account for bursts of changes on a single intermediate system. This is where we get into tradeoffs, however: at some point you need to ask if tuning the timers is easier/simpler than breaking the flooding domain into two flooding domains, particularly if you can isolate the bursty parts of the network from the more stable parts.
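
For the interface dampening idea in the first bullet, a rough sketch of the usual penalty-and-decay scheme might look like the Python below. The function name `dampening_trace` and every parameter value (penalty per flap, half-life, suppress and reuse thresholds) are assumptions picked for illustration, not any particular implementation’s defaults, and the model only samples the interface state at flap times.

```python
def dampening_trace(flap_times, penalty_per_flap=1000.0, half_life=5.0,
                    suppress=2000.0, reuse=1000.0):
    """Return (time, penalty, suppressed) recorded just after each flap."""
    trace = []
    penalty = 0.0
    suppressed = False
    last = None
    for t in sorted(flap_times):
        if last is not None:
            # Exponential decay: the penalty halves every half_life seconds.
            penalty *= 0.5 ** ((t - last) / half_life)
        # A suppressed interface is released once the penalty has decayed
        # below the reuse threshold.
        if suppressed and penalty < reuse:
            suppressed = False
        penalty += penalty_per_flap
        # Crossing the suppress threshold hides further flaps from the IGP.
        if penalty >= suppress:
            suppressed = True
        trace.append((t, round(penalty, 1), suppressed))
        last = t
    return trace


# Three quick flaps, then one more after a long quiet period (seconds).
for time, penalty, suppressed in dampening_trace([0, 1, 2, 30]):
    print(f"t={time:>2}s penalty={penalty:>7} suppressed={suppressed}")
```

The point of the sketch is the shape of the behavior: a single flap is passed along to the routing protocol, a rapid string of flaps gets the interface suppressed, and a long enough quiet period lets it back in.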

There are probably few networks in the world where tuning flooding will not hold the rate of flooding down to a reasonable level.
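
To make the exponential backoff idea a little more concrete, here is a simplified Python model of an initial/hold/maximum style throttle timer. This is a sketch under my own assumptions, loosely patterned on the three-timer throttles many implementations expose, and not any vendor’s exact algorithm; the function name `throttle_runs` and the timer values are hypothetical. All times are in milliseconds.

```python
def throttle_runs(event_times, start, hold, max_wait):
    """Return the times at which the router acts (e.g. generates an LSP)."""
    runs = []
    run_at = None        # time of the most recently scheduled run
    window_end = 0       # end of the current hold window
    current_hold = hold
    for t in sorted(event_times):
        if run_at is not None and t <= run_at:
            # A run is already pending; this event is batched into it.
            continue
        if t >= window_end:
            # The hold window expired quietly: reset and react quickly.
            current_hold = hold
            run_at = t + start
        else:
            # Event inside the hold window: wait for the window to end,
            # then double the hold time, capped at max_wait.
            run_at = window_end
            current_hold = min(current_hold * 2, max_wait)
        runs.append(run_at)
        window_end = run_at + current_hold
    return runs


# A burst of changes followed by a long quiet period, then one more change.
events = [0, 10, 20, 100, 300, 700, 1500, 20000]
print(throttle_runs(events, start=50, hold=200, max_wait=5000))
```

In this model the first change after a quiet period is handled after only the short initial delay, so convergence stays fast for isolated events, while the gaps between reactions stretch out as the burst continues. That is the tradeoff the timers are tuning: fast reaction to the first change, slower and slower reaction to sustained churn.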

Continued next week…