A while back I posted on section 10 routing loops; Daniel responded to the post with this comment:
I am curious how these things are discovered. You said that this is a contrived example, but I assume researchers have some sort of methodology to discover issues like this. I am sure some things have been found through operational mishap, but is there some “standardized” way of testing graph logic for the possibility of loops? I trust this is much easier to do today than even a decade ago.
You would think there would be some organized way to discover these kinds of routing loops, something every researcher or protocol designer might follow. The reality is far different: there is no systematic way that I know of to find this sort of problem. What happens in real life is that people who combine experience in protocol design, an understanding of the limits of the different ways of finding loop-free paths (solving the loop-free path problem), and experience deploying and operating networks running these protocols figure these things out because they know enough about the solution space to look for them in the first place.
I don’t know who actually discovered this problem; it is “just” a comment in the RFC, and these kinds of comments are not normally attributed. It might even have been something that developed on a mailing list, or in private conversation between folks sitting at a table drawing diagrams on a napkin. But I would bet it followed the normal sort of process, which goes one of two ways:
- Someone thinks: “given the way this works, there should be a loop in there…” They sit down with someone else, and think through how it could happen. Then they go find examples of it in the real world, by talking to folks who have seen the loop but could not figure out how it happened.
- Someone sees a loop, and thinks: “now why did that happen??” They talk to some other folks who know the protocol, sketch the problem out on a napkin, and they work together to figure it out.
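The napkin sketch in either case usually comes down to writing out each router's next hop toward one destination and following the arrows. A minimal Python sketch of that check is below; the function name `find_forwarding_loop`, the router names, and the next-hop tables are all hypothetical, made up purely for illustration. Note that this only checks a single snapshot of forwarding state. It does nothing to discover which sequence of protocol events could produce such a state, which is the hard part and the part that takes experience.

```python
from typing import Dict, List, Optional


def find_forwarding_loop(
    next_hop: Dict[str, str], start: str, dest: str
) -> Optional[List[str]]:
    """Follow next hops from start toward dest.

    Returns the routers that form a loop, or None if the walk reaches
    dest (or hits a router with no next hop, i.e. a black hole).
    """
    path: List[str] = []
    seen: Dict[str, int] = {}  # router -> position in path
    node = start
    while node != dest:
        if node in seen:
            # We have been here before: everything from the first
            # visit onward is the loop.
            return path[seen[node]:]
        if node not in next_hop:
            return None  # no route at this router; not a loop
        seen[node] = len(path)
        path.append(node)
        node = next_hop[node]
    return None


# Hypothetical snapshot: B and C each believe the other is the best
# path toward D.
looping = {"A": "B", "B": "C", "C": "B"}
# Hypothetical snapshot with consistent next hops toward D.
loop_free = {"A": "B", "B": "C", "C": "D"}

print(find_forwarding_loop(looping, "A", "D"))
print(find_forwarding_loop(loop_free, "A", "D"))
```

In the first table the walk from A revisits B, so the function reports B and C as the loop; in the second the walk reaches D and it reports nothing.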
There are three key points here. The first is the importance of knowing not only how to configure the protocol, but how the protocol really works. The second is knowing enough of the theory behind why the protocol works to relate that theory to what you are actually seeing in the network. The third is having someone to talk to with the same sort of understanding, who can help hash out what you are seeing, and why.
In other words: operational experience, theoretical understanding, and community.
If these three sound familiar—they should.