The Overoptimization Meltdown
Meltdown and Spectre are, at heart, easy vulnerabilities to understand. Imagine a gang of thieves waiting for a stagecoach carrying a month’s worth of payroll.
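There are two roads the coach could take, and a fork, or a branch, where the driver decides which one to take. The driver could take either one. What is the solution? Station robbers along both sides of the branch, and wait to see which one the driver chooses. Once you know, pull the robbers from the unused road over to the other, so you can rob the stage with your full force. This is much the same as a modern processor handling a branch: the user could have put anything into some field, or the software could have retrieved anything from a database, and that value determines which of two sets of instructions should run. The processor cannot know the outcome in advance, so rather than wait, it starts executing ahead speculatively and discards whatever turns out not to be needed. (Real processors usually guess the likelier path with a branch predictor rather than literally running both, but the effect is the same for this discussion: work is done before the decision is known.)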
To run instructions speculatively, the processor pulls in the contents of specific memory locations and begins executing code that uses them. Some of these locations might be memory the currently running software is not supposed to access, but this is not checked until the branch is resolved. Hence a piece of software can force the processor to load memory it should not have access to by placing the right instructions in a speculative branch. The results of that speculative work are thrown away, but traces remain, such as which lines ended up in the cache, and the software can use those traces to work out the contents of the memory.
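A toy simulation can make this concrete. What follows is a minimal sketch, not an exploit and not a description of how any real processor is built: the `ToyCPU` class, its methods, and the 256-entry probe array are all assumptions invented for illustration. The one idea it models is the ordering problem described above, where the load runs first, the permission check runs later, and the cache is not rolled back when the check fails.

```python
class ToyCPU:
    """Toy model (hypothetical): the permission check happens after the load."""

    def __init__(self, memory, protected):
        self.memory = memory        # address -> byte value (0..255)
        self.protected = protected  # addresses the program may not read
        self.cache = set()          # which probe-array lines are cached

    def flush(self):
        # Start from a cold cache so only the speculative access leaves a mark.
        self.cache.clear()

    def speculative_read(self, addr):
        # Speculative phase: the load and a dependent access run first.
        value = self.memory[addr]
        self.cache.add(value)       # models touching probe[value]
        # Late check: the result is discarded, but the cache is not undone.
        if addr in self.protected:
            raise PermissionError("access denied")
        return value

    def is_cached(self, line):
        # Stands in for timing a load: a fast load means the line is cached.
        return line in self.cache


def leak_byte(cpu, addr):
    """Recover the byte at addr from the cache footprint alone."""
    cpu.flush()
    try:
        cpu.speculative_read(addr)
    except PermissionError:
        pass  # the program never receives the value directly
    hot = [line for line in range(256) if cpu.is_cached(line)]
    return hot[0] if len(hot) == 1 else None


# Address 1 holds a value the program is not allowed to read.
cpu = ToyCPU(memory={0: 7, 1: 42}, protected={1})
print(leak_byte(cpu, 1))  # 42
```

The direct read fails with `PermissionError`, exactly as it should, yet the probe still recovers the value. In this sketch the defect is not in the check itself, but in the fact that the check runs after a side effect has already become observable.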
But my point here is not to consider the problem itself. What is more interesting is the thinking that leads to this kind of defect being designed into the hardware. There are, in all designs, tradeoffs. For instance, in the real (physical) world, there is the tradeoff among fast, cheap, and high quality. In the database world, there is the tradeoff among consistency, availability, and partition tolerance. I have, for many years, maintained that in network design there is a tradeoff among state, optimization, and surfaces.
What Meltdown and Spectre represent is the unintended consequence of a strong drive towards enhancing performance. It’s not that the engineers who designed speculative execution, and put it into silicon, are dumb. In fact, they are brilliant engineers who have helped drive the art of computing forward ever faster, in ways probably unimaginable even twenty years ago. There are known tradeoffs when using speculative execution, such as:
- Power—some code is going to be run, and the contents of some memory fetched, that will not be used. Fetching these memory locations, and running this code, is not free; there is some amount of power used, and heat generated, in speculative execution. This was actually a point of discussion early in the life of speculative execution, but the performance gains were so solid that the power and heat concerns were eventually set aside.
- Real Estate—speculative execution requires physical real estate in the processor. It makes processors larger, and uses silicon gates that could be used for something else. Overall, the most performance enhancing use of the available real estate was shown to be the most economically useful, and thus speculative execution became an important part of chip design.
- State—speculative execution drives the amount of state, and the speed at which that state is changing, much higher than it would otherwise be. Again, the performance gains were strong enough to make the added state worth the effort.
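The power tradeoff in the list above can be made concrete with a rough count of discarded work. This is a back-of-the-envelope sketch only: it assumes a simple one-bit predictor that guesses each branch will go the way it went last time, and a hypothetical `depth` parameter for how many instructions run past a branch before the real outcome is known. Neither reflects any particular processor.

```python
def wasted_speculative_work(outcomes, depth):
    """Count instructions that are executed speculatively and then discarded.

    outcomes: sequence of booleans, True if the branch was actually taken.
    depth:    assumed number of instructions run past each branch before
              the real outcome is known (a made-up parameter).
    """
    prediction = True  # assumed initial guess: branch taken
    mispredictions = 0
    for taken in outcomes:
        if taken != prediction:
            mispredictions += 1
        prediction = taken  # one-bit predictor: remember the last outcome
    return mispredictions, mispredictions * depth


# A loop branch taken nine times and then not taken on exit, run twice.
pattern = ([True] * 9 + [False]) * 2
misses, wasted = wasted_speculative_work(pattern, depth=20)
print(misses, wasted)  # 3 60
```

Every one of those discarded instructions still consumed power and produced heat, which is the cost the first bullet describes.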
There was one more tradeoff, we now know, that was not considered during the initial days and years when speculative execution was being discussed—security.
So maybe it is time to take stock, and think about lessons learned. First, it is always the unexpected consequence that will come back to bite you in the end. Second, there is almost always an unexpected consequence. The value of experience is in being bitten by unexpected consequences enough times to learn what to look for in the future.
Well, in theory, anyway.
Finally, if you haven’t found the tradeoffs, you haven’t looked hard enough. Any time you think you have come up with a way to do things that will outperform any other way, you need to find all the tradeoffs. Don’t just find one tradeoff, and say, “see, I have that covered.”
A single-minded focus on performance, at the cost of all else, will normally cost you more than you think in the end. Overoptimization can sometimes cause meltdowns. And spectres.
It’s a lesson well worth learning.
“I have, for many years, maintained that in network design there is a tradeoff between state, optimization, and surfaces.” Can you elaborate on this? Maybe you have already written about it (I’m new to your blog). Thanks in advance.