Speculative Inference

How to apply inference-time compute scaling to your human engineering designs

A recent approach to improving the reasoning performance of large language models is to scale not only the compute put into training the models but also the amount of compute that can be put to good use at inference time. This approach is advantageous because not all problems are equally difficult: some can be solved trivially or from memory, while others require orders of magnitude more effort. If we can’t scale the amount of compute we put to good use at inference time, then every problem, easy or hard, is stuck with roughly the same fixed budget of FLOPs.

For many problems, in particular design problems, this is especially important. Design problems are those that can be solved in many (usually infinitely many) ways, and for which the quality of a solution is highly multidimensional. With design problems, it is often unclear ahead of time what all the dimensions of quality are, and those dimensions may well change over time. Two candidate solutions can sit very close to one another in design space, yet one may satisfy only some of the quality criteria, and those only in a mediocre fashion, while the other exceeds every criterion under consideration and is also simpler or less costly. This is not true of many other kinds of problems. Solving a closed-form algebra problem, for example, is not a design problem, because every correct solution works equally well and costs about the same. Designing an algorithm for solving closed-form algebra problems, however, IS a design problem. Examples of design problems include essays, poetry, art, software, user interfaces, and clothing. Anything that is used by a human, or that can be considered to have an interface, is subject to design (and to design principles).

Design problems require some minimum amount of compute / intelligence / effort to solve at all, but by their nature, the more consideration, effort, experience, or expertise goes into them, the better the solutions that can be produced. For non-design problems, the correctness of a solution (and often its optimality) is easy to establish. Design problems, on the other hand, require taste. For a non-design problem that can be clearly articulated, it may be fine to assign someone inexperienced, because their solution is easy to verify. Putting an inexperienced person on an important design problem, however, is a bad idea: the problem may appear to be solved appropriately when in fact alternative solutions with major benefits were never discovered.

Scaling training compute is analogous to putting a more experienced person on a given task. Even if the more experienced person puts in less effort, they’ve seen far more designs than an inexperienced person has. This doesn’t make them a good designer in a way that transfers outside their domain of experience, but through brute-force pattern matching on that experience, they’ll be able to regurgitate or interpolate a reasonable design.

Scaling test-time (inference-time) compute in LLMs is analogous to taking an effortful, first-principles approach to design. A good designer does not merely perform unconscious pattern matching according to their experience. Instead, they search the space of possible solutions. This search is guided by a combination of intuition (unconscious pattern matching gained through experience) and explicit reasoning about the particularities of specific solutions (evaluating them against multidimensional quality criteria). A good designer can generate a large number of distinct, diverse ideas, and then knows the right balance of intuition and explicit evaluation to apply in order to maximize the value of the final solution within a finite amount of time.
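The search described above (diverge, prune by intuition, then evaluate explicitly under a budget) can be sketched in Python. This is a minimal sketch under our own assumptions, not a prescribed method: the names `guided_design_search`, `intuition`, `criteria`, and `eval_budget` are hypothetical; "intuition" is modeled as a cheap scoring function applied to every idea; "explicit reasoning" is modeled as a multi-criteria evaluation applied only to a shortlist; and the criteria are combined by taking the worst one, which is just one of many reasonable aggregation choices.

```python
import random
from typing import Callable, Sequence, TypeVar

D = TypeVar("D")  # a candidate design; its representation is up to the caller


def guided_design_search(
    generate: Callable[[random.Random], D],
    intuition: Callable[[D], float],
    criteria: Sequence[Callable[[D], float]],
    n_ideas: int,
    eval_budget: int,
    seed: int = 0,
) -> D:
    """Hypothetical helper: diverge, prune by intuition, then evaluate explicitly.

    `intuition` is a cheap (possibly partial) score applied to every idea;
    `criteria` are the more expensive explicit checks, applied only to the
    `eval_budget` ideas that intuition ranks highest.
    """
    if n_ideas < 1 or eval_budget < 1:
        raise ValueError("n_ideas and eval_budget must both be at least 1")
    rng = random.Random(seed)

    # 1. Divergent phase: generate many distinct candidate designs.
    ideas = [generate(rng) for _ in range(n_ideas)]

    # 2. Intuition: rank every idea cheaply and keep only the most promising.
    shortlist = sorted(ideas, key=intuition, reverse=True)[:eval_budget]

    # 3. Explicit reasoning: score each shortlisted idea on every criterion.
    #    Assumption: a design is only as good as its worst criterion, so the
    #    winner must do reasonably well on all quality dimensions at once.
    def explicit_score(design: D) -> float:
        return min(criterion(design) for criterion in criteria)

    return max(shortlist, key=explicit_score)


# Toy usage (hypothetical): a "design" is two knobs in [0, 1], quality has two
# dimensions, and intuition only looks at the first one.
def random_design(rng: random.Random) -> tuple[float, float]:
    return (rng.random(), rng.random())


criteria = [
    lambda d: 1 - abs(d[0] - 0.7),  # stand-in for, e.g., ease of extension
    lambda d: 1 - abs(d[1] - 0.3),  # stand-in for, e.g., simplicity
]
intuition = criteria[0]  # cheap but partial: blind to the second dimension

for budget in (1, 5, 50):
    best = guided_design_search(
        random_design, intuition, criteria, n_ideas=200, eval_budget=budget
    )
    print(budget, round(min(c(best) for c in criteria), 3))
```

With `eval_budget=1` the result is simply intuition’s favorite, the analogue of going with the first idea that comes to mind. Raising the budget spends more "inference-time compute" on explicit evaluation; for a fixed seed the worst-criterion score can never decrease as the budget grows, because each shortlist is a prefix of the next, larger one.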

In this sense, a good chess player is a good designer. The space of possible chess games is far too large for a human to consider explicitly, and there is not enough time in a human lifespan to build, through experience alone, intuitions that cover every possible board position. Because of this, good chess players don’t even try to find strictly optimal moves. Instead, they design solutions that are appropriate given finite time. Explicit reasoning takes time, and good chess players are those with great intuitions about which areas of the board (that is, of the solution space) deserve that limited budget of explicit reasoning.

It’s possible to beat the best humans at chess largely through explicit search (backed by a hand-crafted evaluation function, as IBM’s Deep Blue did in 1997) because each chess position offers relatively few legal moves, so the search tree is small enough to explore deeply. Go has a far larger branching factor and vastly more possible board positions, which is why a program outperforming the best humans took about two decades longer: we had to combine explicit search with an ‘intuitive’ method of narrowing the search space, learned with statistical methods (in AlphaGo’s case, deep neural networks).

Scaling inference-time compute in human software designers

Anything that has an interface is a design problem. Whenever you write a program, a class, a method, or a line of code, you are creating an interface that other engineers will have to use or maintain in the future. As with all design problems, some minimum effort is required to produce a workable solution, but a workable solution is not necessarily a good one. We don’t just want a solution that works right now; we want one that is easy to extend, modify, and change later, when requirements or criteria could be wildly different from what they are today.

Writing the first solution that comes to mind is akin to relying solely on intuition, experience, and memorization: the human equivalent of training-time compute. A good software designer, by contrast, is able to generate a large number of possible solutions (or solution subspaces) and to apply the right combination of intuition and explicit reasoning about evaluation criteria to narrow in on a great one. Good software designers try to predict how an interface could possibly be used in the future, or how the software’s requirements may change, and build things in a way that minimizes future complexity (rather than minimizing present complexity, or failing to minimize complexity at all).