The Math of Less

Why this week’s biggest breakthroughs all worked by taking things away, and what a fringe philosophy of mathematics has to do with any of it.

Copyright: Sanjay Basu

For most of the last decade, the headlines in physics, AI, and computing have all rhymed with the same dull tune. More. More parameters. More qubits. More compute. More data. More layers. More floors on the same skyscraper, and another floor again next quarter, because what else are you going to put in a press release.

But this week, four very different research stories shared an unsettling subtext, and it is the opposite one. The next jump forward might come from doing brilliantly less. From a Caltech team showing that a useful quantum computer needs ten thousand qubits instead of a million, to a Google paper that crushes a transformer’s memory using a forty-year-old trick from pure mathematics, science is having a quiet subtractive moment. And tucked behind it, almost embarrassed to be there, is a philosopher’s question that won’t stop nagging. What if infinity was the wrong abstraction all along?

What follows is one essay, four stories, and an admission that the rhyme is loud enough to listen to.

1. The cult of more

Somewhere around 2018, the scientific imagination, or at least its bestseller list, decided that the way forward in basically everything was to make the thing larger.

Deep learning got the scaling laws. Kaplan, then Chinchilla, then everyone with a budget, plotted log-loss against log-parameters and saw, gloriously, a straight line. The straight line was the policy. Quantum computing got the fault tolerance argument. A useful quantum computer would need around a million physical qubits per useful logical one, the textbooks said, because you needed massive redundancy to fight decoherence, and that was just the cost of doing business. PDE solvers got finer meshes. Foundational mathematics, in the small corner that worried about such things, kept adding axioms.

The pattern was so obvious it stopped looking like a pattern. “More” became the heuristic that nobody had to defend. You don’t defend gravity.

But “more” has costs that aren’t always tracked on the same page as the wins. There is the electricity bill for the GPU farm. There is the fabrication yield for the qubit chip. There is the carbon. There is also a subtler tax, which is that when your only move is to scale, you stop looking for the better move. The skyscraper metaphor is unfair but persistent. If you spend ten years adding floors to the same building, the boldest thing you can do is ask whether the building needed to be that tall in the first place.

There was always a quieter tradition saying so. Occam's razor is the obvious example, but the modern flavor is more practical and more interesting. Statisticians had sparsity. Information theorists had Kolmogorov complexity. Designers had minimalism, which they pretended was about aesthetics but was really about understanding the thing well enough to remove the parts that weren’t doing any work.

This week, the quiet tradition showed up to four different parties wearing essentially the same outfit.

When you have spent ten years adding floors to the skyscraper, the boldest move is to ask whether the building needed to be that tall.

2. The qubit count just dropped by two orders of magnitude

Of all the disciplines where “more” felt like physical law, quantum computing wore it most heavily. The reason was simple. Qubits are fragile in a way that classical bits are not. A cosmic ray, a thermal jitter, the wrong glance, and your information evaporates. The whole architecture of fault-tolerant quantum computing was built around brute redundancy. Encode each logical qubit you actually care about using something like a thousand physical ones, run them in lockstep, vote on the answer.

Estimates for a machine that could run Shor’s algorithm against modern cryptography sat, for years, at around a million physical qubits. You needed roughly two thousand logical qubits to factor a four-thousand-bit RSA key, give or take, and at a thousand-to-one redundancy ratio, well, the math is the math.

Then on March 31, Caltech announced something that broke that ratio. A team that included Manuel Endres’s group, working with a new venture called Oratomic, reported a fault-tolerant neutral-atom architecture that needs around five physical qubits per logical qubit, not a thousand. They estimated that a cryptographically relevant machine could be built with ten to twenty thousand atoms, not a million.

Two orders of magnitude. In a field where every order of magnitude is a decade of work and a billion dollars, you don’t get those casually.

The trick was not to make better atoms. The trick was to stop pretending the atoms had to sit still. In a static layout, two qubits you’d like to entangle might be physically distant on the chip, and getting them to talk requires shuttling information through intermediate qubits, each of which is another chance to make an error. In a reconfigurable optical-tweezer array, the atoms themselves can be picked up and moved. Want to entangle atom 47 with atom 8,213? Drag atom 47 over to atom 8,213 and let them shake hands. The connectivity graph is no longer a fixed grid, it’s a dance.

Once you can wire any two switches to each other on demand, an enormous category of overhead simply vanishes. The new error-correcting code can be smaller because it doesn’t have to route information across a frozen lattice. It is a piece of math far better matched to the hardware it runs on.
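If you want the flavor of the bookkeeping in code, here is a toy cost model, mine and not Caltech's, that simply counts error opportunities per two-qubit interaction under each layout.

```python
# Toy cost model, not the Caltech architecture: compare the overhead of a
# two-qubit interaction on a fixed 2-D grid (SWAP chains) vs. a movable-atom
# array. The lattice size and interaction pairs are invented for illustration.
import random

SIDE = 100  # a hypothetical 100 x 100 lattice of qubits

def grid_cost(a, b):
    # Static layout: entangling distant qubits means swapping state across
    # the lattice; every intermediate hop is one more chance to err.
    (ax, ay), (bx, by) = a, b
    return abs(ax - bx) + abs(ay - by)  # Manhattan distance, in SWAPs

def tweezer_cost(a, b):
    # Reconfigurable layout: pick the atom up and carry it over; transport
    # costs roughly one (small) error opportunity regardless of distance.
    return 1

random.seed(0)
pairs = [((random.randrange(SIDE), random.randrange(SIDE)),
          (random.randrange(SIDE), random.randrange(SIDE))) for _ in range(1000)]
avg_grid = sum(grid_cost(a, b) for a, b in pairs) / len(pairs)
avg_move = sum(tweezer_cost(a, b) for a, b in pairs) / len(pairs)
print(f"avg ops per interaction: grid {avg_grid:.0f}, movable atoms {avg_move:.0f}")
```

On a random pair of sites the static layout pays dozens of hops; the movable atom pays one, and that gap is exactly the overhead the smaller code no longer has to absorb.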

A million qubits was the scaffolding we thought we needed. It turned out we only needed it because we were stacking the bricks wrong.

Same physics, different bookkeeping. The static layout assumed a thousand physical qubits per logical one. Reconfigurable atoms cut that to around five.

There’s a nice side note about how the algorithm itself was found. According to Time’s coverage on April 7, AI tools were “instrumental” in the search for the error-correcting code that closed the gap. Not in a press-release way, in a real way. Somebody fed the search problem to a model, and the model came back with something the humans hadn’t tried.

Manuel Endres’s group at Caltech had already built a 6,100-atom array before any of this. The implication is gentle but firm. The gap between what we have and what we need just collapsed by about two decimal places. If you were one of the people quietly hoping that post-quantum cryptography migration would remain a problem for the second half of the 2030s, this is your bad week.

3. A 1984 lemma runs your AI

Two days before the Caltech announcement, Google posted a paper to arXiv and a blog post to its research site about something called TurboQuant. The headline numbers were striking. A six-times reduction in the KV cache memory of large language models. Up to eight times faster attention on H100s. Three bits per coordinate instead of the usual sixteen or thirty-two. Within roughly 2.7x of the information-theoretic lower bound. No measurable accuracy loss.

If you don’t live inside a transformer, a quick translation. The KV cache is the running memory of an LLM during generation. Every token the model produces costs more memory, linearly, because it has to remember every key and value vector from every previous token in every layer of attention. This is why long context windows are expensive. The cache is the bottleneck for long conversations, persistent agents, and basically every dream of what AI is supposed to grow up to be.
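The arithmetic is blunt. A back-of-envelope sketch, using a hypothetical Llama-style configuration; the layer and head counts below are illustrative, not any particular model's.

```python
# Back-of-envelope KV-cache arithmetic for a hypothetical Llama-style model.
n_layers, n_kv_heads, head_dim = 32, 8, 128

def kv_cache_bytes(seq_len: int, bytes_per_value: float) -> float:
    # Keys and values (the factor of 2), for every layer and KV head,
    # for every token seen so far. Linear in context length.
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_value

for ctx in (8_192, 131_072):
    fp16 = kv_cache_bytes(ctx, 2) / 2**30        # 16 bits per coordinate
    q3 = kv_cache_bytes(ctx, 3 / 8) / 2**30      # 3 bits per coordinate
    print(f"{ctx:>7} tokens: {fp16:5.1f} GiB at fp16 -> {q3:4.1f} GiB at 3 bits")
```

At a 131,072-token context this hypothetical cache runs to roughly sixteen gibibytes at fp16, per sequence, and that is before you batch users together.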

TurboQuant’s solution has two stages. First, PolarQuant rotates the cache vectors out of Cartesian coordinates and into a polar form, then recursively distills them into a single radius and a sequence of angles. This is the kind of thing engineers used to do to data when memory cost real money. Second, and this is the load-bearing wall, Quantized Johnson-Lindenstrauss (QJL) applies the JL lemma to the result.

A 1984 theorem about preserving distances in low dimensions is the unsung hero of why a transformer does not have to remember everything in full color.

The Johnson-Lindenstrauss lemma promises that a random projection from high dimensions to low ones approximately preserves pairwise distances. TurboQuant pushes it almost to the bone.

The Johnson-Lindenstrauss lemma is one of those results that sounds too good when you first hear it and then you read the proof and it is, in fact, that good. Johnson and Lindenstrauss in 1984 showed that you can take a set of points in a very high-dimensional space, project them randomly down into a far lower-dimensional space, and the pairwise distances between the points are preserved with arbitrarily small distortion, with high probability, provided the lower dimension is at least logarithmic in the number of points.
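In its standard modern form, with the constant left anonymous, the statement reads:

```latex
% Johnson-Lindenstrauss lemma (modern statement). For any 0 < \varepsilon < 1
% and any n points x_1, \dots, x_n \in \mathbb{R}^d, a suitably scaled random
% linear map A : \mathbb{R}^d \to \mathbb{R}^k with k \ge C \varepsilon^{-2} \log n
% satisfies, with high probability, for every pair i, j:
(1 - \varepsilon)\, \lVert x_i - x_j \rVert^2
  \;\le\; \lVert A x_i - A x_j \rVert^2
  \;\le\; (1 + \varepsilon)\, \lVert x_i - x_j \rVert^2
```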

Read that again. The output dimension does not depend on the input dimension. A thousand-dimensional cloud and a million-dimensional cloud can both be squashed into roughly the same number of dimensions, if all you care about is who is near whom.

Here is the analogy that finally made it stick for me. Imagine a thousand of your friends scattered across a city. You want to reproduce, on a much smaller map, who is near whom. You don’t actually need anyone’s GPS coordinates. You need their shadows, cast in roughly the right direction, onto a much smaller surface. The JL lemma guarantees that a randomly chosen direction works almost as well as the cleverest one, which is the kind of miracle pure math hands out approximately once a generation.

QJL pushes this to its limit. One sign bit per dimension. You don’t even keep the magnitude of the projection, you keep only whether the inner product with each random direction was positive or negative. And, astonishingly, distances are still mostly preserved.
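Here is a minimal numpy sketch of the sign-bit idea, in the SimHash style rather than the paper's exact QJL estimator: the fraction of disagreeing sign bits recovers the angle between two vectors.

```python
# One-bit random projections, SimHash style: keep only the sign of each
# projection and read the angle back off the Hamming distance.
import numpy as np

rng = np.random.default_rng(0)
d, m = 4096, 512                      # original dimension, number of sign bits
u = rng.standard_normal(d)
v = u + 0.5 * rng.standard_normal(d)  # a nearby vector

G = rng.standard_normal((m, d))       # random hyperplane normals
bits_u, bits_v = (G @ u > 0), (G @ v > 0)

# A random hyperplane separates u and v with probability angle(u, v) / pi,
# so the mismatch rate of the sign bits estimates the angle directly.
est_angle = np.pi * np.mean(bits_u != bits_v)
true_angle = np.arccos(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))
print(f"true angle {true_angle:.3f} rad, estimated from signs {est_angle:.3f} rad")
```

Five hundred twelve bits, no magnitudes anywhere, and the geometry survives.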

TurboQuant wraps this in a rotation-then-quantize pipeline. The rotation is essential. Quantizing without rotating is like cropping a long photograph without first straightening it. You lose far more of the picture than you needed to.
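A numpy sketch makes the point, with a toy vector of my own invention rather than anything from the paper: sign-quantize a vector whose energy sits in one coordinate and the bits keep almost nothing of the direction; rotate first and the same bits keep most of it.

```python
# Why rotate before quantizing (illustrative, not TurboQuant's pipeline).
import numpy as np

rng = np.random.default_rng(1)
d = 256
x = rng.standard_normal(d)
x[0] = 100.0                      # one coordinate dominates the energy

def sign_cosine(v):
    # Cosine similarity between v and its one-bit (sign-only) representation.
    s = np.sign(v)
    return (v @ s) / (np.linalg.norm(v) * np.linalg.norm(s))

Q, _ = np.linalg.qr(rng.standard_normal((d, d)))  # random orthogonal rotation
print(f"cosine kept without rotation: {sign_cosine(x):.2f}")     # roughly 0.19
print(f"cosine kept after rotation:   {sign_cosine(Q @ x):.2f}") # roughly 0.80
```

The rotation spreads the outlier's energy across all the coordinates, which is the straightening step the cropping analogy is about.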

The result, as one of the paper’s authors quietly notes, is that we are now operating within a small constant factor of the Shannon limit for transformer state. Information theory says you can’t go much further. We are crowded up against the wall.

Set aside, for a moment, the specifics. What this story says about the field is something like a confession. The most important infrastructure for serving the next generation of long-context AI is a theorem published when Reagan was president, that AI engineers, by and large, had forgotten about. The new compute frontier is built on a foundation that classical mathematicians wrote down before there were transformers, before there was the web, before there were even the kind of computers you could fit on a desk.

4. Mollifiers, or the resurrection of classical analysis

If the TurboQuant story is about AI rediscovering old math, the next one is about old math returning to repossess the territory.

The setting is physics-informed neural networks. PINNs, to friends. The basic idea is to train a network to fit data while simultaneously satisfying a known physical law, expressed as a partial differential equation. You want the network’s output to fit your measurements and to obey Maxwell, or Navier-Stokes, or whatever your problem deals in. The PDE acts as a soft constraint during training.
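A minimal sketch of that objective in PyTorch, with a toy one-dimensional problem and a penalty weight chosen arbitrarily by me, not taken from the PINN literature:

```python
# Minimal PINN-style loss: fit data while penalizing the residual of a toy
# PDE, here u'' = -sin(x), whose true solution is u = sin(x).
import torch
import torch.nn as nn

torch.manual_seed(0)
net = nn.Sequential(nn.Linear(1, 64), nn.Tanh(), nn.Linear(64, 1))

x_data = torch.rand(32, 1)                       # measurement locations
u_data = torch.sin(x_data)                       # synthetic measurements

x_pde = torch.rand(128, 1, requires_grad=True)   # collocation points
u = net(x_pde)
du = torch.autograd.grad(u.sum(), x_pde, create_graph=True)[0]
d2u = torch.autograd.grad(du.sum(), x_pde, create_graph=True)[0]

residual = d2u + torch.sin(x_pde)                # zero on the true solution
loss = nn.functional.mse_loss(net(x_data), u_data) + 0.1 * residual.pow(2).mean()
loss.backward()
```

Those two chained autograd.grad calls are where the trouble starts.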

The catch is that, to enforce the constraint, you need the derivatives of the network output. High-order ones, often. Recursive automatic differentiation can compute these, but the memory grows with each order, and the result is brittle if the underlying data is noisy. Asking a network to differentiate itself four times in a row is, mathematically, fine. Doing it stably in the wild is not.

The Penn Engineering paper that landed in May, titled Mollifier Layers, replaces this whole recursive tower with a single convolution.

Mollifiers are the equivalent of asking a friend to read your handwriting after one careful pass with a nice fountain pen, instead of squinting at your scrawl with a magnifier.

Recursive autodiff produces every higher derivative by chaining the chain rule. Convolving once with a mollifier hands them all over analytically, and gently.

Mollifiers come from functional analysis in the 1930s. Sergei Sobolev introduced them, in a different language, to give precise meaning to derivatives of functions that aren’t smooth. The idea is simple. A mollifier is a smooth, compactly supported, narrowly peaked bump function. If you convolve any function, even a wildly non-smooth one, with a mollifier, the result is as smooth as you like. And crucially, the derivatives of the convolution are exact and analytic, because the differentiation falls on the mollifier instead of on the original function.

Translate that into a PINN. Instead of asking the network to differentiate itself, you convolve its output with a mollifier once. All the derivatives the PDE needs come out for free, by differentiating the smooth bump analytically. The memory savings, the authors report, are six to ten times. The training is more stable. And the method is dramatically less rattled by noise, which matters because the killer application of PINNs is inverse problems, and inverse problems are almost always noisy.
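Here is a minimal numerical sketch of the trick, not the paper's Mollifier Layers: the Gaussian bump, its width, and the truncation are my choices, and a classical mollifier would be compactly supported where this one is merely clipped.

```python
# Second derivative from noisy samples: convolve once with the analytic
# second derivative of a smooth bump, instead of differencing the noise.
import numpy as np

dx = 0.01
x = np.arange(0, 10, dx)
u = np.sin(x) + 0.01 * np.random.default_rng(0).standard_normal(x.size)

# Truncated Gaussian bump standing in for a mollifier; differentiate it
# analytically: phi''(t) = (t^2 - eps^2)/eps^4 * phi(t).
eps = 0.15
t = np.arange(-4 * eps, 4 * eps + dx, dx)
phi_dd = ((t**2 - eps**2) / eps**4) * np.exp(-t**2 / (2 * eps**2)) \
         / (eps * np.sqrt(2 * np.pi))
u_dd_moll = np.convolve(u, phi_dd, mode="same") * dx  # (u * phi)'' = u * phi''

u_dd_fd = np.gradient(np.gradient(u, dx), dx)    # naive finite differences

interior = slice(200, -200)                      # ignore boundary effects
true = -np.sin(x)
print("finite-diff RMSE:", np.sqrt(np.mean((u_dd_fd[interior] - true[interior])**2)))
print("mollified RMSE: ", np.sqrt(np.mean((u_dd_moll[interior] - true[interior])**2)))
```

With this noise level the finite-difference estimate is off by orders of magnitude; the mollified one is not, because all the differentiation landed on the bump.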

The test case in the paper is gorgeous. Inferring spatially varying epigenetic reaction rates from super-resolution chromatin imaging. Genomics on the inside of a single cell, by way of a 1930s definition of a smooth bump function. A complementary group at the University of Hawaiʻi at Mānoa has just published a related physics-informed ML algorithm. The pattern is real.

The funny part is that mollifiers were briefly fashionable in the 1950s, became obscure, became standard machinery for any first-year graduate student in PDE theory, and have now been deputized to save deep learning from itself. There is something to be said for keeping the math department around.

And here is the rhyme. Whether it’s TurboQuant or Mollifier Layers, the move is the same. Pick the right transform up front, in a basis where the problem looks small, and the heavy lifting that everyone else is doing turns out to be unnecessary work.

5. What if infinity was always the bug?

On April 29, Quanta Magazine ran a piece by their math desk titled “What Can We Gain by Losing Infinity?” It profiled ultrafinitism, a position in the philosophy of mathematics that has, for most of the last century, been treated as the eccentric uncle of the foundations. The thesis is that only “small enough” numbers really exist. Past some threshold, very large finite numbers are convenient fictions, useful in proof but not in reality, and infinity is, well, even more of a fiction.

The most public advocate today is Doron Zeilberger at Rutgers, who has been writing essays about this for thirty years with a kind of cheerful belligerence. An April 2025 conference at Columbia brought together physicists, logicians, philosophers, and mathematicians around a question that, until quite recently, was considered a curiosity. “No satisfactory development,” Anne Troelstra wrote in 1988, and that was, for a long time, the conventional view.

Something has shifted.

When the AI you use, the qubit array you read about, and the PDE solver under the hood all gain by throwing away the infinite, what does that say about the territory the infinite was supposed to map?

My take on this: https://medium.com/physics-philosophy-more/banishing-infinity-6a54a36da6a2


For every actual computation, finite is the floor and the ceiling. The disagreement between the working mathematician and the ultrafinitist is about everything that lives past the horizon.

Physics has been taking finiteness more seriously for a while now. The Bekenstein bound says a finite region of space, holding finite energy, can store only finitely much information. Holographic arguments push that further, capping the total by the region’s surface area rather than its volume. The universe, as far as anyone can tell, is not in the business of storing infinities.
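The bound itself fits on one line. For a region of radius R containing total energy E, the entropy obeys:

```latex
% Bekenstein bound: finite region, finite energy, finitely many bits.
S \;\le\; \frac{2 \pi k_B R E}{\hbar c}
```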

Computation is inherently a finite affair. Every neural network, every quantum circuit, every silicon chip you have ever physically touched is finite. And yet the theory we wrap around them is drenched in infinities. Real-valued weights, continuous PDEs, infinite-dimensional Hilbert spaces, completed limits taken to convergence in proofs that nobody actually waits around for.

And then there is the thing that ties this essay together. AI is, more and more, behaving like a discipline where infinity is a tax we keep paying for the privilege of using clean notation. Mollifier Layers smooth over the singular continuum because the singular continuum was a mathematical convenience, not a property of the data. TurboQuant projects continuous embeddings down into finite bit budgets and barely notices. Quantum error correction reduces an idealized infinite-precision quantum state to a discrete code and runs better for it.

The argument doesn’t have to be that infinity is false. That’s the strong claim, and you can pick it up if you like, but you don’t have to. The weak claim is enough. Infinity may be a useful idealization that we have kept paying tax on, decade after decade, and the cutting edge of three different disciplines is figuring out how to stop paying.

Joel David Hamkins, who is not an ultrafinitist but takes the position seriously, has written that the disagreement between the classical mathematician and the ultrafinitist is, in practical terms, almost nothing. If you only ever care about numbers below, say, two to the thousand, the working mathematician and the ultrafinitist live in the same world. The disagreement is theological. The interesting question is whether the theology has been quietly draining the budget of the empirical disciplines.

There is a complementary essay one could write about how the Stanford Encyclopedia of Philosophy revised its Early Modern Rationalism entry on April 17, and why it matters that working scientists keep needing their philosophical foundations refreshed. Another time. For now, enough to note that the question of what numbers really exist has stopped feeling like an idle one. It is starting to look load-bearing.

6. The elegance of less

Four stories don’t prove a theorem. But the rhyme is loud, and I’d rather call it what it is than pretend otherwise.

Quantum computing got smaller. Not by abandoning fault tolerance but by finding a smarter encoding, on hardware that was willing to rearrange itself. AI memory got smaller. Not by giving up on long contexts but by remembering that a forty-year-old lemma will gladly hand you a near-optimal compression for free. Scientific machine learning got smaller. Not by lowering the bar on accuracy but by trading a recursive computation for a classical convolution that does the same work and asks for less. Foundations are quietly getting smaller, too. Not by deciding that infinity is wrong, but by recognizing that the empirical disciplines may have been paying for infinity, in tokens and qubits and gradients, for longer than was strictly necessary.

The era of “scale is all you need” is sliding into an era where the right small representation is all you need, and that is an aesthetic, even an ethic.


Fig 5. Four different decades of mathematics, four different problems, the same arrow.


The common thread is expressive compression. The deepest progress now seems to come from finding the basis in which the problem looks small, and then doing the problem there. The bigness you used to need was always a confession that you hadn’t found the right basis yet.

For builders and product people, the lesson is operational. The next moat may be subtractive. Efficiency, sparsity, the kind of elegance that ships because you understand the problem well enough to make it tiny. The wins from “throw another card at it” are leveling off. The wins from “throw the right operator at it” are not.

For everyone else, I think the philosophy is the most fun part. Watch carefully when an entire discipline starts gaining by throwing things away. It almost always means a piece of math from somewhere else just walked in and took a seat. This week, four different fields heard the same knock at the door.

What would your most ambitious subtraction be?

Sources

Quantum computing

• Caltech press release: “Quantum computers may be smaller than we thought” (March 31, 2026)

• IQIM blog: “Shor’s algorithm with 10,000 reconfigurable atomic qubits”

• Oratomic launch announcement (March 2026)

• Nature news: “It’s a real shock” (April 2026)

• Time: “AI helped spark a quantum breakthrough” (April 7, 2026)

AI memory and compression

• Google Research blog: TurboQuant (April 2026)

• arXiv:2504.19874, TurboQuant (April 2026)

• Johnson and Lindenstrauss, “Extensions of Lipschitz mappings into a Hilbert space” (1984)

• PolarQuant, AISTATS 2026

• Quantized Johnson-Lindenstrauss, AAAI 2025

Physics-informed machine learning

• Mollifier Layers, arXiv:2505.11682 (May 2026)

• OpenReview discussion thread for Mollifier Layers

• phys.org coverage, May 2026

• Sergei Sobolev, original mollifier construction (1930s)

Mathematics, foundations, and philosophy

• Quanta Magazine: “What Can We Gain by Losing Infinity?” (April 29, 2026)

• Columbia Philosophy: Ultrafinitism conference proceedings (April 2025)

• Doron Zeilberger, “A Very Short Survey of Ultrafinitism”

• Joel David Hamkins on ultrafinitism, Infinitely More

• SEP: Early Modern Rationalism, revised entry (April 17, 2026)
