The Math of Less

Why this week’s biggest breakthroughs all worked by taking things away, and what a fringe philosophy of mathematics has to do with any of it.

Copyright: Sanjay Basu

For most of the last decade, the headlines in physics, AI, and computing have all rhymed with the same dull tune. More. More parameters. More qubits. More compute. More data. More layers. More floors on the same skyscraper, and another floor again next quarter, because what else are you going to put in a press release.

But this week, four very different research stories shared an unsettling subtext, and it is the opposite one. The next jump forward might come from doing brilliantly less. From a Caltech team showing that a useful quantum computer needs ten thousand qubits instead of a million, to a Google paper that crushes a transformer’s memory using a forty-year-old trick from pure mathematics, science is having a quiet subtractive moment. And tucked behind it, almost embarrassed to be there, is a philosopher’s question that won’t stop nagging. What if infinity was the wrong abstraction all along?

What follows is one essay, four stories, and an admission that the rhyme is loud enough to listen to.

1. The cult of more

Somewhere around 2018, the scientific imagination, or at least its bestseller list, decided that the way forward in basically everything was to make the thing larger.

Deep learning got the scaling laws. Kaplan, then Chinchilla, then everyone with a budget, plotted log-loss against log-parameters and saw, gloriously, a straight line. The straight line was the policy. Quantum computing got the fault tolerance argument. A useful quantum computer would need around a million physical qubits per useful logical one, the textbooks said, because you needed massive redundancy to fight decoherence, and that was just the cost of doing business. PDE solvers got finer meshes. Foundational mathematics, in the small corner that worried about such things, kept adding axioms.

The pattern was so obvious it stopped looking like a pattern. “More” became the heuristic that nobody had to defend. You don’t defend gravity.

But “more” has costs that aren’t always tracked on the same page as the wins. There is the electricity bill for the GPU farm. There is the fabrication yield for the qubit chip. There is the carbon. There is also a subtler tax, which is that when your only move is to scale, you stop looking for the better move. The skyscraper metaphor is unfair but persistent. If you spend ten years adding floors to the same building, the boldest thing you can do is ask whether the building needed to be that tall in the first place.

There was always a quieter tradition saying so. Occam's razor is the obvious example, but the modern flavor is more practical and more interesting. Statisticians had sparsity. Information theorists had Kolmogorov complexity. Designers had minimalism, which they pretended was about aesthetics but was really about understanding the thing well enough to remove the parts that weren’t doing any work.

This week, the quiet tradition showed up to four different parties wearing essentially the same outfit.

When you have spent ten years adding floors to the skyscraper, the boldest move is to ask whether the building needed to be that tall.

2. The qubit count just dropped by two orders of magnitude

Of all the disciplines where “more” felt like physical law, quantum computing wore it most heavily. The reason was simple. Qubits are fragile in a way that classical bits are not. A cosmic ray, a thermal jitter, the wrong glance, and your information evaporates. The whole architecture of fault-tolerant quantum computing was built around brute redundancy. Encode each logical qubit you actually care about using something like a thousand physical ones, run them in lockstep, vote on the answer.

Estimates for a machine that could run Shor’s algorithm against modern cryptography sat, for years, at around a million physical qubits. You needed roughly two thousand logical qubits to factor a four-thousand-bit RSA key, give or take, and at a thousand-to-one redundancy ratio, well, the math is the math.

Then on March 31, Caltech announced something that broke that ratio. A team that included Manuel Endres’s group, working with a new venture called Oratomic, reported a fault-tolerant neutral-atom architecture that needs around five physical qubits per logical qubit, not a thousand. They estimated that a cryptographically relevant machine could be built with ten to twenty thousand atoms, not a million.

Two orders of magnitude. In a field where every order of magnitude is a decade of work and a billion dollars, you don’t get those casually.

The trick was not to make better atoms. The trick was to stop pretending the atoms had to sit still. In a static layout, two qubits you’d like to entangle might be physically distant on the chip, and getting them to talk requires shuttling information through intermediate qubits, each of which is another chance to make an error. In a reconfigurable optical-tweezer array, the atoms themselves can be picked up and moved. Want to entangle atom 47 with atom 8,213? Drag atom 47 over to atom 8,213 and let them shake hands. The connectivity graph is no longer a fixed grid, it’s a dance.

Once you can wire any two switches to each other on demand, an enormous category of overhead simply vanishes. The new error-correcting code can be smaller because it doesn’t have to route information across a frozen lattice. It is a piece of math far better matched to the hardware it runs on.
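If you want the flavor of the bookkeeping in code, here is a toy cost model, mine and not Caltech's, that simply counts error opportunities per two-qubit interaction under each layout.

```python
# Toy cost model, not the Caltech architecture: compare the overhead of a
# two-qubit interaction on a fixed 2-D grid (SWAP chains) vs. a movable-atom
# array. The lattice size and interaction pairs are invented for illustration.
import random

SIDE = 100  # a hypothetical 100 x 100 lattice of qubits

def grid_cost(a, b):
    # Static layout: entangling distant qubits means swapping state across
    # the lattice; every intermediate hop is one more chance to err.
    (ax, ay), (bx, by) = a, b
    return abs(ax - bx) + abs(ay - by)  # Manhattan distance, in SWAPs

def tweezer_cost(a, b):
    # Reconfigurable layout: pick the atom up and carry it over; transport
    # costs roughly one (small) error opportunity regardless of distance.
    return 1

random.seed(0)
pairs = [((random.randrange(SIDE), random.randrange(SIDE)),
          (random.randrange(SIDE), random.randrange(SIDE))) for _ in range(1000)]
avg_grid = sum(grid_cost(a, b) for a, b in pairs) / len(pairs)
avg_move = sum(tweezer_cost(a, b) for a, b in pairs) / len(pairs)
print(f"avg ops per interaction: grid {avg_grid:.0f}, movable atoms {avg_move:.0f}")
```

On a random pair of sites the static layout pays dozens of hops; the movable atom pays one, and that gap is exactly the overhead the smaller code no longer has to absorb.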

A million qubits was the scaffolding we thought we needed. It turned out we only needed it because we were stacking the bricks wrong.

Same physics, different bookkeeping. The static layout assumed a thousand physical qubits per logical one. Reconfigurable atoms cut that to around five.

There’s a nice side note about how the algorithm itself was found. According to Time’s coverage on April 7, AI tools were “instrumental” in the search for the error-correcting code that closed the gap. Not in a press-release way, in a real way. Somebody fed the search problem to a model, and the model came back with something the humans hadn’t tried.

Manuel Endres’s group at Caltech had already built a 6,100-atom array before any of this. The implication is gentle but firm. The gap between what we have and what we need just collapsed by about two decimal places. If you were one of the people quietly hoping that post-quantum cryptography migration would remain a problem for the second half of the 2030s, this is your bad week.

3. A 1984 lemma runs your AI

Two days before the Caltech announcement, Google posted a paper to arXiv and a blog post to its research site about something called TurboQuant. The headline numbers were striking. A six-times reduction in the KV cache memory of large language models. Up to eight times faster attention on H100s. Three bits per coordinate instead of the usual sixteen or thirty-two. Within roughly 2.7x of the information-theoretic lower bound. No measurable accuracy loss.

If you don’t live inside a transformer, a quick translation. The KV cache is the running memory of an LLM during generation. Every token the model produces costs more memory, linearly, because it has to remember every key and value vector from every previous token in every layer of attention. This is why long context windows are expensive. The cache is the bottleneck for long conversations, persistent agents, and basically every dream of what AI is supposed to grow up to be.
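The arithmetic is blunt. A back-of-envelope sketch, using a hypothetical Llama-style configuration; the layer and head counts below are illustrative, not any particular model's.

```python
# Back-of-envelope KV-cache arithmetic for a hypothetical Llama-style model.
n_layers, n_kv_heads, head_dim = 32, 8, 128

def kv_cache_bytes(seq_len: int, bytes_per_value: float) -> float:
    # Keys and values (the factor of 2), for every layer and KV head,
    # for every token seen so far. Linear in context length.
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_value

for ctx in (8_192, 131_072):
    fp16 = kv_cache_bytes(ctx, 2) / 2**30        # 16 bits per coordinate
    q3 = kv_cache_bytes(ctx, 3 / 8) / 2**30      # 3 bits per coordinate
    print(f"{ctx:>7} tokens: {fp16:5.1f} GiB at fp16 -> {q3:4.1f} GiB at 3 bits")
```

At a 131,072-token context this hypothetical cache runs to roughly sixteen gibibytes at fp16, per sequence, and that is before you batch users together.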

TurboQuant’s solution has two stages. First, PolarQuant rotates the cache vectors out of Cartesian coordinates and into a polar form, then recursively distills them into a single radius and a sequence of angles. This is the kind of thing engineers used to do to data when memory cost real money. Second, and this is the load-bearing wall, Quantized Johnson-Lindenstrauss (QJL) applies the JL lemma to the result.

A 1984 theorem about preserving distances in low dimensions is the unsung hero of why a transformer does not have to remember everything in full color.

The Johnson-Lindenstrauss lemma promises that a random projection from high dimensions to low ones approximately preserves pairwise distances. TurboQuant pushes it almost to the bone.

The Johnson-Lindenstrauss lemma is one of those results that sounds too good when you first hear it and then you read the proof and it is, in fact, that good. Johnson and Lindenstrauss in 1984 showed that you can take a set of points in a very high-dimensional space, project them randomly down into a far lower-dimensional space, and the pairwise distances between the points are preserved with arbitrarily small distortion, with high probability, provided the lower dimension is at least logarithmic in the number of points.
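In its standard modern form, with the constant left anonymous, the statement reads:

```latex
% Johnson-Lindenstrauss lemma (modern statement). For any 0 < \varepsilon < 1
% and any n points x_1, \dots, x_n \in \mathbb{R}^d, a suitably scaled random
% linear map A : \mathbb{R}^d \to \mathbb{R}^k with k \ge C \varepsilon^{-2} \log n
% satisfies, with high probability, for every pair i, j:
(1 - \varepsilon)\, \lVert x_i - x_j \rVert^2
  \;\le\; \lVert A x_i - A x_j \rVert^2
  \;\le\; (1 + \varepsilon)\, \lVert x_i - x_j \rVert^2
```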

Read that again. The output dimension does not depend on the input dimension. A thousand-dimensional cloud and a million-dimensional cloud can both be squashed into roughly the same number of dimensions, if all you care about is who is near whom.

Here is the analogy that finally made it stick for me. Imagine a thousand of your friends scattered across a city. You want to reproduce, on a much smaller map, who is near whom. You don’t actually need anyone’s GPS coordinates. You need their shadows, cast in roughly the right direction, onto a much smaller surface. The JL lemma guarantees that a randomly chosen direction works almost as well as the cleverest one, which is the kind of miracle pure math hands out approximately once a generation.

QJL pushes this to its limit. One sign bit per dimension. You don’t even keep the magnitude of the projection, you keep only whether the inner product with each random direction was positive or negative. And, astonishingly, distances are still mostly preserved.
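Here is a minimal numpy sketch of the sign-bit idea, in the SimHash style rather than the paper's exact QJL estimator: the fraction of disagreeing sign bits recovers the angle between two vectors.

```python
# One-bit random projections, SimHash style: keep only the sign of each
# projection and read the angle back off the Hamming distance.
import numpy as np

rng = np.random.default_rng(0)
d, m = 4096, 512                      # original dimension, number of sign bits
u = rng.standard_normal(d)
v = u + 0.5 * rng.standard_normal(d)  # a nearby vector

G = rng.standard_normal((m, d))       # random hyperplane normals
bits_u, bits_v = (G @ u > 0), (G @ v > 0)

# A random hyperplane separates u and v with probability angle(u, v) / pi,
# so the mismatch rate of the sign bits estimates the angle directly.
est_angle = np.pi * np.mean(bits_u != bits_v)
true_angle = np.arccos(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))
print(f"true angle {true_angle:.3f} rad, estimated from signs {est_angle:.3f} rad")
```

Five hundred twelve bits, no magnitudes anywhere, and the geometry survives.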

TurboQuant wraps this in a rotation-then-quantize pipeline. The rotation is essential. Quantizing without rotating is like cropping a long photograph without first straightening it. You lose far more of the picture than you needed to.
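A numpy sketch makes the point, with a toy vector of my own invention rather than anything from the paper: sign-quantize a vector whose energy sits in one coordinate and the bits keep almost nothing of the direction; rotate first and the same bits keep most of it.

```python
# Why rotate before quantizing (illustrative, not TurboQuant's pipeline).
import numpy as np

rng = np.random.default_rng(1)
d = 256
x = rng.standard_normal(d)
x[0] = 100.0                      # one coordinate dominates the energy

def sign_cosine(v):
    # Cosine similarity between v and its one-bit (sign-only) representation.
    s = np.sign(v)
    return (v @ s) / (np.linalg.norm(v) * np.linalg.norm(s))

Q, _ = np.linalg.qr(rng.standard_normal((d, d)))  # random orthogonal rotation
print(f"cosine kept without rotation: {sign_cosine(x):.2f}")     # roughly 0.19
print(f"cosine kept after rotation:   {sign_cosine(Q @ x):.2f}") # roughly 0.80
```

The rotation spreads the outlier's energy across all the coordinates, which is the straightening step the cropping analogy is about.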

The result, as one of the paper’s authors quietly notes, is that we are now operating within a small constant factor of the Shannon limit for transformer state. Information theory says you can’t go much further. We are crowded up against the wall.

Set aside, for a moment, the specifics. What this story says about the field is something like a confession. The most important infrastructure for serving the next generation of long-context AI is a theorem published when Reagan was president, that AI engineers, by and large, had forgotten about. The new compute frontier is built on a foundation that classical mathematicians wrote down before there were transformers, before there was the web, before there were even the kind of computers you could fit on a desk.

4. Mollifiers, or the resurrection of classical analysis

If the TurboQuant story is about AI rediscovering old math, the next one is about old math returning to repossess the territory.

The setting is physics-informed neural networks. PINNs, to friends. The basic idea is to train a network to fit data while simultaneously satisfying a known physical law, expressed as a partial differential equation. You want the network’s output to fit your measurements and to obey Maxwell, or Navier-Stokes, or whatever your problem deals in. The PDE acts as a soft constraint during training.
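A minimal sketch of that objective in PyTorch, with a toy one-dimensional problem and a penalty weight chosen arbitrarily by me, not taken from the PINN literature:

```python
# Minimal PINN-style loss: fit data while penalizing the residual of a toy
# PDE, here u'' = -sin(x), whose true solution is u = sin(x).
import torch
import torch.nn as nn

torch.manual_seed(0)
net = nn.Sequential(nn.Linear(1, 64), nn.Tanh(), nn.Linear(64, 1))

x_data = torch.rand(32, 1)                       # measurement locations
u_data = torch.sin(x_data)                       # synthetic measurements

x_pde = torch.rand(128, 1, requires_grad=True)   # collocation points
u = net(x_pde)
du = torch.autograd.grad(u.sum(), x_pde, create_graph=True)[0]
d2u = torch.autograd.grad(du.sum(), x_pde, create_graph=True)[0]

residual = d2u + torch.sin(x_pde)                # zero on the true solution
loss = nn.functional.mse_loss(net(x_data), u_data) + 0.1 * residual.pow(2).mean()
loss.backward()
```

Those two chained autograd.grad calls are where the trouble starts.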

The catch is that, to enforce the constraint, you need the derivatives of the network output. High-order ones, often. Recursive automatic differentiation can compute these, but the memory grows with each order, and the result is brittle if the underlying data is noisy. Asking a network to differentiate itself four times in a row is, mathematically, fine. Doing it stably in the wild is not.

The Penn Engineering paper that landed in May, titled Mollifier Layers, replaces this whole recursive tower with a single convolution.

Mollifiers are the equivalent of asking a friend to read your handwriting after one careful pass with a nice fountain pen, instead of squinting at your scrawl with a magnifier.

Recursive autodiff produces every higher derivative by chaining the chain rule. Convolving once with a mollifier hands them all over analytically, and gently.

Mollifiers come from functional analysis in the 1930s. Sergei Sobolev introduced them, in a different language, to give precise meaning to derivatives of functions that aren’t smooth. The idea is simple. A mollifier is a smooth, compactly supported, narrowly peaked bump function. If you convolve any function, even a wildly non-smooth one, with a mollifier, the result is as smooth as you like. And crucially, the derivatives of the convolution are exact and analytic, because the differentiation falls on the mollifier instead of on the original function.

Translate that into a PINN. Instead of asking the network to differentiate itself, you convolve its output with a mollifier once. All the derivatives the PDE needs come out for free, by differentiating the smooth bump analytically. The memory savings, the authors report, are six to ten times. The training is more stable. And the method is dramatically less rattled by noise, which matters because the killer application of PINNs is inverse problems, and inverse problems are almost always noisy.
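Here is a minimal numerical sketch of the trick, not the paper's Mollifier Layers: the Gaussian bump, its width, and the truncation are my choices, and a classical mollifier would be compactly supported where this one is merely clipped.

```python
# Second derivative from noisy samples: convolve once with the analytic
# second derivative of a smooth bump, instead of differencing the noise.
import numpy as np

dx = 0.01
x = np.arange(0, 10, dx)
u = np.sin(x) + 0.01 * np.random.default_rng(0).standard_normal(x.size)

# Truncated Gaussian bump standing in for a mollifier; differentiate it
# analytically: phi''(t) = (t^2 - eps^2)/eps^4 * phi(t).
eps = 0.15
t = np.arange(-4 * eps, 4 * eps + dx, dx)
phi_dd = ((t**2 - eps**2) / eps**4) * np.exp(-t**2 / (2 * eps**2)) \
         / (eps * np.sqrt(2 * np.pi))
u_dd_moll = np.convolve(u, phi_dd, mode="same") * dx  # (u * phi)'' = u * phi''

u_dd_fd = np.gradient(np.gradient(u, dx), dx)    # naive finite differences

interior = slice(200, -200)                      # ignore boundary effects
true = -np.sin(x)
print("finite-diff RMSE:", np.sqrt(np.mean((u_dd_fd[interior] - true[interior])**2)))
print("mollified RMSE: ", np.sqrt(np.mean((u_dd_moll[interior] - true[interior])**2)))
```

With this noise level the finite-difference estimate is off by orders of magnitude; the mollified one is not, because all the differentiation landed on the bump.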

The test case in the paper is gorgeous. Inferring spatially varying epigenetic reaction rates from super-resolution chromatin imaging. Genomics on the inside of a single cell, by way of a 1930s definition of a smooth bump function. A complementary group at the University of Hawaiʻi at Mānoa has just published a related physics-informed ML algorithm. The pattern is real.

The funny part is that mollifiers were briefly fashionable in the 1950s, became obscure, became standard machinery for any first-year graduate student in PDE theory, and have now been deputized to save deep learning from itself. There is something to be said for keeping the math department around.

And here is the rhyme. Whether it’s TurboQuant or Mollifier Layers, the move is the same. Pick the right transform up front, in a basis where the problem looks small, and the heavy lifting that everyone else is doing turns out to be unnecessary work.

5. What if infinity was always the bug?

On April 29, Quanta Magazine ran a piece by their math desk titled “What Can We Gain by Losing Infinity?” It profiled ultrafinitism, a position in the philosophy of mathematics that has, for most of the last century, been treated as the eccentric uncle of the foundations. The thesis is that only “small enough” numbers really exist. Past some threshold, very large finite numbers are convenient fictions, useful in proof but not in reality, and infinity is, well, even more of a fiction.

The most public advocate today is Doron Zeilberger at Rutgers, who has been writing essays about this for thirty years with a kind of cheerful belligerence. An April 2025 conference at Columbia brought together physicists, logicians, philosophers, and mathematicians around a question that, until quite recently, was considered a curiosity. “No satisfactory development,” Anne Troelstra wrote in 1988, and that was, for a long time, the conventional view.

Something has shifted.

When the AI you use, the qubit array you read about, and the PDE solver under the hood all gain by throwing away the infinite, what does that say about the territory the infinite was supposed to map?

My take on this: https://medium.com/physics-philosophy-more/banishing-infinity-6a54a36da6a2


For every actual computation, finite is the floor and the ceiling. The disagreement between the working mathematician and the ultrafinitist is about everything that lives past the horizon.

Physics has been taking finiteness more seriously for a while now. The Bekenstein bound says a finite region of space, holding finite energy, can store only finitely much information. Holographic arguments push that further, capping the total by the region’s surface area rather than its volume. The universe, as far as anyone can tell, is not in the business of storing infinities.
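The bound itself fits on one line. For a region of radius R containing total energy E, the entropy obeys:

```latex
% Bekenstein bound: finite region, finite energy, finitely many bits.
S \;\le\; \frac{2 \pi k_B R E}{\hbar c}
```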

Computation is inherently a finite affair. Every neural network, every quantum circuit, every silicon chip you have ever physically touched is finite. And yet the theory we wrap around them is drenched in infinities. Real-valued weights, continuous PDEs, infinite-dimensional Hilbert spaces, completed limits taken to convergence in proofs that nobody actually waits around for.

And then there is the thing that ties this essay together. AI is, more and more, behaving like a discipline where infinity is a tax we keep paying for the privilege of using clean notation. Mollifier Layers smooth over the singular continuum because the singular continuum was a mathematical convenience, not a property of the data. TurboQuant projects continuous embeddings down into finite bit budgets and barely notices. Quantum error correction reduces an idealized infinite-precision quantum state to a discrete code and runs better for it.

The argument doesn’t have to be that infinity is false. That’s the strong claim, and you can pick it up if you like, but you don’t have to. The weak claim is enough. Infinity may be a useful idealization that we have kept paying tax on, decade after decade, and the cutting edge of three different disciplines is figuring out how to stop paying.

Joel David Hamkins, who is not an ultrafinitist but takes the position seriously, has written that the disagreement between the classical mathematician and the ultrafinitist is, in practical terms, almost nothing. If you only ever care about numbers below, say, two to the thousand, the working mathematician and the ultrafinitist live in the same world. The disagreement is theological. The interesting question is whether the theology has been quietly draining the budget of the empirical disciplines.

There is a complementary essay one could write about how the Stanford Encyclopedia of Philosophy revised its Early Modern Rationalism entry on April 17, and why it matters that working scientists keep needing their philosophical foundations refreshed. Another time. For now, enough to note that the question of what numbers really exist has stopped feeling like an idle one. It is starting to look load-bearing.

6. The elegance of less

Four stories don’t prove a theorem. But the rhyme is loud, and I’d rather call it what it is than pretend otherwise.

Quantum computing got smaller. Not by abandoning fault tolerance but by finding a smarter encoding, on hardware that was willing to rearrange itself. AI memory got smaller. Not by giving up on long contexts but by remembering that a forty-year-old lemma will gladly hand you a near-optimal compression for free. Scientific machine learning got smaller. Not by lowering the bar on accuracy but by trading a recursive computation for a classical convolution that does the same work and asks for less. Foundations are quietly getting smaller, too. Not by deciding that infinity is wrong, but by recognizing that the empirical disciplines may have been paying for infinity, in tokens and qubits and gradients, for longer than was strictly necessary.

The era of “scale is all you need” is sliding into an era where the right small representation is all you need, and that is an aesthetic, even an ethic.


Fig 5. Four different decades of mathematics, four different problems, the same arrow.


The common thread is expressive compression. The deepest progress now seems to come from finding the basis in which the problem looks small, and then doing the problem there. The bigness you used to need was always a confession that you hadn’t found the right basis yet.

For builders and product people, the lesson is operational. The next moat may be subtractive. Efficiency, sparsity, the kind of elegance that ships because you understand the problem well enough to make it tiny. The wins from “throw another card at it” are leveling off. The wins from “throw the right operator at it” are not.

For everyone else, I think the philosophy is the most fun part. Watch carefully when an entire discipline starts gaining by throwing things away. It almost always means a piece of math from somewhere else just walked in and took a seat. This week, four different fields heard the same knock at the door.

What would your most ambitious subtraction be?

Sources

Quantum computing

• Caltech press release: “Quantum computers may be smaller than we thought” (March 31, 2026)

• IQIM blog: “Shor’s algorithm with 10,000 reconfigurable atomic qubits”

• Oratomic launch announcement (March 2026)

• Nature news: “It’s a real shock” (April 2026)

• Time: “AI helped spark a quantum breakthrough” (April 7, 2026)

AI memory and compression

• Google Research blog: TurboQuant (April 2026)

• arXiv:2504.19874, TurboQuant (April 2026)

• Johnson and Lindenstrauss, “Extensions of Lipschitz mappings into a Hilbert space” (1984)

• PolarQuant, AISTATS 2026

• Quantized Johnson-Lindenstrauss, AAAI 2025

Physics-informed machine learning

• Mollifier Layers, arXiv:2505.11682 (May 2026)

• OpenReview discussion thread for Mollifier Layers

• phys.org coverage, May 2026

• Sergei Sobolev, original mollifier construction (1930s)

Mathematics, foundations, and philosophy

• Quanta Magazine: “What Can We Gain by Losing Infinity?” (April 29, 2026)

• Columbia Philosophy: Ultrafinitism conference proceedings (April 2025)

• Doron Zeilberger, “A Very Short Survey of Ultrafinitism”

• Joel David Hamkins on ultrafinitism, Infinitely More

• SEP: Early Modern Rationalism, revised entry (April 17, 2026)
