The Co-Mathematician's Question

Copyright: Sanjay Basu

In late April, an Oxford topologist named Marc Lackenby fed a problem from a battered Russian notebook to an AI that DeepMind had been quietly building for the better part of a year. The notebook is the Kourovka Notebook, and it has been collecting unsolved questions in group theory since 1965, passed from mathematician to mathematician across continents and editions, like an open-mic list nobody quite knows how to close. The question Lackenby chose, problem 21.10, had outlived two generations of mathematicians. A few days and one flawed proof later, the problem was closed. The strange part isn’t that the machine solved it. The strange part is what happened in between.

Most reporting on the result settled into the predictable register. Machine cracks human problem. The expected think pieces filed themselves. What got less airtime, and what is actually the story, is the workflow that did the cracking, and a small philosophical pinch point it produced almost as a by-product. The proof is correct. Whether anyone, including Lackenby, understands it in the sense we used to mean by understanding is now a live question. I want to argue here that this is not a technicality. It might be the question that decides what mathematics looks like by the end of the decade.

SECTION 1

A Sixty-Year-Old Question, Answered by a Committee of Bots

The Kourovka Notebook is one of those mathematical artefacts that sounds like a Borges premise but is real and a little embarrassing in its physicality. It started in 1965 in Novosibirsk, at a group theory conference, when the Soviet algebraist Mikhail Kargapolov produced a small ruled notebook and asked everyone present to write down a problem they could not solve. They did. The notebook went home with someone. The next year it had more problems. It got typed up, mailed around, photocopied, re-edited. It is now in its 21st edition, published this past January, and it remains the longest-running open list of unsolved problems in any branch of mathematics.

Problem 21.10 had the quality that makes a question survive that many editions. It is short. It is crisp. It asks whether every finite group has what the literature calls a just-infinite presentation, which means a description so economical that if you remove any single rule the group ceases to be finite and blows up to infinity. You can hand it to a graduate student in twenty seconds. You can fail to crack it for a career.

Lackenby did not hand it to a graduate student. He handed it to DeepMind’s AI Co-Mathematician, a system the lab described in a preprint released on the first of May. The Co-Mathematician is not a chatbot in a costume. The architecture matters, so it is worth slowing down.


Fig 1 A coordinator agent dispatches workstreams in parallel. Reviewers check each step. The human picks the thread that survives

At the top sits a project coordinator agent whose only job is to keep score. It receives the problem, divides the attack into workstreams, and spawns sub-agents to chase each one. Some of those agents try to prove the conjecture. Others, working in parallel, try to disprove it. The point of running both is that the system does not need to know in advance which side is correct. It can converge on the truth by elimination. Beneath each strand of attack, reviewer agents read everything that gets produced, looking for the small logical gaps that human referees are paid to spot a few months after a paper goes up on arXiv.
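For readers who think better in code, the division of labour described above can be sketched in a few lines of Python. This is a toy under loud assumptions, not DeepMind’s implementation: every name here (`Attempt`, `try_prove`, `try_disprove`, `review`, `coordinate`) is hypothetical, the "agents" are plain functions, and the "conjecture" is the old schoolroom claim that Euler’s polynomial n² + n + 41 is always prime, checked over a finite range. What it does show is the shape of the idea. Two opposed workstreams run in parallel, a reviewer re-checks whatever they hand back, and only the surviving thread reaches the human.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence


@dataclass
class Attempt:
    stance: str                    # "prove" or "disprove"
    succeeded: bool                # did the worker think it succeeded?
    witness: Optional[int] = None  # counterexample, if one was found


def try_prove(claim: Callable[[int], bool], domain: Sequence[int]) -> Attempt:
    # Toy "proof": the claim holds on every element of a finite domain.
    return Attempt("prove", all(claim(n) for n in domain))


def try_disprove(claim: Callable[[int], bool], domain: Sequence[int]) -> Attempt:
    # Toy "disproof": search the domain for a single counterexample.
    for n in domain:
        if not claim(n):
            return Attempt("disprove", True, witness=n)
    return Attempt("disprove", False)


def review(attempt: Attempt, claim: Callable[[int], bool],
           domain: Sequence[int]) -> bool:
    # Reviewer: re-check the work instead of trusting the worker's verdict.
    if not attempt.succeeded:
        return False
    if attempt.stance == "disprove":
        return attempt.witness is not None and not claim(attempt.witness)
    return all(claim(n) for n in domain)


def coordinate(claim: Callable[[int], bool],
               domain: Sequence[int]) -> List[Attempt]:
    # Coordinator: run both workstreams in parallel, keep what passes review.
    with ThreadPoolExecutor(max_workers=2) as pool:
        futures = [pool.submit(worker, claim, domain)
                   for worker in (try_prove, try_disprove)]
        attempts = [f.result() for f in futures]
    return [a for a in attempts if review(a, claim, domain)]


def is_prime(k: int) -> bool:
    # Trial division; fine for the small numbers used here.
    if k < 2:
        return False
    i = 2
    while i * i <= k:
        if k % i == 0:
            return False
        i += 1
    return True


def euler_claim(n: int) -> bool:
    # Illustrative conjecture only: "n^2 + n + 41 is prime".
    return is_prime(n * n + n + 41)


if __name__ == "__main__":
    for attempt in coordinate(euler_claim, range(50)):
        print(attempt.stance, attempt.witness)  # prints "disprove 40"
```

On the range 0 to 49 the disproving workstream survives, because 40² + 40 + 41 is 41². Shrink the range to 0 to 39 and the proving workstream survives instead. The coordinator never needed to know in advance which side would win, which is the point of running both.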

What happened with problem 21.10 was instructive. The first attempted proof had a hole in it. It was not a hole that any of the worker agents noticed. It was the reviewer agent that flagged the step where the argument quietly conflated congruence with conjugacy, two relations on groups that look similar on a chalkboard and behave very differently if you push them. Lackenby read both the attempted proof and the critique, recognised the shape of the strategy, and redirected the system toward a cleaner formulation. He told the agents, in effect, to stop treating presentations as objects and start treating them as a poset, an ordered family with a top and a bottom. That suggestion did the work. The system rewrote the argument, the reviewers passed it, and Q 21.10 went into the back of the next Kourovka edition with a citation rather than a question mark.

The AI is not playing the role of mathematician. It is playing the role of an extremely fast, tireless graduate student who has read everything.

Lackenby has been careful in interviews about what the experience actually was. The system, he said, works best when the human is already familiar with the area. He used the AI the way a senior researcher uses a postdoc with infinite stamina, except this postdoc reads the entire literature again before lunch. It is a useful framing because it cuts through the marketing language. Nobody was replaced. A workflow was multiplied.

The wider context is the part that gets buried in the press release. The same week the Lackenby result was announced, DeepMind also reported that the Co-Mathematician had scored forty-eight percent on FrontierMath Tier 4, a benchmark Epoch AI explicitly built to be unreachable by machine learning systems for, in their words, decades. The base model, by itself, scored nineteen percent. The twenty-nine point jump came almost entirely from the agentic scaffolding sitting on top. Which is to say, the lift was not in the model. It was in how the model was made to argue with itself.

SECTION 2

A Different QR Code, A Different Kind of Tool

Two weeks before the Co-Mathematician story broke, Quanta Magazine ran a piece by Erica Klarreich about a new tool for distinguishing knots. The tool, built by a small group of geometric topologists, is being described as a powerful new QR code for knots. The headline undersells it slightly. What the team did was construct a computable, visualisable approximation to the Kontsevich integral, an object that has hovered in knot theory since the 1990s as the theoretical fingerprint of every knot in the universe. The integral is so detailed it might, in principle, distinguish any two knots that look alike. But it is written in a language no human can read off a real piece of string.

Think of it this way. The Kontsevich integral is the perfect identification photo of every knot you will ever tie or untie. Crisp, unambiguous, complete. The catch is that it is stored in a file format no camera can render. For thirty years it has been a beautiful object that nobody can hold up to the light. The new QR code, in effect, is the first rendering of fragments of that file that the human eye can actually parse. It is a piece of mathematics built by humans that lets humans see something they could not see before.

Now place the two stories next to each other. The QR-code knot tool extends human sight. It lets a person, a working topologist, look at a knot and read something off the page that was always there but invisible. The Co-Mathematician, by contrast, extends the machine’s sight. It lets a system carry out the seeing on behalf of the mathematician, who then signs off on the result. Both expand the reach of the field. They are doing very different epistemic work.


Fig 2 Two pieces of news from the same season. One extends what a human can see. The other extends what the machine can see for them

If you wanted a metaphor that would help a non-specialist friend over coffee, try this one. A telescope is an instrument that improves your sight. You still do the looking, but you get further. A friend who looks through the telescope and then tells you what they saw is doing something different, and the difference matters. The view is no less real in the second case. But what you have is a testimony, not an observation. The new knot invariant is a telescope. The Co-Mathematician is a friend with a very good pair of binoculars and a habit of writing things down. The question that will run through the rest of this piece is what we lose, and what we gain, when our discipline tilts away from the first kind of seeing toward the second.

SECTION 3

What Tao Actually Said

Terence Tao, who is the field’s most-watched voice on this stuff for the obvious and slightly intimidating reasons, published an essay in late March with the philosopher of mathematics Tanya Klowden. It will appear in the forthcoming Blackwell Companion to the Philosophy of Mathematics, but a preprint is on arXiv and a friendlier version sits on his blog. The essay’s framing has, in the weeks since, become a small piece of common vocabulary in math department lounges. AI, Tao wrote, is a flavour concentrate. Not the dish.

The vanilla extract is automated. The cake still has to be assembled, and the recipe still has to be invented.

The point of the analogy is to draw a line between two phases of mathematical work. There is the exploration phase, which most people outside the field do not realise exists. This is where the actual time goes. You chase a lead. You look up an analogue in a different subfield. You rule out three plausible-sounding conjectures by trying them on small cases. You read papers in adjacent areas you do not really understand. You do all of this, mostly, to find out which question is worth posing in a formal way. The exploration phase ends with a conjecture that feels like it might be true and feels like it might be provable. Only then does the second phase begin, which is the slow, technical, sometimes elegant work of actually proving it.

Tao’s claim, made with the air of someone who has been watching this carefully, is that AI compresses the exploration phase by orders of magnitude. It does not, on present evidence, replace the proving phase. It does not, certainly, replace the choosing phase, by which he means the human business of deciding which question is worth chasing in the first place. What automation does to a market is well understood. The thing that becomes scarce is whatever cannot be automated. In mathematics, what is becoming scarce is taste.



Fig 3 Tao’s framing as a workflow. Exploration costs collapse. The bottleneck shifts upstream, into deciding what is worth proving


Tao has been blunter elsewhere. In a Nature interview in April he said, more or less, that the field is being forced to rethink what a proof is, what a paper is, what the profession is for. If mathematicians do not answer these questions themselves, the answers will arrive by way of quarterly earnings calls, which is an exquisite phrase and possibly the most useful sentence anyone has uttered about academic AI this year. He also let slip something quietly important about his own practice, which is that he now tries crazier things. The cost of finding out a conjecture is wrong has fallen so dramatically that he is willing to test ideas he would once have written off as not worth the time.

There is a chess analogy here that has been done to death, but only because it keeps working. When Deep Blue beat Kasparov in 1997 a lot of people predicted the end of professional chess. What happened instead is that the game shifted. Grandmasters now prepare lines with engines for months and pick which games to play more carefully. The skill ceiling rose. The skill itself changed. Tao’s claim, if you take it seriously, is that something structurally similar is happening to mathematics this year.

SECTION 4

Has Anyone Understood the Answer?

In January, the Carnegie Mellon philosopher Jeremy Avigad uploaded a paper to the PhilSci Archive called simply Mathematical Understanding. It opens with a claim so cleanly stated it has the quality of an aphorism. Mathematics, Avigad writes, is not about numbers, equations, computations, or algorithms. It is about understanding. From that one sentence he then derives a much sharper one. A correct proof is neither sufficient nor necessary for mathematical value. A verified but opaque proof leaves us no wiser. A flawed but illuminating sketch can advance the field.

If you are a working scientist that probably sounds like a polite seminar paradox. If you have spent any time near the AI Co-Mathematician story, it lands rather differently. Lackenby and the AI together produced a proof of Q 21.10 that the reviewer agents passed and that human referees will, in time, also pass. The proof is correct. Whether the proof illuminates anything, whether it tells working group theorists why the result holds, what other problems it cracks open, what underlying structure it puts into view, is a separate question. The reviewer agents do not check for insight. They check for gaps. Insight is a property of mathematicians reading the proof, not of the proof itself.

Correctness is a property of the proof. Understanding is a property of the mathematician reading it. When the prover is a machine, the gap between the two can yawn open.

This is a much older problem in a new coat. The Four Colour Theorem was proved in 1976 by Appel and Haken, who used a computer to check nearly two thousand configurations by exhaustion. The result was correct. The mathematical community accepted it. Some mathematicians, including some very good ones, found the proof unsatisfying for exactly Avigad’s reason. It told them that the theorem was true. It did not tell them anything about why. There was no insight to extract, no smaller idea that could be reapplied elsewhere. It was a certificate of correctness, and certificates are not what most mathematicians got into the field for.



Fig 4 Correct, illuminating, neither, both. Four cells that used to be three. The Co-Mathematician puts pressure on the upper-left

The same week that DeepMind released the Co-Mathematician, the 6th International Conference on Philosophy of Mind: AI was meeting in Porto. The papers being delivered there are, on the surface, about whether large language models have anything resembling semantic competence. Whether anything is meant when these systems reason. Stitch the two events together in your head and you can see the shape of a real disagreement forming. If a system without semantic competence can produce a correct proof, what does it mean to say that a human, reading the same proof, has understood it? Is understanding a feature of the prover, the reader, or the page in between?

There is, by way of elegant counter-thread, a small and growing group of mathematicians who think the discipline should solve the whole problem by giving up on infinity altogether. Doron Zeilberger, the loudest of them, was profiled in Quanta on April 29th, and he tells anyone who will listen that infinity may or may not exist, that God may or may not exist, but that in mathematics there should be no place for either. The position is called ultrafinitism. It is, depending on your taste, either a serious revision of the foundations of mathematics or a charming heresy. It is also, I think, not a coincidence that we are watching humans interrogate the foundations of their discipline at exactly the moment we hand its execution to machines. People reach for first principles when they suspect the floor is moving.

SECTION 5

Across the Hall, the Physicists Are Doing the Same Thing

It would be a mistake to read the Co-Mathematician story as a quirk of one discipline. Across the corridor, the physicists are running their own version. AI-Newton, a system out of a Chinese university group that was profiled in Nature in late 2025 and has been developed quietly since, takes raw experimental data as input and rediscovers principles like Newton’s second law without being told in advance what physics is. Symbolic regression methods such as AI Feynman and the newer Parallel Symbolic Enumeration, published in Nature Computational Science earlier this year, chase the same target. They want equations distilled directly from data, without the human in the loop of guessing which functional form to try.
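To make "equations distilled directly from data" concrete, here is a deliberately tiny sketch of the idea in Python. It is not how AI-Newton, AI Feynman, or Parallel Symbolic Enumeration work internally. It is a brute-force stand-in that assumes a hand-written list of five candidate formulas (`CANDIDATES`, a hypothetical name) and synthetic measurements generated from F = m · a purely for illustration. The program scores each candidate against the data and keeps the one with the smallest squared error.

```python
import itertools

# Synthetic "measurements", generated from F = m * a for illustration only.
masses = [1.0, 2.0, 3.5, 5.0]
accels = [0.5, 1.0, 2.0, 4.0]
data = [(m, a, m * a) for m, a in itertools.product(masses, accels)]

# A hand-picked space of candidate laws. Real systems search far larger
# spaces and build expressions automatically; this list is an assumption.
CANDIDATES = {
    "m + a": lambda m, a: m + a,
    "m * a": lambda m, a: m * a,
    "m / a": lambda m, a: m / a,
    "m * a**2": lambda m, a: m * a ** 2,
    "m**2 * a": lambda m, a: m ** 2 * a,
}


def squared_error(form, rows):
    # Sum of squared differences between predicted and observed force.
    return sum((form(m, a) - f) ** 2 for m, a, f in rows)


def best_form(rows):
    # Enumerate every candidate and return the name of the best fit.
    return min(CANDIDATES, key=lambda name: squared_error(CANDIDATES[name], rows))


if __name__ == "__main__":
    print(best_form(data))  # prints "m * a"
```

The human has not vanished from this loop. Somebody still chose which five formulas were allowed to compete. The real systems widen that search space enormously, but the question of what the winning equation means stays exactly where it was.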

The opposite trade also exists. Physics-informed neural networks, the so-called PINNs, are systems in which the human hands the physics to the machine in the form of constraints, and the machine learns within them. This May alone produced a small flood of these. PDE solvers for previously unsolvable systems. OLED spectrum prediction. Real-time air pollution mapping in dense urban environments. The pattern repeats. In mathematics, an AI proves a theorem and the meaning of the proof is contested. In physics, an AI rediscovers a law and the interpretation of the law remains a human act.

Nature also ran a piece this spring with the satisfying headline Human Scientists Trounce the Best AI Agents on Complex Tasks. The piece is correct, as far as it goes. The gap is real. The gap is also closing, and closing exactly along the boundary where correctness meets judgement. The Co-Mathematician release is the loudest evidence yet that the boundary is moving. It is not moving uniformly. There are pockets where the human eye remains decisively better, and Quanta published a lovely piece on May 6th about atmospheric scientists patiently circling the actual mechanism of lightning, an old phenomenon whose explanation keeps getting more interesting the more closely we look. That is what human attention to a physical question looks like. A machine that predicts the next storm with eerie accuracy is doing something else.

A weather model can predict tomorrow’s storm with terrifying precision and still not know anything about clouds.

SECTION 6

The Co-Mathematician’s Bargain

The reason the Co-Mathematician release matters is not that a machine proved a theorem. Machines have been doing that, in various reduced senses, for years. It matters because the workflow it embodies has become the new face of mathematical research at the frontier. Parallel agents. Internal reviewers. A coordinator. A human curator who reads, judges, and redirects. The bargain it offers the discipline is plain enough. Faster results, in exchange for a forced confrontation with what we wanted the results for in the first place.

If Tao is right, and exploration costs are falling toward zero, then the value of taste climbs. The mathematicians who flourish in this decade will not be the fastest provers. They will be the ones with the sharpest noses for what to ask. That is a real shift in what the profession rewards, and it is happening fast enough that nobody has had time to redesign the graduate curriculum to match. The students entering PhD programs this autumn will be the first generation to be trained in a field where the bottleneck is question-picking rather than proof-grinding. None of their advisors were trained that way. None of their textbooks were written that way. Watch this space.

And if Avigad is right, the distinction between correctness and understanding is no longer a polite seminar topic. It is the question that decides what mathematics looks like in five years. A discipline of insight, or a discipline of certificates. The two are not the same. They were once close enough to be confused. The Co-Mathematician is what it looks like when they finally come apart.

The Co-Mathematician did not replace Marc Lackenby. It made the version of Lackenby with ten parallel reviewer-graduate-students, working through the night without losing focus, suddenly real. That version of him, and the equivalent version of every other working mathematician at the frontier of the field, is the thing that gets to write the next chapter of the discipline. The interesting question, the question worth holding onto past the news cycle, is what kind of mathematician that turns out to be. Not what they prove. What they choose to prove.

Somewhere in Novosibirsk, the Kourovka Notebook now has one fewer open problem. Underneath the line where Q 21.10 used to sit, in nobody’s handwriting in particular, a different sort of question is forming. Did we understand it?

Sources

PRIMARY STORY · THE AI CO-MATHEMATICIAN

• DeepMind, AI Co-Mathematician: Accelerating Mathematicians with Agentic AI (arXiv 2605.06651). https://arxiv.org/abs/2605.06651

• Office Chai, Google DeepMind Releases AI Co-Mathematician. https://officechai.com/ai/google-deepmind-releases-ai-co-mathematician-that-creates-new-high-score-on-frontiermath-benchmark/

• Google DeepMind, AI for Math Initiative. https://blog.google/innovation-and-ai/models-and-research/google-deepmind/ai-for-math/

• Kourovka Notebook, 21st edition, January 2026. https://kourovkanotebookorg.wordpress.com/

• Charlotte Scott Centre, New 21st Edition of the Kourovka Notebook. https://algebra-lincoln.org/2026/01/09/new-21-st-edition-of-the-kourovka-notebook-unsolved-problems-in-group-theory/

MATHEMATICS · KNOT THEORY

• Erica Klarreich, A Powerful New QR Code Untangles Math’s Knottiest Knots. Quanta, 22 April 2026. https://www.quantamagazine.org/a-powerful-new-qr-code-untangles-maths-knottiest-knots-20260422/

TAO ON AI AND MATHEMATICS

• Terence Tao and Tanya Klowden, Mathematical Methods and Human Thought in the Age of AI (arXiv 2603.26524). https://arxiv.org/abs/2603.26524

• Tao’s blog post, 29 March 2026. https://terrytao.wordpress.com/2026/03/29/mathematical-methods-and-human-thought-in-the-age-of-ai/

• Nature interview, The job description is changing. https://www.nature.com/articles/d41586-026-01246-9

• OpenAI Academy, Terence Tao on AI in math and theoretical physics. https://academy.openai.com/public/blogs/terence-tao-ai-is-ready-for-primetime-in-math-and-theoretical-physics-2026-03-06

PHILOSOPHY OF MATHEMATICS AND MIND

• Jeremy Avigad, Mathematical Understanding, January 2026. https://philsci-archive.pitt.edu/27708/1/mathematical_understanding.pdf

• Michael Harris, Formal Proof and Epistemic Value (Silicon Reckoner). https://siliconreckoner.substack.com/p/formal-proof-and-epistemic-value

• Gregory Barber, What Can We Gain by Losing Infinity? Quanta, 29 April 2026. https://www.quantamagazine.org/what-can-we-gain-by-losing-infinity-20260429/

• 6th International Conference on Philosophy of Mind: AI (Porto, 4 to 8 May 2026). https://philevents.org/event/show/143946

PHYSICS AND AI FOR SCIENCE

• Discovering physical laws with parallel symbolic enumeration, Nature Computational Science, 2026. https://www.nature.com/articles/s43588-025-00904-8

• A Chinese AI model taught itself basic physics, Nature. https://www.nature.com/articles/d41586-025-03659-4

• AI-Newton: A Concept-Driven Physical Law Discovery System (arXiv 2504.01538). https://arxiv.org/html/2504.01538v2

• Human scientists trounce the best AI agents on complex tasks, Nature, 2026. https://www.nature.com/articles/d41586-026-01199-z

• What Causes Lightning? The Answer Keeps Getting More Interesting. Quanta, 6 May 2026. https://www.quantamagazine.org/physics/






 
