The Chip Designs Itself Now

Autonomous AI Agents Have Quietly Started Rewriting the Tools That Rewrite Silicon

This news arrived without a press release. No keynote, no crowd in a dim auditorium in San Jose, no breathless CNBC segment. Just a PDF quietly parked on arXiv, the kind of paper that looks unremarkable until you read the abstract twice and realize what it actually says. Then, of course, came the fireside chat between Anirudh and Jensen.

The paper in question is from NVIDIA Research and the University of Maryland, and its claim is modest in tone and enormous in implication. A team of large language model agents, they report, was pointed at ABC, the million-plus-line open-source logic synthesis system that has been the de facto academic and industrial backbone of chip design research for two decades. The agents were not asked to use ABC. They were asked to evolve it. To rewrite its C code. To improve the tool itself. After thirty-some cycles of automated compilation, formal equivalence checking, and benchmarking on an eighty-seven-node cluster, the evolved ABC produced chips roughly eight percent better on area-delay product than the human-tuned baseline, without any person in the loop making architectural decisions.

Eight percent sounds like a conference-paper number. It is not. In semiconductor design, eight percent at the synthesis layer compounds through every downstream step, every subsequent generation, every customer who licenses the tool. Eight percent is the gap that separates second-place silicon from first-place silicon in a market that occasionally moves half a trillion dollars in market cap on a single earnings call.
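To make the arithmetic concrete, here is a toy sketch of the metric in question. The design numbers below are my own illustration, not from the paper; only the eight percent figure is the source's.

```python
# Area-delay product (ADP): the figure of merit the evolved tool improved.
# The area and delay values here are hypothetical, for illustration only.
def area_delay_product(area_um2: float, delay_ns: float) -> float:
    return area_um2 * delay_ns  # lower is better

baseline = area_delay_product(1000.0, 2.0)   # hypothetical design: ADP = 2000.0
evolved = baseline * (1 - 0.08)              # the reported ~8% improvement

# Gains at the synthesis layer compound across tool generations: five
# successive 8% improvements leave ADP at roughly 66% of the original.
compounded = baseline * (1 - 0.08) ** 5      # ≈ 1318.2
```

The compounding is the point: a one-time 8% is a nice result, but a tool that keeps finding 8% every generation is a different kind of asset.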

More importantly, eight percent is not the point. The point is who, or what, produced it.

A Short Detour Through Irony

There is something delightfully recursive about the fact that the tools which design the chips that run the AI which is now redesigning those tools were the last corner of the stack to be touched by AI. Chip designers built TPUs, H100s, Blackwells, MI300Xs, and a small constellation of inference ASICs that made modern LLMs possible. They did this, for the most part, using software that was architecturally unchanged from the mid-2000s. Perl scripts wrapping TCL scripts wrapping C binaries whose authors are now tenured, retired, or both. The EDA world is the only place in tech where a senior engineer still says “the script broke” with the intonation of someone describing an actual, physical script on papyrus.

This was not laziness. It was prudence. Chip design is one of the few industries where the cost of a mistake is paid in foundry masks that run eight figures per set, in quarters of lost time, and in CEO resignation statements. You do not experiment with the tooling when the tooling experiments on physical reality. So EDA stayed conservative, the engineers stayed employed, and the rest of us got used to the idea that chip design was a slow, handcrafted, almost monastic discipline where the machines worked for us rather than alongside us.

That is no longer the situation. It stopped being the situation somewhere between December 2025 and March 2026, though almost nobody outside the EDA ecosystem noticed.

◆ ◆ ◆

What Just Actually Happened

At GTC 2026, NVIDIA announced it was integrating what it calls agentic AI into partnerships with every major EDA vendor that matters. Cadence rolled out the ChipStack AI SuperAgent, which orchestrates design, testbench coding, test-plan creation, and regression debugging as a continuous loop. Synopsys, now flush with a two-billion-dollar direct equity investment from NVIDIA, introduced AgentEngineer, which plugs into NVIDIA’s NeMo and Nemotron stack to drive autonomous design rule checking. Siemens unveiled the Fuse EDA AI Agent, which promises to orchestrate the entire semiconductor and PCB workflow, from design conception through manufacturing sign-off.

Every major vendor, in other words, shipped the same thing in roughly the same quarter. When that happens in any market, it is because something beneath the marketing deck actually works. The question is what.

Here is the honest answer, and it is a useful frame for anyone trying to separate the substance from the vendor theater. Agentic AI in chip design is not one thing. It is three things, and they operate at three different layers of the stack, and they have very different implications for how silicon gets built over the next five years.


Figure 1. Three layers of agentic activity in the semiconductor design stack, with representative tools at each layer.

Layer One: The Assistant

This is the layer everyone is most familiar with, because it is the LLM equivalent of the seatbelt warning chime. Useful, low-risk, and nobody argues with it. A chip designer writes Verilog, the assistant suggests the next block. An engineer opens a testbench, the assistant proposes coverage points. A failure log scrolls past, the assistant offers three plausible root causes.

Cadence has Cerebrus and JedAI copilots that live inside their flows. Synopsys has DSO.ai, which has been quietly searching physical design space with reinforcement learning since well before the current agent craze made it fashionable. GitHub Copilot and Cursor handle the generic code-completion case. None of these systems is autonomous in any meaningful sense. A human reviews every suggestion. The LLM is a very well-read intern.

The value at this layer is real but bounded. Productivity goes up maybe 20 to 40 percent for skilled engineers on well-scoped tasks. The limit is set by the fact that a human has to read and approve everything the assistant produces, and humans read slowly and approve even more slowly when the thing being approved might end up as a billion-dollar mask.

Layer Two: The Orchestrator

This is where the center of gravity shifted in Q1 2026. An orchestrator agent does not suggest the next line of Verilog. It runs the whole workflow. It reads a design spec, decides which EDA tools to call in which order, kicks off verification, reads the failure report, rewrites the offending code, reruns the tool, and keeps going until the design closes or it gives up and asks a human.

ChipStack is the cleanest example. Cadence claims up to a 10x productivity improvement on certain verification tasks, and while vendor numbers should always be treated as slightly optimistic fiction, the architecture is real and the customer list is not made up. Qualcomm, Altera, and NVIDIA itself are among the announced users. Synopsys AgentEngineer plays the same game with different branding. Siemens Fuse EDA stakes out the full PCB-through-silicon territory.

The interesting thing about orchestrator-layer agents is that they require a completely different set of guardrails than assistant-layer agents. When an LLM is an intern, you can let it hallucinate a little. When an LLM is running an eight-hour verification regression and deciding on its own which failures are spurious and which need a code fix, hallucinations become outages. This is why every serious vendor at this layer has spent more engineering effort on verification and equivalence checking than on the agent itself. The agent is the cheap part. The scaffolding that prevents the agent from quietly producing a wrong netlist is where the money goes.

You can already feel the market tension. Orchestrators threaten the headcount math of every large silicon org. A verification team of forty engineers that becomes a verification team of twelve plus an agent fleet is, from the CFO’s perspective, not a reduction in capability. It is a margin expansion. Whether that math turns out to be accurate is a separate question, and one that will be answered in tape-out success rates over the next two or three product cycles rather than in vendor slide decks today.

Layer Three: The Self-Evolver

This brings us back to the NVIDIA/Maryland paper, which is genuinely the most interesting development in this space and, not coincidentally, the one getting the least attention because it is the hardest to explain in a tweet.

The ABC self-evolution work is not an assistant and not an orchestrator. It is something else. It is an agent that modifies the tool, not the chip. The output of a run is not a netlist. It is a new, better version of the synthesis software.

| The agent does not help you design a chip. It redesigns the thing that designs the chip. That distinction is the whole story. |

The architecture is elegantly boring in the way all good engineering is. A planning agent, built on Claude Sonnet 4.5, reads quality-of-results feedback from the previous cycle and decides which subsystem of ABC to modify next. Three specialized coding agents, each scoped to a different module of the codebase, propose edits to their assigned directories. The modified tool gets compiled. If the compilation fails, the agents get the error log and debug themselves in a tight inner loop. If the compilation succeeds, the tool is run through formal combinational equivalence checking against the original, which is the step that separates this work from vibe coding. An equivalence check is not fuzzy. It is a mathematical proof that the modified tool produces logically identical results to the original. Any mismatch, even one, terminates the iteration.
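To give a feel for what that gate checks, here is a toy combinational equivalence check. Real CEC engines build a miter and prove it unsatisfiable with SAT or BDD machinery over millions of gates; the functions below are my own illustrative stand-ins, but the contract is the same: every input assignment must agree, or the candidate dies.

```python
from itertools import product

def original(a, b, c):
    # Reference logic: a three-input majority gate.
    return (a and b) or (b and c) or (a and c)

def rewritten(a, b, c):
    # An "evolved" rewrite that must compute the same function.
    return (a and (b or c)) or (b and c)

def equivalent(f, g, n_inputs=3):
    # Miter idea: f XOR g must be false for every input assignment.
    # Real tools prove this with SAT/BDDs instead of enumeration.
    return all(f(*bits) == g(*bits)
               for bits in product([False, True], repeat=n_inputs))

assert equivalent(original, rewritten)  # the rewrite survives the gate
```

A candidate that fails even one assignment is rejected outright, which is what makes the overall loop safe to run unattended.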

If the tool survives equivalence checking, it gets benchmarked across ISCAS, EPFL, VTR, and IWLS suites on an eighty-seven-node CPU cluster. The results become the reward signal for the next evolution cycle. A self-evolving rulebase governs which kinds of edits are allowed, and the planner can propose refinements to the rulebase itself when rules block beneficial edits.
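Stripped of the LLMs and the cluster, the control flow reads like this. Everything below is a hypothetical stub of my own devising: `propose_edit` stands in for the planner plus the scoped coding agents, and `compiles`, `equivalent`, and `benchmark` stand in for the build loop, the formal checker, and the benchmark suites.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    patch: str
    qor: float  # quality of results; lower (e.g. area-delay product) is better

def propose_edit(best_qor: float, cycle: int) -> str:
    return f"patch-{cycle}"            # planner + scoped coding agents

def compiles(patch: str) -> bool:
    return True                        # failures would feed error logs back to the agents

def equivalent(patch: str) -> bool:
    return True                        # formal check; a single mismatch kills the patch

def benchmark(patch: str, best_qor: float) -> float:
    return best_qor * 0.99             # toy stand-in for the benchmark suites

def evolve(cycles: int = 30, baseline_qor: float = 100.0) -> Candidate:
    best = Candidate(patch="baseline", qor=baseline_qor)
    for cycle in range(cycles):
        patch = propose_edit(best.qor, cycle)
        if not compiles(patch):
            continue                   # inner self-debug loop failed; move on
        if not equivalent(patch):
            continue                   # correctness gate: non-negotiable
        qor = benchmark(patch, best.qor)
        if qor < best.qor:             # QoR is the reward signal for the next cycle
            best = Candidate(patch, qor)
    return best
```

The real system adds the self-evolving rulebase on top of this loop: a policy constraining which edits `propose_edit` may emit, which the planner can itself amend when a rule blocks a beneficial change.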


Figure 2. The evolution loop for Self-Evolved ABC. Specialized agents modify non-overlapping subsystems under unified correctness and quality-of-results feedback.

Read that closely. An agent rewriting code. Another agent planning which code to rewrite. A formal verifier checking the rewrite did not break correctness. A benchmark cluster measuring whether the rewrite is actually better. A rulebase that itself evolves. The whole thing running for days, producing 87,749 lines of new C code, converging on improvements that a human expert would either not find or not have time to try.

The total cost to do this, according to the paper, was about $2,400 in LLM tokens.

A person who has spent any time in EDA will read that sentence three times, and not because they do not believe it. They will read it three times because they are trying to figure out what to tell their VP of Engineering on Monday morning.

◆ ◆ ◆

The Lineage Nobody Mentions

The ABC paper did not come out of nowhere. It is the third in a sequence that starts with Google DeepMind’s AlphaEvolve in 2025, which showed that LLM-based agents could evolve isolated algorithmic kernels, the hundreds-of-lines-of-code variety, and discover improvements beyond human baselines. AlphaEvolve famously contributed optimizations to Google’s TPU RTL, which is quietly one of the most important proof points nobody talks about.

Then came SATLUTION, also from NVIDIA, which took the AlphaEvolve idea and scaled it to full-repository evolution on SAT solvers. Tens of thousands of lines of C++, complete solvers, evolved from scratch. The evolved solvers beat the winners of the SAT Competition 2025, which is, in the NP-complete world, the equivalent of a machine-evolved system quietly outrunning solvers that represent decades of competitive human optimization.

ABC evolution is the logical next step. Take the SATLUTION machinery, point it at a codebase ten times larger, over a much messier optimization landscape with multiple competing objectives (area, delay, depth, and correctness under technology constraints), and see what happens. What happened is the paper.

What happens next is almost certainly other parts of the EDA stack. Place and route. Timing closure. Physical verification. Analog sizing. If you have a tool with a well-defined quality metric, a fast-ish evaluation loop, and a correctness check, you can in principle evolve it. The question for every EDA vendor right now is not whether this works. It clearly does. The question is whether their internal codebases are clean enough, modular enough, and well-tested enough that agents can safely edit them. Many of them, candidly, are not.

◆ ◆ ◆

What This Means If You Build Silicon

A few observations, ordered from least controversial to most.

First, the productivity compounding is going to be weird and nonlinear. Layer one assistants give you linear productivity gains. Layer two orchestrators give you step-function gains on specific workflows. Layer three self-evolving tools compound across every customer who uses the tool, on every subsequent project, for as long as the tool exists. The EDA vendor that gets self-evolution working in production first has an advantage that is genuinely difficult for competitors to match, because their tool improves faster than their competitor's tool can be hand-tuned.

Second, the moat is not the model. Every serious EDA vendor is running the same frontier models, often the same exact models, and the moats are the things wrapped around those models. Formal verification pipelines. Benchmark clusters. Proprietary internal tools that the agents are editing. Rulebases that encode decades of EDA domain knowledge. These are the things that take years to build, and the companies that spent the last two decades building proper testing infrastructure for their tools are now looking at an asset they did not know they had.

Third, the human role shifts from executor to verifier. I have been saying some version of this line for three years about enterprise AI generally, and it is now arriving in the one industry where it has the most interesting implications. The chip designer of 2028 will spend very little time writing Verilog. They will spend most of their time defining what correct looks like, specifying constraints, reading agent proposals, approving or rejecting them, and investigating the cases where the agent and the verifier disagree. This is closer to the role of a mathematician reading a proof than the role of a programmer writing code.

Fourth, and this is the part that makes me genuinely excited in the way I have not been about a technology cycle in a while, we may be approaching the end of the era where Moore’s Law is dead. Not because transistors are going to shrink again, though maybe they will, but because the software layer between intent and silicon is about to compress by an order of magnitude. Most of the difficulty in modern chip design is not physics. It is the translation overhead between what an architect wants and what the physical layout can achieve. If agents can shorten that translation by a factor of three, custom silicon gets dramatically cheaper, which means more silicon gets built for more specific workloads, which means the effective performance gains per dollar start to look Moore-like again even without a node shrink.

| The dream of Moore’s Law was never really about transistor density. It was about exponential improvement in what a dollar can compute. That is a dream agents can keep alive even after the physics stops cooperating. |

What Breaks

I would be doing you a disservice if this essay were only a victory lap. Several things are actively broken, and it is worth being clear about them because the vendor keynotes will not be.

The first is that layer-three self-evolution only works when the optimization target is crisp and the correctness check is formal. Logic synthesis is a good fit because equivalence checking is well understood. Analog design is a terrible fit because “correct” for an analog circuit is a matrix of tradeoffs across temperature corners, process variation, and noise, none of which is cleanly expressible as a SAT problem. Expect self-evolution to dominate digital synthesis and mapping long before it touches analog.

The second problem is that the current agents are, in the paper’s own honest words, better at refining existing algorithms than at proposing new paradigms. They are superb at finding the threshold you set too conservatively, the heuristic that was never revisited after 2009, the cost function that could be reshuffled. They are much less good at inventing a new approach from first principles. The agents in the ABC study, when they tried novel algorithmic structures without anchors in existing invariants, typically failed with segfaults or subtle correctness violations they could not self-debug. This matters because the next decade of chip design will need new paradigms, not just better tuning of old ones. Agents are refiners, not revolutionaries. At least so far.

The third is a social problem. The EDA industry runs on a small number of very senior engineers who have built their careers and their self-worth on the detailed algorithmic choices that agents are now about to start optimizing away. This is not a technical problem. It is a managing-the-org problem, and it will be badly handled in many places. Some companies will pretend this is not happening. Others will over-index on it and lay off the people whose domain knowledge was most essential to training the rulebases in the first place. The right answer is obvious and very few organizations will implement it, which is that the senior experts become rulebase authors, benchmark designers, and verification architects, which is exactly the shift from executor to verifier described above.

◆ ◆ ◆

A Closing Thought, Because I Cannot Help Myself

I spend a lot of my time thinking about the philosophical shape of what is happening in AI, which I try to keep separate from my day job of actually shipping infrastructure. But occasionally the two collide in a way that is worth marking.

What the ABC paper quietly describes is a system in which a tool improves itself, verifies that the improvement is correct, and does this continuously, at a marginal cost that rounds to zero on any reasonable industrial budget. The framing most people will reach for is recursive self-improvement, which is the Bostrom-adjacent language that has powered a decade of AI safety discourse and a thousand Twitter arguments. That framing is not wrong, exactly, but it misses what is actually interesting about this specific case.

What is actually interesting is that the self-improvement is bounded, verified, and scoped. The agents are not getting smarter. The tool is getting better. These are different things. The agents still need human-supplied domain knowledge at cycle zero. They still need formal verification to catch their errors. They still need a benchmark cluster to tell them whether their edits helped. What they do not need is a human in the loop deciding which heuristic to tune this afternoon.

That is the right shape for this technology. Not “the AI designs the chip.” Not “the AI replaces the engineer.” Just: the tool gets better while the humans do more interesting work. The verifier stays human. The spec stays human. The judgment about what is worth building stays human. And in the gap between spec and implementation, the boring handcrafted middle layer that has absorbed most engineering talent for fifty years, the agents quietly work the graveyard shift.

I am not sure whether to find this comforting or unnerving, which is probably the right emotional state to hold about any genuinely new capability. What I am sure of is that the next time you hear someone in a keynote talk about AI designing chips, you should ask which layer they mean. If they do not know the answer, they are not the right person to be telling you about it.

The tool designs itself now. The question of what to build with it is still, and thankfully, ours.

───────────────

Dr. Sanjay Basu is Senior Director of GPU and Gen AI Solutions at Oracle Cloud Infrastructure, founder of Cloud Floaters Inc., and author of the newsletters A Technocrat's Discernment and Philosophy. He writes on QBism, machine intelligence, and the infrastructure of thought.

Primary source: Yu, C. and Ren, H. (2026). Autonomous Evolution of EDA Tools: Multi-Agent Self-Evolved ABC. DAC 2026. arXiv:2604.15082.

Additional context: NVIDIA GTC 2026 announcements; Cadence ChipStack AI SuperAgent; Synopsys AgentEngineer; Siemens Fuse EDA AI Agent; Google DeepMind AlphaEvolve (2025); SATLUTION (2025).

