Posts

Featured Post

The Split Personality of AI Inference

How LLM-D Parallel Runs Are Rewriting the Rules of Model Inference

Copyright: Sanjay Basu

When One Brain Isn’t Enough

What if the secret to making AI faster wasn’t building bigger machines, but teaching it to think with two minds at once?

For anyone who’s ever typed a prompt into ChatGPT and watched those little dots dance across the screen, there’s an invisible orchestra playing behind the curtain. Large language models don’t just materialize answers from thin air. They’re running a two-act play every single time: first, they digest your question (prefill), and then they generate your answer, token by token (decode). Traditionally, these two acts happened on the same stage, using the same resources. And like any double-booked theater, chaos ensued.

Enter LLM-D, the distributed inference framework that said, “What if we gave each act its own theater?” The result? A system that can serve AI models faster, cheaper, and more reliably by splitting the inference process into spec...

Quantum Simulation on a Desk

Copyright: Sanjay Basu

Experimentation with my DGX Spark continues

A field report from a weekend of DGX Spark experimentation, written on the road to SC25

I will admit something upfront. Every time I sit down in front of the DGX Spark, I feel a little like I am getting away with something. Not in a criminal sense. More in the sense that this compact workstation is quietly doing the type of quantum simulation work that used to require a noisy rack somewhere in a cold data center. It feels a bit unfair, like owning a personal synchrotron that fits neatly next to a coffee cup.

My DGX experimentation continues, and this week I have been focused on quantum simulation libraries from NVIDIA. I spent the weekend running circuits across CUDA Quantum and the cuQuantum Appliance before heading to St. Louis for SC25. Nothing sharpens the mind like a few late-night qubit experiments followed by an early-morning conference flight.

This long-form edition is a reflection on that work. A guided t...