Not Boring
by Packy McCormick, Tech strategy and analysis, but not boring.

Weekly Dose of Optimism #186

2026-03-27 20:39:57

Hey friends 👋,

Happy Friday! The sun is shining here in New York City, Duke plays in the Sweet 16 tonight, and there are so many incredible stories this week that I kept having to change what made the cut as new news came out.

Grab a cup of coffee and catch up on all of it in one place. And if you have a second cup, take an hour this weekend to read Electromagnetism Secretly Runs the World.

Let’s get to it.


Today’s Weekly Dose is brought to you by… Quince

Last week, it was cold and rainy in New York. Today, it’s warm and sunny. Aside from the renewed joie de vivre, that means it’s time to switch out the wardrobe.

More and more, when people are thinking about refreshing (and upgrading) their … anything, they ask, “Does Quince make this?” More and more, the answer is yes. Quince is the default starting point for quality, across cashmere, luggage, furniture, fragrance, fine jewelry, and new spring wardrobes.

I have been lucky to be a Quince investor for years. They recently raised at a $10.1 billion valuation. The reason is that they’re able to deliver high-quality across a growing number of categories at surprisingly affordable prices. 100% Grade A Mongolian cashmere. 100% mulberry silk. 100% European linen. 0% brand tax.

They can pull it off because Quince built an M2C (Manufacturer-to-Consumer) operating system with AI demand forecasting, real-time production planning, and direct factory partnerships that cut out distributors, wholesalers, and the retail markup entirely. This week, they even hit #1 in the App Store for shopping. The machine is humming.

Look, I’m biased, but I’m also writing this while wearing Quince jeans and a Quince q-zip. We dress our kids in Quince. And I’m about to do a big spring haul. If you haven’t started shopping at Quince, or haven’t checked out everything they have to offer in a while, join me.

Shop Quince


(1) NASA Goes Nuclear (and to the Moon)

Jared Isaacman’s NASA is shaping up to be as cool as we expected.

On Monday, Isaacman went onstage at the Hill & Valley Forum in Washington for a conversation with Founders Fund / Varda’s Delian Asparouhov and laid out his plans to build a permanent base on the Moon and send a nuclear-powered spacecraft to Mars.

In a normal week, either one would get its own top billing in the Dose, but since JI dropped both at once, let’s start with the Moon Base. NASA is committing $20 billion over seven years to construct humanity's first permanent outpost near the lunar south pole. “The moon base will not appear overnight,” Isaacman said. “We will invest approximately $20 billion over the next seven years and build it through dozens of missions.” The plan has three phases, with Phase 1 starting now: up to 24 launches and 20+ landings in the next 35 months. That's roughly a mission every six weeks, which sounds like SpaceX cadence.

The first wave will include MoonFall surveillance drones to map terrain around the south pole, fission reactors and RTGs for power, and enhanced rovers for scouting. Phase 2 adds pressurized rovers built in partnership with Japan's aerospace agency and semi-habitable modules after Artemis IV. Phase 3 is the full base: permanent infrastructure for sustained human presence. We’re going, and we’re staying. Moon will be a State.

Between NASA and SpaceX, the Moon has been getting a lot of love recently, and Mars has been pushed out. But Isaacman also announced some Mars plans.

SR-1 Freedom will be the first spacecraft to use a nuclear fission reactor for propulsion beyond Earth orbit. A HALEU-fueled reactor delivers 20+ kilowatts of electrical power, driving xenon ion thrusters via a closed Brayton cycle. It launches in December 2028 (likely on a Falcon Heavy) and should arrive at Mars about a year later. Once there, it will deploy three Ingenuity-class helicopters carrying cameras, ground-penetrating radar, and radios to scout human landing sites and hunt for subsurface water. To summarize, we’re sending a nuclear-powered spacecraft to Mars to release scout helicopters.

SR-1 repurposes the already-built Power and Propulsion Element from the scrapped Gateway program. Rather than shelve expensive hardware when priorities shifted, Isaacman's team pointed it at Mars. The reactor will activate within 48 hours of launch, then it’s off to the Red Planet.

Nuclear is going to be critical to our space ambitions, whether going or staying. We wrote about it in our Deep Dive on Radiant. Fission reactors will power the Moon Base and a fission reactor will propel SR-1 to Mars. Eventually, we’ll have nuclear-powered civilizations as far as the eye can see.

Looks like putting Isaacman in charge of NASA set off a … nuclear chain reaction.

(2) Terraform Industries Breaks Ground on Synth Hydrocarbon Site

Casey Handmer on TBPN

We’re going to need a lot of energy here on Earth, too, and even as we add a ton of clean nuclear, solar, and even geothermal, we’re going to need those sweet, sweet hydrocarbons. We just want them to be clean, too.

Good news. Casey Handmer and the Terraform Industries team have been hard at work figuring out how to turn sunlight and air into pipeline-grade natural gas for less than the cost of drilling it. In March 2024, his team in Burbank proved the chemistry works by producing synthetic methane end-to-end for the first time. Now, two years later, Terraform Industries has broken ground on a manufacturing site in Kern County, California, to scale up production.

Kern County is a lucky place: it’s one of California's biggest oil-producing regions and it's also absolutely sun-drenched. Those rays are what Terraform wants. The Terraformer system is straightforward in concept: solar electricity powers water electrolysis for hydrogen, direct air capture pulls CO2 from the atmosphere, and a reactor combines them into carbon-neutral methane. What's hard is making it cheap enough to compete with drilling a hole in the ground. Terraform's bet is that plummeting solar costs will get them there, and their qualified electrolyzer stack (under $100/kW) and full-scale reactor suggest they’re heading in the right direction.

Last week, we wrote about one Australian, Chris Power, launching a huge new manufacturing site in Alabama. This week, another Aussie, Handmer, is launching a hydrocarbon manufacturing site in the California desert. Chris and Casey are two of the smartest people I’ve met. With them working on the country’s behalf, it’s a very g’day in America.

(3) Arbor Energy Lands Billion-Dollar Turbine Deal

TechCrunch

But wait… how are we going to turn all of our natural gas into power to feed the data centers? I’m glad you asked.

Arbor Energy, founded by former SpaceX engineers Brad Hartwig and Andres Garcia Clark, just landed a deal with GridMarket to supply up to five gigawatts of its Halcyon turbines, or roughly 200 units, valued in the single-digit billions. They expect the first turbine to connect to the grid in 2028 and to be manufacturing 100+ units per year by 2030.

Turbines are so hot right now. They’re such a big bottleneck to the data center buildout that their shortage is a big driver behind the rush towards orbital data centers, where they can be powered by the sun. Traditional gas turbines from the big OEMs are backordered until 2032. Hartwig puts it simply: “Everyone wants more power. They wanted it yesterday. The time frames are compressing and the scale is getting larger.”

Arbor takes rocket turbomachinery designed for spaceflight, 3D-prints it into 25-megawatt modular turbines, and sells them to the data centers that needed power yesterday. The turbines run a supercritical CO2 cycle with oxy-combustion, which means zero operating emissions on natural gas, or carbon-negative operation if you feed them biomass waste. They're fuel-agnostic, pre-assembled, and 3D-printed, which is important because it decouples manufacturing from the supply chain bottlenecks (specialized blades, vanes, castings) that plague conventional turbine production or even alternatives proposed by companies like Boom Supersonic.

Our friends at Cantos can’t stop winning.

(4) Centivax raises $37M to advance universal flu vaccine

Katherine Davis for Axios

All of this progress in rockets and energy is sick… but you might not have to be.

Last month, we covered research on a single vaccine to protect against all colds and flus. The company behind that research, Centivax, whose homepage reads “smash the mutants,” just announced that they’ve raised $37 million in follow-on funding led by Structure Fund, with Meiji Seika Pharma, Sigmas Group, Kendall Capital Partners, and Stripe founders Patrick and John Collison joining. The company has now raised $133 million, including from our friend at Amplify Bio.

When we get the flu vaccine every year, we line up for a shot that's been reformulated to match whichever strains scientists think will circulate that season. Sometimes they guess right and sometimes they don’t. Centivax wants to change that with a single dose that protects against all flu strains, with a booster needed only every two to four years instead of annually. The company's universal flu vaccine is in a Phase 1 trial, with results expected by end of year.

“If our data looks good by the end of this year,” CEO Jake Glanville said, “effectively, the pandemic era for influenza is over.”

Centivax is applying its approach, targeting conserved regions across strains rather than trying to keep up with seasonal mutations, beyond flu, too: universal snake venom, Alzheimer's, malaria, and cancer vaccines are in the pipeline. As Glanville puts it, “You want to treat all the flus, not just one strain. You want to treat all the snakes, not just one snake.”

Of course, it wouldn’t be a medical miracle entry without some FDA hair. Last month, the FDA about-faced on Moderna’s mRNA flu vaccine proposal and put it back under review. Just to be safe, Centivax is hedging by running trials in Europe and Australia alongside the US. “If something unexpected happens with the FDA, we would just proceed in Europe,” Glanville says.

The possibility that Europeans get a universal flu vaccine before Americans because of regulatory dysfunction is both absurd and completely believable. Hopefully, if the FDA blocks the universal everything vaccine, they will also watch my kids during sick season.

(5) How to Turn a Chicken Egg Into a Drug Factory

Carl Zimmer for The New York Times

I am so clucking eggcited that Neion Bio is finally coming out of its shell.

Today, pharma uses Chinese hamster ovary (CHO) cells to produce biologics like Keytruda and Humira in huge stainless steel bioreactors. Merck recently spent $1B on a single Keytruda facility.

In early 2024, Elliot wrote that chicken eggs are much more efficient bioreactors. They run on grain and water, produce six grams of protein per unit, and we already farm them at massive scale.

As the (excellent) Neion website says, chicken egg drug manufacturing can be “extremely low COGS, hyper resilient, CapEx avoidant, ultra scalable, very high reliability, and lower environmental impact.” Nature evolved the chicken egg to be that way. The hard part has been that the science hasn’t been able to modify what’s inside the egg fast and reliably enough to actually manufacture drugs.

Sam Levin, who we previously backed at Melonfrost, and Dimi Kellari explored the frontiers of chicken egg research, which they could get their arms around because the cutting edge stuff is happening in a very small number of labs around the world. They went to all of them, and they realized that the time was right to build a company that uses nature's bioreactors to produce drugs at a fraction of the cost.

In today's NYT article, Sam predicts that the cost can be 1/10th or even 1/100th of the current cost, and that just 3,900 hens could meet global Humira demand.

I'm proud to back Sam, Dimi, and the Neion Bio team out of not boring capital as they work to hatch the balk of the world's biologics and dramatically lower the cost to produce critical drugs, right here in NYC. And I'm sure they're happy that I can share my chicken puns with all of you instead of just replying to investor updates with them.

Extra Dose: Science Breakthroughs, GPS, Memory Without Brains, Astro Mechanica

Join not boring world to get an even bigger Dose of Optimism.

Read more

Electromagnetism Secretly Runs the World

2026-03-24 20:57:34

Welcome to the 520 newly Not Boring people who have joined us since our last essay! Join 260,690 smart, curious folks by subscribing here:

Subscribe now


Hi friends 👋 ,

Happy Tuesday! Welcome to our newest installment in what has become an unintentional two-part series on non-LLM models that can do things that humans can’t, things that will give us superhuman abilities in the physical world. They’re also both co-written with founders you’d expect to find in SF but are building right here in the greatest city in the world, NYC.

The first was last week’s essay on World Models with Pim de Witte.

Today’s is about machines that can intuit electromagnetic fields in a way almost no humans can that will help us design and build better electromagnetic (EM) systems.

As you know, I’m very bullish on the growing role of electromagnetic systems in the economy. After Sam and I wrote The Electric Slide, Arena Physica CEO Pratap Ranade and I traded emails. In one of them, he wrote:

The electrical and electromagnetic components are the “nervous system” of modern hardware and contribute to 40-50% of failures. Our ability as a nation to test and build it has declined, but –– imo even bigger –– as a species, we’re still unable to wield electromagnetism to its full potential.

Over the past seven months, we’ve developed a friendship, and Pratap has broken my brain many times. One of the things that’s most fascinated me is the idea he’s betting his company on, the one he emailed me about: humans can’t intuit EM, and it’s a bottleneck to the electric progress we both want to see. There’s no reason machines can’t be taught to understand them much better than we can, though.

For the past few years, Arena has been building AI tools and deploying expert electrical and RF engineers to help companies design, develop, and debug electromagnetic hardware. They’re working with companies including AMD, Anduril, and Sivers Semiconductors. They are backed by investors including Founders Fund, Peter Thiel, Initialized (Garry Tan), Shield Capital, and 137 Ventures.

Today, they’re re-branding as Arena Physica with the expanded mission to develop “Electromagnetic Superintelligence.”

This is an essay about how to teach machines to see the fields that we can’t, and what the world might look like if we can.

Let’s get to it.


Today’s Not Boring is brought to you by… The Pitch by Deel

Normally, our sponsors would like me to tell you why you should give them your hard-earned money. This time, Deel has asked me to tell you how they can give YOU money in The Pitch.

Deel recently launched a global tournament with $15M in prizes for startups: the top 10 get a $1M investment each, and 100 regional winners will get $50k. You don’t need warm intros and you don’t need to pay to apply. It’s just a pure competition for the best entrepreneurs.

Pitch in just two minutes for the chance to win a $1M investment, access to a global ecosystem designed to help you scale (partners and sponsors include Stripe, Google, AWS, and a16z), and to get your startup in front of global leaders. This is your shot.

Apply Now


Electromagnetism Secretly Runs the World

Electromagnetism secretly runs our world. “Secretly,” because only a few people on this planet can intuit how it works.

Your phone’s GPS is powered by satellites that broadcast electromagnetic (EM) waves with timestamps. The wi-fi in your apartment is created by EM waves bouncing around the walls. Air traffic control is radar, as in EM waves that pulse out and listen for echoes off aircraft. When Maverick locks onto a bogey in Top Gun, he’s using a phased array radar steering EM beams electronically. Contactless payment? EM. Microwave oven? EM. The fiber optic cables carrying the internet across the ocean floor and through Somos’ network? That’s light… which is also EM.

Every single wireless signal, medical image, radar sweep, every chip talking to another chip inside a data center. All of it is electromagnetic waves, shaped and directed by physical structures designed to manipulate these waves. And, as electricity and intelligence race to define our era, EM’s presence is only growing more pronounced. In our data centers, chips communicate with each other via short-range EM waves. If Elon successfully moves the data centers to space, AI will be beamed down from satellites to your device via EM waves.

As Packy and Sam wrote in The Electric Slide, everything that can economically go electric will. Cars, trucks, buses, drones, boats, stoves, heat pumps, batteries, bikes, even planes, anything that moves, heats, lights, computes, or converts energy is moving from mechanical to electric. All of those newly electric things will be full of EM components.

In 1970, electronics accounted for five percent of a new car’s cost, on average. By 2020, that number reached forty percent. By 2030, it’s anticipated that the cost of the electronics of a consumer automotive vehicle will reach fifty percent of the vehicle cost.

Electronics comprise 35% of the cost of the F-35 Lightning II, more than the cost of the engine itself, and 15% of the Pratt & Whitney F135 engine, which costs $20 million. By the 2030s, when it’s projected that defense contractors will be building the F-47, they’ll be spending over 40% of the $300 million airframe on electronics.

This is good. We want to see this electrification continue. Electric machines perform better with less impact on the environment, give us capabilities combustion engines can’t, are better-suited for autonomy, and are riding cost/performance curves that will continue to widen the advantage.

But among a number of challenges addressed in The Electric Slide, particularly as they relate to production, there’s an equally large one looming in the research and development of new and better electromagnetic machines: our electromagnetic capabilities are bottlenecked by the very small number of humans who actually understand how any of this works.

There is a reason radiofrequency (RF) engineering — the practice of designing hardware that shapes and directs EM waves — is often referred to as black magic. There are maybe ten people in the world who can deeply intuit electromagnetism, who can see which shapes will create which EM fields in their mind’s eye1. I am not one of those people, but I’ve met them. I’m hiring them at my company, Arena Physica, and I went to school with many of them.

There was a guy in my physics program about whom our professor asked, “You know what’s special about this guy?” We all said no. “This guy thinks like an electron.”

What he meant was that electrons, if they were sentient, would feel all of these different fields pulling at them. Electrons would probably have an intuition for this feeling, the same way we have an intuitive feel for gravity, how we just know that when we let go of a ball, it will drop to the ground. Those friends of our ancestors’ who couldn’t intuit gravity did not live long enough to reproduce.

Some people – a vanishingly small number – have spent enough time studying, testing, designing, and simulating electromagnetic systems to be able to intuit them like gravity. But for the rest of us, electromagnetism is mostly invisible.

For the vast majority of human history, we haven’t needed to see beyond the visible spectrum in order to survive. And so we haven’t. Those of our ancestors’ friends who wasted precious resources on seeing the full spectrum of EM waves wouldn’t have lived to pass on these traits, either.

Humans can see a small part of the electromagnetic spectrum, the “visible light” portion with wavelengths between 400 nanometers (violet) and 700 nanometers (red). We don’t see shorter wavelengths (ultraviolet light, X-rays, or gamma rays) or longer wavelengths (infrared, microwave, or radio waves).

This has served us just fine. Until electromagnetism came to run the world.

We have a fundamental force that we rely deeply on, but one that very few of us can work with naturally. This slows technological progress and limits what we can make.

Fortunately, AI doesn’t share our blind spots. It is particularly good at seeing patterns, at making connections and understanding dependencies that are not necessarily intuitive to humans.

Because of this, we believe that computers will be much better at grasping electromagnetism than we are. We should be able to build a Large Field Model (LFM) — like an LLM that generalizes across language, except ours generalizes across EM. We should be able to use this LFM to understand EM waves and shape them to do what we’d like them to do.

That’s the big bet we’re making at Arena Physica. To understand why we’re making it, I want to first make sure you understand electromagnetism.

A Brief Primer on Electromagnetism

Packy and Sam gave A Brief History of Electromagnetism in The Electric Slide.

I’m going to add to that with A Brief Primer on Electromagnetism. I’ll sprinkle in relevant history, but my goal is to make sure we have a working understanding of electromagnetism.

There are four fundamental forces that govern how everything in our universe works:

  1. Strong force

  2. Weak force

  3. Gravity

  4. Electromagnetism

The strong force and weak force operate at subatomic scales. The strong force binds protons and neutrons together in atomic nuclei. The weak force enables radioactive decay and nuclear fusion.

Gravity is the weakest of the four forces by an enormous margin (roughly 10^36 times weaker than electromagnetism). Yet, it dominates at cosmic scales. It only attracts, never repels, which means its forces keep adding up. And it acts on every single particle with mass or energy. It’s also mysterious at a fundamental level: gravity and quantum theory are incredibly powerful theories for how our world works, but they are fundamentally incompatible. This remains one of the deepest unsolved problems in physics. But we have a strong, intuitive relationship with gravity. In our day-to-day lives, we can feel the force.

Electromagnetism is the force we interact with most directly in everyday life. It’s also the one we’ve industrialized most aggressively. It governs light, electricity, magnetism, and chemistry, essentially everything about how matter behaves above the nuclear scale. It’s why matter has structure, why chemistry works, and why technology works. It’s responsible for the structure of atoms (electrons bound to nuclei), the bonds between molecules, the rigidity of solid objects, and all electronics and communication technology. Unlike gravity, electromagnetism has both positive and negative charges, which means it can attract or repel, and large accumulations tend to neutralize themselves. Our mathematical understanding of electromagnetism is extremely accurate: it is described by quantum electrodynamics (QED), the most precisely tested theory in all of science.

And yet… despite this precision, electromagnetic systems can be deeply counterintuitive. RF engineering, for instance, has a reputation as black magic. The wave-like, distributed nature of fields at certain frequencies produces effects that violate the intuitions built from simple circuit theory.

But let’s try to build up our intuition as best we can.

Every force carries energy. Electromagnetic energy comes in quanta we call photons, particles of light, but for most of what we build — antennas, radars, communication systems, phased arrays — it is easier to think about it as waves, in terms of frequency, wavelength, and phase. A photon has different amounts of energy based on its frequency, which can be seen on the electromagnetic spectrum. A super-high-energy photon would be something like the EUV (extreme ultraviolet) light an ASML machine uses to make chips. EUV is very, very high frequency. Therefore, it is very high energy, and in turn, very short wavelength. As you go through the visible spectrum and get to the other side, you reach infrared, and electromagnetic energy becomes heat. Then you get to RF (radiofrequency). With RF you have very low-energy photons. But it’s really just all electromagnetic waves that follow the same principle. High energy = high frequency = short wavelength and vice versa. (Explore the EM spectrum here.)
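To put rough numbers on that principle, here is a minimal sketch using E = h·f and λ = c/f; the example wavelengths are illustrative picks, not figures from the essay.

```python
# Quick numbers behind "high energy = high frequency = short wavelength":
# photon energy E = h * f, and wavelength = c / f. Example wavelengths are
# illustrative choices (EUV lithography, green light, a 2.4 GHz wi-fi photon).
PLANCK = 6.626e-34   # Planck's constant, J*s
C = 3.0e8            # speed of light, m/s

for name, wavelength_m in [("EUV (ASML lithography)", 13.5e-9),
                           ("green light", 550e-9),
                           ("wi-fi RF (2.4 GHz)", C / 2.4e9)]:
    freq_hz = C / wavelength_m
    energy_ev = PLANCK * freq_hz / 1.602e-19  # convert joules to electron-volts
    print(f"{name:22s} f = {freq_hz:.3e} Hz, photon energy = {energy_ev:.3e} eV")
```

Run it and the span is obvious: the EUV photon carries about 92 eV while the wi-fi photon carries about 10 µeV, roughly seven orders of magnitude apart, purely because of frequency.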

Now, think about a prism. A prism is an object or a material that treats incident EM waves — or different photons — differently. So if you’re a red photon, your refractive index, or how much you bend, is a certain number. If you’re a blue photon, it’s different. Refract the whole beam and you end up with that beautiful rainbow on the other side of the prism.

Britannica

The prism is an early example of humans manipulating electromagnetic fields. People noticed that light, when it passed through glass or a crystal shaped in a certain way, would form a rainbow. We’ve come up with many ways to manipulate electromagnetic fields since, the most consequential of which is also the simplest.

If a caveman discovered he could manipulate electrons, the very first thing he might do is the simplest possible thing: electron on, electron off. On, off. One, zero.

The prism is actually much more sophisticated than the on, off switch. It’s more subtle, more expressive, and its implications are more powerful. But computing is based on the caveman’s idea of flipping the switch.

Early computers were literally built on this simple idea: they used mechanical switches known as relays to compute. When the metal touches, current flows (1/on). When it separates, current stops (0/off).

One problem with mechanical switches was that, when you switch often in air, the air ionizes and creates tiny bolts of lightning that can jump across the gap and create an arc. These arcs can make your “bit” unreliable. Sometimes it’s supposed to switch, and it doesn’t.

The other problem was that they were very slow. It was the advent of the electronic switch that put us on the path to processors that work at GHz speeds (10^9 cycles per second).

So we invented vacuum tubes2. They removed the air. Without atmosphere, there was no arcing. But vacuum tubes were fragile, power-hungry, and couldn’t scale. The great breakthrough was the semiconductor, materials like silicon that can have their conduction controlled. They can be made to conduct or not conduct (hence semi-conductor) based on applied voltage. Semiconductors enabled the transition from mechanical to digital and gave us the transistor, a tiny silicon device that switches on or off using voltage. That gave us Boolean logic in hardware and Moore’s Law, which gave us everything in modern computing. That singular innovation, the transistor, has produced most of our technological advancement over the past seventy years.

But if you read Gordon Moore’s 1965 paper, the one in which he described what would come to be known as Moore’s Law, you’ll find that only the first half is about digital silicon; the second half is about analog silicon.

Cramming More Components Onto Integrated Circuits, Gordon Moore

Nobody paid attention to the analog part, but I think it’s even more fascinating today than the digital one.

Digital silicon is about switching: transistor on or off, conducting or not, one or zero. All the gates, all the logic, all of computation follows from that binary foundation. It’s powerful, but it’s also, as we’ve discussed, the simplest possible thing you can do with an electron. It’s caveman math.

Analog silicon is about shaping. Instead of just on/off, you’re asking: what if I could bend the electromagnetic wave? What if I could guide it, direct it, absorb it at specific frequencies and reflect it at others? In practice, this is RF front-ends, antennas, packages, and printed circuit boards (PCBs) behaving like a single, unified electromagnetic object.

This is how the world works too. The world is analog. The world does not work in 0s and 1s, but rather in the continuum in between them. Even if all computation is done digitally, you’ll need to deal with analog signals and shape waves the moment you need to interact with the real world (for example, to capture sound in a microphone, produce sound from a speaker, send wireless signals over the air, or send light over optical fiber).

Remember the prism? That’s what analog silicon does, but for all electromagnetic frequencies, not just visible light. Instead of glass bending light, you can use carefully shaped conductors printed on silicon to bend, direct, and shape EM waves.

This is where we leave the realm of deterministic computing and enter a world of black magic.


Try This At Home

Here’s an experiment. You can try this at home.

Materials List: COPPER WIRE, COMPASS, BATTERY.

Take your copper wire, connect it to a battery, and run current straight through it. The magnetic field it produces will wrap around the wire in concentric circles. You can confirm it’s doing this by holding a compass near it and watching the needle deflect perpendicular to the wire.

Now, coil that same wire into a spring shape (a solenoid) by wrapping it around a pencil or screw 10-15 times. Run current through it. The magnetic field is completely different: instead of wrapping around the wire, it shoots straight through the center of the coil. Same wire, same current. But a different shape = radically different field.

This is the fundamental game of electromagnetism: geometry determines behavior. Every antenna, radar, or phased array tile is just a more sophisticated version of this principle. Find the right shape, and you can make electromagnetic fields do almost anything.
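If you would rather check the numbers than find a battery, here is a rough sketch using the textbook formulas for both cases; the current and dimensions are assumed values, not measurements from the experiment.

```python
# Field around a long straight wire: B = mu0 * I / (2 * pi * r)
# Field inside an ideal solenoid:    B = mu0 * n * I   (n = turns per meter)
# Same current, different geometry, very different field.
import math

MU0 = 4 * math.pi * 1e-7  # vacuum permeability, T*m/A

def straight_wire_field(current_a: float, distance_m: float) -> float:
    """Field magnitude a distance r from a long straight wire."""
    return MU0 * current_a / (2 * math.pi * distance_m)

def solenoid_field(current_a: float, turns: int, length_m: float) -> float:
    """Field magnitude inside an ideal solenoid."""
    return MU0 * (turns / length_m) * current_a

I = 1.0  # amps, an assumed battery-scale current
print(straight_wire_field(I, 0.01))   # ~2e-5 T, measured 1 cm from the wire
print(solenoid_field(I, 15, 0.05))    # ~3.8e-4 T inside 15 turns over 5 cm
```

Same wire and the same one amp, but coiling it concentrates the field by more than an order of magnitude.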


To understand why shapes matter so much, consider what happens when an electromagnetic wave hits a conductor.

A conductor is special because it has free electrons. Free electrons are not locked into a lattice like in an insulator, but instead swim around in what physicists call an “electron sea.” When a photon (an EM wave) hits this electron sea, those electrons start to move in response. They oscillate with the wave.

This is fundamentally how an antenna works. The old bent TV antenna on your grandparents’ roof was shaped specifically to receive UHF frequencies broadcast from a distant TV station. The EM waves traveling through the atmosphere would hit the antenna, excite the electrons in the metal, and those oscillating electrons would travel down the wire into your TV as a signal.

That signal carried information, like encoded images of I Love Lucy, compressed into patterns of electromagnetic oscillation, broadcast through the air, absorbed by your antenna, decoded by your TV. If you step back and think about it, this entire chain is completely absurd. We transmit moving pictures through the air using invisible waves. And turning those waves back into pictures all comes down to the shape of a wire.

Radar works basically the same way, except it’s more high-powered and moves in reverse.

World War II accelerated radar. It also showed us how badly we needed it. The Allies were being pummeled and needed to track incoming threats. They turned to radar, which had developed quickly thanks to the war. In the late 19th and early 20th centuries, Heinrich Hertz (of Hz fame) showed that radio waves could reflect off objects. Several physicists also noticed that radio signals behaved strangely when ships or other objects were nearby. Through the 1920s and early 1930s, scientists in the U.S., U.K., Germany, France, the Soviet Union, Italy, and Japan all experimented with using radio echoes to detect objects.

In 1935, a Brit named Robert Watson-Watt (no relation to the steam engine Watts) proposed and then demonstrated a practical aircraft-detection system using pulsed radio waves. This led to the Chain Home early-warning network along the English coast. Chain Home was operational at the start of WWII and gave the Royal Air Force so much of an early warning in the Battle of Britain that it’s often credited as a key factor in preventing a German invasion. The United States picked up development a bit later, with the benefit of British tech transfer, and scaled up the technology’s capabilities and manufacturing. In the U.S., Alfred Loomis led research efforts at Tuxedo Park3 and helped establish MIT’s Rad Lab, which developed fire-control radar, airborne radar, and navigation radar. Germany built parallel systems that pushed the state-of-the-art in different directions.

Radar, instead of receiving a broadcast signal, transmits a beam on multiple wavelengths, waits for it to reflect off something (like a bomber), and then listens for the echo. If the object is big enough and close enough, you can detect it.

But to scan the sky, you need to point your beam in different directions. In the 1940s, that meant literally spinning a large dish antenna. You needed mechanical motors to do it. A mechanical gimbal rotating a giant antenna.

Ames Type 7, an early WWII radar

This worked, but it had obvious limitations. Moving parts break, for one. For another, the dish can only spin so fast. Today, Starlink satellites need to update their pointing multiple times a second, since they are moving at 7.6 km/second. Try doing that mechanically and for 5,000 simultaneous beams.

This is where the second half of Moore’s 1965 paper becomes relevant. Moore realized you could use transistors to solve the spinning-dish problem. Replace the mechanical movement with electronic steering.

The key insight is constructive and destructive interference, the same phenomenon you see when ripples on a pond meet and either amplify or cancel each other.

Imagine you have a grid of small antenna tiles instead of a single big dish, like a checkerboard where each square is a tiny antenna. Each tile can emit a signal. Each tile is a whole RF front-end and antenna structure spanning chip, package, and PCB that acts like a single, holistic EM object. Now, if you start the signal from the leftmost tile first, then the next tile a tiny bit later, then the next, and so on and so forth, the wave fronts from each tile will interfere with each other. Get the timing right and they’ll constructively interfere in one specific direction, creating the effect of a single, focused beam pointing in the direction you want.

Change the timing pattern, and the beam will point somewhere else. You can replace moving parts with analog and digital logic that controls when each tile fires.
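Here is a minimal sketch of that timing trick for a simple linear array, using the standard relation delay_n = n · d · sin(θ) / c; the frequency, spacing, and steering angle are assumed example values.

```python
# Per-tile delays that steer a uniform linear array: fire element n a time
# delay_n = n * d * sin(theta) / c later than element 0, and the wavefronts
# add constructively in the direction theta. Values below are illustrative.
import math

C = 3e8  # speed of light, m/s

def steering_delays(num_tiles: int, spacing_m: float, theta_deg: float, freq_hz: float):
    """Return (delay in seconds, phase shift in radians) for each tile."""
    theta = math.radians(theta_deg)
    delays = [n * spacing_m * math.sin(theta) / C for n in range(num_tiles)]
    phases = [2 * math.pi * freq_hz * d for d in delays]
    return delays, phases

# Example: 8 tiles at half-wavelength spacing, 12 GHz, beam steered 30 degrees off boresight.
freq = 12e9
spacing = (C / freq) / 2
delays, phases = steering_delays(8, spacing, 30.0, freq)
print([round(p, 2) for p in phases])  # ~[0.0, 1.57, 3.14, ...] -> pi/2 steps
```

Change theta_deg and the whole pattern of phases changes instantly. That is the no-moving-parts trick: steering is just arithmetic.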

This is called a phased array. And it’s how modern radar works. If you want to develop a better intuition by playing with it, we built a little simulator here.

The radar on an F-35 is called an AESA (Active Electronically Scanned Array). Nothing on it moves. It’s just a grid of semiconductor tiles, and the “beam” sweeps across the sky purely through timing. It’s also how Starlink works. Each Starlink terminal has 1,280 of these beam-forming silicon tiles. That’s why you can buy a flat panel for $300 that does what used to require a million-dollar spinning dish.

Starlink Terminal

What’s happening on those tiles?

Remember: digital silicon is about transistors switching on and off. But the tiles in a phased array shape electromagnetic fields through their physical geometry.

Think back to the TV antenna, a bent wire specifically shaped to receive certain frequencies. Imagine you could print that shape onto a silicon chip, layer by layer, laying down copper traces in specific geometries, creating structures that interact with EM waves in precise ways.

On one layer, you might have a spiral. On the next, a grid. On the next, something that looks like a QR code. Stack them up with tiny connections between layers called vias, and you’ve created a three-dimensional structure that can emit, receive, absorb, and reflect electromagnetic waves at specific frequencies and in specific directions, which you have full control over.

A cross-section of a modern chip; the multilayer geometry shapes how EM fields behave (Physicsworld)

This is what a phased array tile actually is: a 3D sculpture of copper and silicon, designed so that when electrons flow through it, they create exactly the EM field you want.

Why This is So Hard to Build

Creating an EM field of your liking is all about geometry, which means it’s all about shapes. But how do you know which shapes to make?

With digital silicon, the rules are relatively simple. Transistors are either on or off. You can simulate billions of them with perfect accuracy. The design problem is about routing and timing, but the physics is well-behaved.

Analog silicon is different. The physics is wave physics, and waves do things that violate our intuitions.

At optical (high) frequencies, we often get away with “ray optics” intuition; light mostly travels in straight-ish lines, reflecting and refracting, and you can treat its components as fairly local.

At RF, the wavelength is big enough that your whole device becomes part of the circuit. Fields couple into enclosures, PCBs, screws, nearby tiles… Everything talks to everything. That’s why RF feels like black magic, and why you have to simulate the whole object to predict how it’ll perform.

When you’re designing a Starlink tile, for example, you can’t just model the tile in isolation. The EM waves emanating from it will interact with the entire Starlink unit: the metal casing, the other tiles, the mounting bracket, the support structure. You have to simulate the whole system at once.

This is why there are no automated tools for analog circuit design. Digital circuits can be “synthesized” from code; a digital designer can write “RTL” code that describes how his digital circuit works. Then, there are tools that can read the code, and “compile” it to a chip. But no such tools exist for analog design. There are no “standard cells” for analog, no standard analog designs, no standard building blocks. Everything interacts with everything.

Which is why there’s no ARM for analog silicon4. There are no companies that can sell “IP,” standardized circuit designs, to a multitude of customers in a highly profitable fashion — no such standard circuit exists. Every new system is different, and as a result, each new customer has different needs.

ARM can design a chip that works in any phone because digital chips are self-contained. But an analog phased array tile designed for a Starlink terminal won’t work in a different satellite. The interference patterns will be completely different!

And the simulation tools are slow. The equations governing electromagnetic fields are called Maxwell’s equations, four partial differential equations that are notoriously difficult to solve.

They’re just equations. What’s the problem?

If an EM wave is at a higher frequency, it’s more “particle-like”— intuitively, it’s like a ball — you know where it is, and it bounces off stuff cleanly. If the ball is in one corner, it doesn’t affect anything in the other corner. As EM wavelengths get longer (into RF), they become more wave-like, and the particle is sort of “spread out.” The waves start to interfere with each other a lot, like ripples on a pond. They can either strengthen or cancel each other out.

So, if you’re NVIDIA and selling high-frequency chips in boxes, you can sell a single product. You can design one GPU and sell it to everyone. There is no difference between the Navy putting your chip on a ship, or Sony putting it in a PlayStation. They can all just buy the chip and plug it in. But if, for example, you’re buying components for a phased array system, you have to model the entire system, because you’re in the wave-like regime, not the particle-like one. A design for a Navy ship won’t work in a Starlink terminal. The EM fields interact with everything around them—the metal casing, the mounting structure, nearby components. Change the environment, and you need a completely new design. Everything becomes a custom services problem that is rate-limited by these rare experts and by simulation.

In short, the solvers are slow, even with supercomputers or programs like Ansys, because the equations are tough to solve and require expertise to wield. What makes the equations so hard is boundary conditions (the edges, where smooth calculus breaks down): a sharp metal edge, for example, creates strong electromagnetic reflections that concentrate fields in unexpected ways.

Running a full simulation of a proposed design can take hours or days. Here’s an example design loop: make your best guess at a shape, wait hours for simulation, discover it doesn’t quite work, adjust the shape, wait hours again. As we collaborate with field experts, we’ve watched each simulation iteration with legacy tools take a week. That’s not enough ‘shots on goal’ to develop Electromagnetic Superintelligence.

RF design can’t be done by brute-force computation. The search space is infinite and each evaluation takes too long.

Take a simple two-layer circuit where each pixel in a 64×64 grid can be either metal or dielectric. That’s already 2^(64×64), or roughly 10^1,233, possible configurations for a single, small component. The entire history of human RF design has explored a vanishingly small fraction of that space, obviously. Here, see how many of those configurations you can come up with.
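A quick sanity check on that arithmetic:

```python
# A 64x64 grid of binary (metal/dielectric) pixels has 2^(64*64) configurations.
# log10(2^4096) = 4096 * log10(2) ~ 1233, i.e. a number with roughly 1,233 digits.
import math

pixels = 64 * 64
print(pixels)                   # 4096 binary choices
print(pixels * math.log10(2))   # ~1233 -> about 10^1,233 configurations
```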

Navigating this search space requires intuition. You need someone who can look at the desired field pattern and just... sense what shape might produce it.

The people who can do this have spent decades building up a feel for how electrons move through structures, how fields bend around corners, how waves interfere. They can sketch a spiral on a whiteboard and tell you roughly what frequencies it will emit strongly and which it will absorb. Aside from my classmate who could see like an electron, though, this intuition isn’t natural even to those special few. They acquire it painstakingly over long careers. Because, unlike gravity, there was never evolutionary pressure to understand electromagnetic fields outside of the visible spectrum. We don’t feel them. They’re invisible to us.

I watch my baby daughter learning about the world. She already has an intuition for mechanics. She knows that if you roll a glass off a table, it will break. She has no intuition for electromagnetism, and that blind spot is probably genetic. 99.99% of people don’t have it.

It is a miracle that we’ve been able to manipulate EM waves to our purposes to the extent that we have. But the world is only getting more electromagnetic, and we will need a lot more shapes.

That means we need to build something that does have intuition for electromagnetism.

AlphaGo for Electromagnetism

In 2016, DeepMind’s AlphaGo defeated Lee Sedol, one of the greatest Go players in history.

The moment that stuck with everyone was in Game 2, Move 37.

The expert commentary went something like this: “That’s a mistake.” Then: “That’s stupid.” Then: “That’s a very strange move.” And finally: “That’s beautiful. That’s elegant.”

AlphaGo had done something no human would have tried, a move so unconventional that the world’s best players initially dismissed it as an error. But it worked. The machine had discovered a strategy that humans, despite thousands of years of playing Go, had never found.

What made AlphaGo possible? Two things. First, Go has clear rules and a perfect simulator. You always know exactly what state the board is in and which moves are legal. Second, because of those properties, a computer can play millions of games against itself very quickly. AlphaGo learned by playing more games of Go than all humans in history combined.

We want to do what AlphaGo did for Go — but for physics. What if we could build a system that played millions of “games” of electromagnetic design and developed an intuition that humans simply can’t acquire?

There was an obvious obstacle between us and that dream. AlphaGo worked because Go is a perfect simulation. You know exactly what happens when you place a stone on the board. But physics is more complicated, and the simulators are slow. Maxwell’s equations take hours to solve. You can’t “play a million games” overnight.

So we needed to build the simulator first.

The EM foundation model that we’ve built, and continue to scale, is the simulator for EM physics.

The starting point for what we’ve built is known as a neural surrogate. The idea is simple: instead of solving Maxwell’s equations from scratch every time (which is slow), you train a neural network to approximate the solutions (which is fast). It’s like the difference between calculating the sine of an angle by hand versus looking it up in a table, except the “table” is a neural network that can interpolate to angles you’ve never seen before.

Traditional physics simulators work by brute force. They break space into tiny chunks, apply the equations at each point, and iterate until the solution converges. It’s accurate, but a single simulation can take hours.

But what we’re building goes beyond a surrogate. Most surrogates in physics are narrow: trained to approximate one specific simulator for one specific class of problems. Arena Physica’s model learns the relationship between shapes and fields directly, allowing it to generalize. It’s not a faster calculator (it’s not a calculator at all). The neural surrogate is learning the syntax of physics. Just as GPT learned the “logic” of language, our model is learning the “logic” of fields. Show it enough examples of “this shape produces this field pattern,” and it learns to predict new patterns for new shapes almost instantly. We’re talking about 18,000x speedups, hours to milliseconds.
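To make the surrogate idea concrete, here is a toy sketch, emphatically not Arena Physica’s model or data: a stand-in “slow solver” labels random shapes, and a small off-the-shelf network learns the shape-to-response mapping so it can then be queried almost instantly.

```python
# Toy neural surrogate: learn shape -> "field response" from examples, then
# query the network instead of the (pretend) expensive solver.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

def slow_solver(shape: np.ndarray) -> np.ndarray:
    """Stand-in for a Maxwell solver: returns a fake 8-point frequency response."""
    freqs = np.linspace(1.0, 8.0, 8)
    return np.sin(freqs * shape.mean()) + shape.std() / freqs

# Miniature "Data Factory": random 16x16 metal/dielectric shapes, solved once each.
shapes = rng.integers(0, 2, size=(2000, 16 * 16)).astype(float)
responses = np.stack([slow_solver(s) for s in shapes])

surrogate = MLPRegressor(hidden_layer_sizes=(128, 128), max_iter=500, random_state=0)
surrogate.fit(shapes[:1800], responses[:1800])

# The surrogate now answers "what response does this shape produce?" almost instantly.
print(surrogate.predict(shapes[1800:1801]).round(3))
print(responses[1800].round(3))
```

The real model differs in every particular (it learns fields, not a toy eight-number response, and it generalizes across problem classes), but the shape of the bet is the same: trade a slow exact solver for a fast learned approximation.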

If you’re reading closely, you might be thinking to yourself: of course you can go faster if you’re just trying to get an approximate answer and not a perfect solution.

Good catch. This is where the magic happens.

When you’re searching for good designs, speed and direction matter more than precision.

Think about how an experienced RF engineer actually works. They use their intuition to filter out ideas that probably won’t work and to get the rough shape of ideas that might. Then, they simulate those. They make fast, approximate judgments to decide where to invest their slow, precise simulation time.

Arena Physica’s model does the same filtering, just much faster. It doesn’t need to tell you exactly how well a shape will perform. It needs to tell you how each shape will perform relative to the others. Good enough for search is a much lower bar than good enough for publication.

Speed lets us flip the problem around. Instead of asking “what field does this shape produce?” we can ask “what shape produces this field?” That’s generative design. We specify what we want, say, an antenna that transmits strongly at 28 GHz but rejects interference at neighboring frequencies. The system uses our desired state to generate shapes that might achieve our goals.

Then, we pair two models in a loop: one that generates designs, and one that evaluates them5.

The generator proposes a batch of shapes, many of them wild, strange, things that no human would come up with. Move 37 shapes. The evaluator characterizes them all in seconds, directionally: this one’s terrible, that one’s promising, this one’s interesting. The best candidates get refined via small variations and perturbations. The evaluator evaluates the refinements. Repeat.

At each step, we ask, “Is this one better for my goal than the shape we had before?” Because we know what the rules are, like AlphaGo, and what our goals are, like AlphaGo, we can reward the model for getting closer. And like AlphaGo, by making simulation cheap, we can explore much more of the design space than we could have if everything we wanted to try required a precise multi-hour simulation.
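Here is a self-contained toy version of that loop, with a cheap scoring function standing in for the learned surrogate; the target response, grid size, and mutation scheme are all assumptions for illustration.

```python
# Generate -> evaluate -> refine, toy edition. The evaluator only needs to rank
# candidate shapes directionally; precision comes later from the slow solver.
import numpy as np

rng = np.random.default_rng(1)
TARGET = np.array([0.9, 0.8, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1])  # desired response (assumed)

def fast_evaluator(shape: np.ndarray) -> float:
    """Cheap stand-in for the surrogate; lower score = closer to the target."""
    freqs = np.linspace(1.0, 8.0, 8)
    response = np.sin(freqs * shape.mean()) + shape.std() / freqs
    return float(np.mean(np.abs(response - TARGET)))

def mutate(shape: np.ndarray, flips: int = 8) -> np.ndarray:
    """Small perturbation: flip a few metal/dielectric pixels."""
    child = shape.copy()
    idx = rng.choice(child.size, size=flips, replace=False)
    child[idx] = 1.0 - child[idx]
    return child

# Generator proposes a wild batch; evaluator keeps the promising shapes; refine; repeat.
population = rng.integers(0, 2, size=(256, 16 * 16)).astype(float)
for generation in range(20):
    scores = np.array([fast_evaluator(s) for s in population])
    elite = population[np.argsort(scores)[:32]]
    population = np.vstack([elite] + [np.stack([mutate(s) for s in elite]) for _ in range(7)])

print("best surrogate score after search:", round(fast_evaluator(population[0]), 4))
# In the real loop, the winners would now go to the slow, precise solver or to fabrication.
```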

In Asymmetry of verification and verifier’s law, OpenAI’s Jason Wei describes the “verifier’s law.” Essentially, it says that any task that’s easy and fast to verify will be automated by AI. The hard part about our world is that verification relies on special humans and on simulators that are slow and expensive. By attacking this first via the foundation model for fields, acting as a fast simulator, we’ve made this problem accessible to generative AI for the first time. Our generator learns its weights through our feedback loop.

This is the same loop that made AlphaGo work: generate, evaluate, learn, repeat.

Run the loop for yourself here. You get a visceral sense for the importance of speed.

Of course, this loop only works if you have enough training data to feed it, and unlike LLMs, which can scrape the internet for training data, EM field simulations don’t exist in the wild. Nearly every single data point has to be created. So we’re building our own Data Factory.

To do that, we’re hiring the best RF lead designers we can find. Put them in one place, where, theoretically, we can amortize the capabilities of this scarce group of people over all sorts of existing and newly possible customers. Have them create designs, give feedback, test their designs, run through the loop.

We generate random designs synthetically, our experts create seed designs that our system can then procedurally amplify, and we fabricate the top candidates and pipe real-world measurements back into training.

Expert-Created Design Templates (left) and Random Designs (right)

The Data Factory has three layers: high-volume synthetic data, high-information expert-seeded data, and ground-truth fabrication data.

Because of how manual the Data Factory is today, we need to pick our use cases strategically. We are starting with analog silicon (chip packaging, phased array components, RF front-ends) and the full phased array system we mentioned above. We’ll be expanding into new domains, like superconducting quantum computing, in conversation with our partners.

The factory is the moat. You can’t build a foundation model for EM without it, nobody else that we know of is building one, and to try, you’d have to hire from the very small pool of experts that we’re bringing together at Arena Physica.

Then, when we’ve converged on something promising, by feeding the Data Factory’s output into our loop, we validate it by running the slow, precise traditional solver on our best candidate. Or better yet, by fabricating the design and validating it in the real world.

Remember why there’s no ARM for analog? At RF frequencies, the wavelengths are long enough that EM waves interact with everything around them. When designing a phased array for a Starlink terminal, you can’t just model the chip; you have to model the chip, the circuit board, the metal casing, the mounting structure, everything. It all affects how the EM waves behave.

Because of that, we’re even using our own components to build an entire phased array system for imaging and detection from scratch.

Arena Physica’s terahertz phased array with 512 antennas and 32 silicon phased array tiles

We’ll tape out silicon before the end of the year. And, for any part of the problem that’s not analog, we’re actually using our agentic stack — our hardware-aware agents, operating on our metagraph – a dynamic graph representation of the hardware – talking to tools via MCP — to speed up every single aspect of the process, so that we can go end-to-end faster. In this way, we benefit from all the amazing leaps coming from the foundation models, but also own something they can’t replicate: an EM foundation model, fed by our own Data Factory.

The system compounds: fast approximate evaluation enables broad search, broad search finds promising candidates, fabrication validates and generates training data, training data improves the generator, better generator enables even broader search.

If you can do all of the RF design work in our loop, you can build an analog IP Factory.

The IP Factory for RF and the Compiler for Atoms

Our EM Foundation model’s key advantage over existing surrogates is that it can generalize.

Take a philosophical leap with me.

LLMs don’t mechanically learn to classify a sentence. They train on the structure of words and sentences in relation to each other. The rest of their behavior is emergent.

Before LLMs, you had spam detection as its own major problem. You had summarization as its own problem. Translation was its own separate problem. There were actually good companies with good ML teams doing each. The thing that they all got wrong was focusing on narrow problems. What we’ve learned is that if you can understand language at the root level, and you see scaling laws, you can get all the downstream applications for free.

So if it is true that there’s a fundamental relationship between geometry and EM fields, just like there is in language, and if scaling laws are true, then this model should generalize.

EM simulation looks a lot like language pre-LLMs. Where Ansys-style simulators are siloed – this tool is for antenna simulation, that tool is for EMI simulation of an automotive engine – we could tackle all of it with one model, like an LLM tackles translation, sentiment analysis, spam detection, and so much more all in one.

We believe that LLMs were the first foundation models, not the last. Language is one primitive of intelligence – the one humans use to communicate and think. But the universe has other primitives. Newton created calculus because the language of the universe is not English. LLMs will let us interface with new foundation models that can push humanity’s understanding of the universe farther, giving us intuition for things our biology wasn’t evolved for. Such models are a critical part of a positive-sum future of AI.

This is the huge bet that we are making at Arena Physica. It’s our thesis. We are already building a strong business that helps companies better understand and iterate on their electromagnetic systems, but we think generalization will also allow us to build the IP Factory for RF.

With our loop, we can automate IP creation and design. This is the key distinction between our model and ARM’s. If generalization works, then instead of selling one design to many customers like ARM does, we can generate a unique design for each customer and each use case almost instantly, with roughly the same effort it takes ARM to create one-size-fits-all IP.

And we think, and are seeing early signs, that our model can scale this automation across the whole electromagnetic spectrum, in which case our TAM is anything with a wave.

One question we get often is: can’t LLMs just do this? We’ve been running our own internal tests against frontier LLMs – both regular and extended thinking – and their performance gap to our base model is substantial. Our model achieves a magnitude-weighted MAE (Mean Absolute Error) well under 1 dB (for context, the range that RF engineers typically care about spans roughly 20-30 dB, so <1 dB is a very strong result)6.

Arena Physica Internal Testing

What we’re building is different than LLMs, and better at what it was built to do. A useful way to think about what this unlocks is as a “Compiler for Atoms.”

In software, compilers translate high-level programming languages into binary instruction sets that a CPU can execute. We went from assembly to C++ to Python, and now Claude Code is arguably a compiler with English as the programming language. The LLM compiles down to the programming language it decides is best, which then compiles down further to machine code. At each step in this progression, the abstraction gets higher and the number of people who can “program” gets larger.

Physics doesn’t have a compiler yet. The universe has an instruction set: materials and geometries arranged in specific configurations. Place them one way, you get a motor. Place them another, you get an invisibility cloak. We know all of this is possible because the equations tell us so. Human physicists have spent centuries learning this instruction set. But to access any of it, you still need to hire the equivalent of an assembly language programmer: a physicist who has spent decades learning to translate between human intent and the physical world’s instruction set.

What we’re building at Arena Physica is, in some sense, a Compiler for Atoms: a way to express what you want in high-level terms and have it compiled down into the geometries and materials that produce it, starting with Maxwell’s equations and eventually, we hope, adding Schrödinger’s.

When we launch our EM foundation model next week, we’ll make it available to interact with via an agentic UI. You’ll be able to type a request in plain English like, “I need an eight gigahertz band pass filter for a satellite uplink.” The LLM translates that into target scattering parameters, the technical parameters we’re optimizing for. Our LFM generates candidate geometries using physics as its reasoning substrate rather than language. Then, the LLM engages again to explain what the model did and why, drawing on the foundational RF engineering knowledge we’ve built into the system.
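Here’s a minimal, hypothetical sketch of that loop. None of these function or class names are Arena Physica’s actual API; they just make the LLM-to-EM-foundation-model handoff concrete.

```python
from dataclasses import dataclass

@dataclass
class DesignSpec:
    center_freq_hz: float
    bandwidth_hz: float
    target_s_params: dict  # e.g. {"S21_passband_db": -1.0, "S11_passband_db": -15.0}

def llm_parse_request(prompt: str) -> DesignSpec:
    """LLM step: translate plain English into target scattering parameters."""
    # A real system would call a language model here; hard-coded for illustration.
    return DesignSpec(8e9, 500e6, {"S21_passband_db": -1.0, "S11_passband_db": -15.0})

def lfm_generate_geometry(spec: DesignSpec) -> list:
    """EM foundation model step: propose a candidate geometry (a pixelated metal layout)."""
    return [[0, 1, 1, 0], [1, 1, 0, 1]]  # placeholder metal / no-metal grid

def llm_explain(spec: DesignSpec, geometry: list) -> str:
    """LLM step: explain the proposed geometry in RF-engineering terms."""
    return f"Proposed a coupled-resonator layout targeting {spec.center_freq_hz / 1e9:.1f} GHz."

spec = llm_parse_request("I need an eight gigahertz band pass filter for a satellite uplink.")
geometry = lfm_generate_geometry(spec)
print(llm_explain(spec, geometry))
```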

Andrej Karpathy says LLMs are “people spirits.” In our system, the spirit of David Pozar, author of the bible of RF engineering, provides his best explanation of the geometries generated by our EM foundation model. It’s like working with an intern who happens to speak electron, except the intern is channeling decades of accumulated RF wisdom.

I think this will be a powerful paradigm for the future. Everyone is thinking about human-to-model interactions. But the bulk of the work in systems like ours is model-to-model: the LLM talking to the EM foundation model, the EM foundation model responding, the two iterating through design space at machine speed. The human-to-model layer shrinks to a minority of the interactions: the intent-setting and the interpretation. The real work happens in a language we can’t speak, translated for us by models that can.

In the future, language models might serve as the universal interface between humans and an entire ecosystem of specialized foundation models: for EM fields, for biology, for materials science. In this future, the LLM becomes the diplomat between species of intelligence.

Eventually, we want to get to a world in which anyone can say, “I want an invisibility cloak” or “I want a cheaper Starlink,” and the machine will design it. But today, even with our compiler, we’re still in the C++ era. Expert “physics coders” still need to tell the machine, “I need an 8 gigahertz band pass filter for a satellite uplink.”

In the meantime, we want to deliver that “eventually” future today.

Taking Problems Off Customers’ Plates Entirely

Since it became clear that our model had the potential to scale and generalize, I’ve been thinking a lot about the right way to deliver the capabilities it provides to customers.

I’ve thought about whether we should build the ARM for Analog. I’ve thought about selling access to the model, or the IP Factory, directly. I don’t think either is quite right. I think it’s worth talking through where we landed and how, because what constitutes the right business model is changing a lot with AI, and the model we landed on is probably not the one we would have pursued a few years ago.

If you think through what makes Arena Physica unique, it’s really four things: we have a talent-dense collection of some of the world’s best RF engineers, a “Compiler for Physics” that can generate valuable IP in experts’ hands, a Data Factory that improves as those RF engineers generate and validate more IP, and a software platform (complete with FDEs) that makes it easy to apply LLM-based agents to reason about hardware.

The business model that falls out of that is services. We’ve hired some of the world’s best RF designers and a team of excellent electrical engineers. Each one, working with our EM foundation model, with our agents, and with LLMs, can cover enormous ground. So instead of selling people the tools and asking them to figure it out on their own, we’re starting to just take problems off people’s plates.

You want to launch a space company and you have an unsolved problem with how your racks communicate in orbit? We’ll solve that for you. You need a silicon layout for your next chip? We’ll do the layout. We send in RF and electrical engineers, armed with our tools, and deliver the product the customer needs. It’s full-stack electromagnetic engineering as a service, at a speed and cost that wasn’t possible before, because each of these rare experts can now do what used to require an entire team, faster, better, and cheaper.

And because we’re so early in our journey to build the EM foundation model, working directly with our customers lets us learn faster and improve our models based on real-world needs.

To that end, we’re also pursuing research partnerships. This falls out of our model, too. As we scale the LFM from version one to version two and beyond, we need to decide which training data to generate next, and that depends on the problems we’re solving. Partners working on chip packaging need us to model different structures than partners working on superconducting quantum computing. They get early access to our model and our team for their specific problems, and we get the data we need to drive generalization. The research partnerships feed the Data Factory, the Data Factory feeds the model, and we eat more of the EM spectrum.

One counterintuitive move here is that we don’t plan to sell the model. We plan to publish it, and sell everything around it: the platform, the experts, and solutions to our customers’ problems. The model is what makes all of that possible, but it’s not the product. As Packy wrote in Power in the Age of Intelligence, “If your technology is so good, why aren’t you using it to compete?” Our product is: bring us your electromagnetic problem, and we’ll solve it.

Companies shouldn’t be bottlenecked by the fact that, as it stands, you need hundreds of millions of dollars to build the types of systems we can build. We should be able to almost AWSify expertise for them.

There’s a lot more I want to do here.

As our model improves, and we move from the C++ era to the Python and even Claude eras of EM foundation models, I suspect our business model changes, too. We can sell the “designer” to companies, and they can use it to generate their own IP. A sort of Golden Analog Silicon Goose. As that happens, the cost to manipulate the EM spectrum goes down, and humanity’s capabilities increase.

Over the longer term, my dream is to run Arena Physica as a modern Bell Labs, to use the commercial side of the business to fund a new kind of research network.

What if, once we’ve proven the foundation model works, we opened it up? We could give academics free access to our model and our compute. In exchange, when they use it to design novel structures or discover new phenomena, the IP flows back through Arena. We take a cut (maybe 20%, like an app store) and they keep the rest.

Right now, a professor working on some exotic antenna geometry has to write grants, wait for funding, hire grad students, and slowly iterate through simulations on whatever compute they can scrounge. What if instead they could just... use the model? Explore design spaces that would take years to search manually? And get paid when their discoveries become commercially valuable?

If our model can auto-generate IP for any electromagnetic application, we become the platform. The rare humans who can push the boundaries become contributors, and get rewarded for it as their rare knowledge turns scalable. Because the universe of people with this expertise is so small, we can actually get our arms around sharing the upside of their work with them.

Of course, as we expand across the EM spectrum, and our models become multiphysics models, we’re not going to hire all of the world’s great physicists. But by opening up the platform and becoming a new kind of Bell Labs, we can work with them.

I want Arena to be a place where academics can take sabbaticals and access our models. Where we’re not just selling software, but funding experiments. Where we create incentive structures that let brilliant people do fundamental research without the soul-crushing grant cycle. Where we compress the cycle between physics research and application, and wield AI not to do what humans can do cheaper, but to do things humans can’t do at all today. If we win, humanity benefits.

But I’m getting ahead of myself. First, we need to take problems off of our customers’ plates.

What Can Our Customers Build?

So what could our customers build if we provide them best-in-class RF and EE?

To start, by working with Arena Physica, any company that wants to do anything with phased arrays can get custom ones, and much more quickly than they could have before.

What they’ll be able to build isn’t limited to what you’d traditionally think of as radar.

I heard a phrase once that stuck with me. I’ve always thought of radar as a detection system, but this guy told me that radar is actually an imaging system. As the wavelength gets shorter (and the frequency gets higher), you get higher resolution. You can image things. So radar doesn’t just tell you that “something is there.” It also tells you what that something looks like. Radar can create images, like a camera.

Take drones. Everyone is talking about drones as the future of warfare, but currently, most of our radars and sensors can’t see them. It might surprise you to learn that the United States Navy doesn’t have counter-drone phased array radar at scale. That surprised me too, so we dug a little, and what we heard was that in the current system, it’s just too expensive. To get Raytheon to build a new ship-based phased array radar for them would end up costing as much as half the ship.

It should not cost close to a billion dollars to make phased array radars. But remember that slow process of imagining and simulating designs that we discussed earlier? Now, imagine that happening inside of a slow-moving legacy prime that gets paid a margin on top of every dollar it spends. The result is that our Navy doesn’t have phased array radars that are good at detecting drones.

We can help the Navy see drones much more cheaply. And we should. For drone detection, you might have 1,000 incoming targets. The benefit of phased arrays is that you can form multiple beams from one antenna, like Starlink does, instead of something spinning like an old-timey radar or LiDAR. (As a side-note, those spinning LiDARs on top of Waymos will almost certainly become phased arrays, too, and when they do, the whole unit goes solid-state, which is cheaper and more reliable with no moving parts).

We also have people reaching out to us who want to design phased arrays for drone capture. They’re building systems that catch drones in mid-air with a robot arm. The drone capture systems are smart; they have onboard computing and sensors to actively track and intercept the incoming drone. They need to be able to see, which means they need custom phased arrays that can track a fast-moving drone with high precision, work in any visual conditions, and are small and cheap enough to put on the capture mechanism itself. They certainly couldn’t make the math work at a billion dollars, but we can bring those costs down at least an order of magnitude in the near-term with automated design and fast iteration.

The applications only multiply once you realize that you’re dealing with an all-condition imaging system, and that drones are just one application of phased arrays. The same physics that steers radar beams also steers communication beams.

If space stuff continues to grow the way everyone thinks it will, you’re talking about tens of thousands of satellites and millions of ground terminals. Every single one needs these precisely shaped silicon tiles. Every ground station needs phased arrays. Northwood just raised a ton of money to build phased array ground stations. Every Starlink antenna on a house and satellite is powered by phased arrays.

Now, let’s say an adversary is trying to jam your radar signal. Notice how, at certain angles, the array’s pattern dives down to zero:

Remember I said you can dynamically move this beam? This is one of the benefits of the phased array. It lets you move those points. Those are called null points, and they’re where the interference pattern zeroes out, like in your noise-canceling headphones. So imagine I’m being jammed. What’s crazy cool about a phased array is that I can transmit my signal, and then I can cyclically move my null point to absorb the jam signal. Just think about it. It’s remarkable, actually. You’re still transmitting and receiving, but you’ve carved out a little pocket of silence right where the enemy is screaming at you.
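For readers who want to see the trick in numbers, here’s a toy null-steering sketch for a 16-element uniform linear array. This is the textbook projection method, not Arena Physica’s approach; it just shows how choosing the right complex weights keeps the main beam while forcing a null in the jammer’s direction.

```python
import numpy as np

N = 16                          # array elements
d = 0.5                         # element spacing in wavelengths
theta_main = np.deg2rad(10)     # direction we want to keep transmitting toward
theta_jam = np.deg2rad(-35)     # direction the jammer is coming from

def steering_vector(theta: float) -> np.ndarray:
    n = np.arange(N)
    return np.exp(1j * 2 * np.pi * d * n * np.sin(theta))

a_main = steering_vector(theta_main)
a_jam = steering_vector(theta_jam)

# Project the jammer direction out of the main-beam weights -> exact null at theta_jam
w = a_main - (a_jam.conj() @ a_main) / (a_jam.conj() @ a_jam) * a_jam

def gain_db(theta: float) -> float:
    array_factor = w.conj() @ steering_vector(theta)
    return float(20 * np.log10(np.abs(array_factor) + 1e-12))

print(f"gain toward main beam: {gain_db(theta_main):+7.1f} dB")
print(f"gain toward jammer:    {gain_db(theta_jam):+7.1f} dB  (a deep null)")
```

Because the weights are just numbers you can update electronically, the null can follow the jammer around without anything physically moving.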

But if you can exploit the physics really well—like, in this case, we’re still using our good old digital transistors to say what to do, while the analog shapes determine the quality of the EM fields produced—this is where Gordon Moore’s dream might come true: his silicon boolean transistors are talking to analog silicon. And that’s really cool. With digital silicon, we can compute. But with analog silicon, we can transmit power, we can transmit directed energy, we can absorb energy. Analog silicon makes it much more physical.

The same physics that lets you communicate also lets you deny communication to others. And the same physics that lets you transmit also lets you absorb. Stealth is just the inverse problem of radar: instead of bouncing signals back, you’re making them disappear.

It’s all shapes.

Imagine we could change the economic structure of all of this.

That is exactly what we are trying to do at Arena Physica. Our mission is to create “Electromagnetic Superintelligence.” It sounds audacious, but remember, it is much easier to achieve superintelligence—as in, relative to humans—in electromagnetism than it is in language or even math. And it describes precisely what we’re building: a system that develops superhuman intuition for how geometry shapes electromagnetic fields, a mind that can see what we can’t.

As a software engineer, I don’t know how to be an infrastructure engineer. That’s because I don’t need to. Amazon takes care of it for me. If companies no longer needed their own RF expertise, you could lower the cost of everything we’ve discussed dramatically, by a factor of 10 or more. We could give RF capabilities to everyone, from small companies to those that serve the Navy.

One obvious ramification is we’ll probably see more satellite companies and space companies, because they can now design their own phased arrays. More competition in the radar space. More competition in the jamming space.

Another less obvious one might be backpack radars. Think about troops going into a situation like Ukraine. One of the big risks they face is drones sneaking up on them. They should have backpack-mounted counter-drone radar: small, cheap phased arrays that let every warfighter see what’s coming.

Backpack radars are a very specific example, but the point is broader: something that made no practical or economic sense before becomes practically and economically feasible.

We can even help AI models get better in a sort of indirect way.

Data centers need to move enormous amounts of data between chips very fast. The problem is that, at those speeds, the wires connecting chips start acting like antennas. They can accidentally broadcast and pick up signals. Which means they hit bandwidth limits on chip-to-chip communication. So their goal is the opposite of ours: they’re trying to make really bad antennas. They don’t want the electrons traveling between their GPU and CPU to pick up a signal or transmit one, because then the data gets garbled. This is called signal integrity. The solution is the same shaped-silicon approach: carefully designed structures that guide high-frequency signals without interference.

It’s not just chip-to-chip. Think bigger. A data center company recently asked us whether we could beam data rack-to-rack wirelessly, because the cabling itself is becoming a bottleneck to how fast they can deploy. It’s not just terrestrial, either. For orbital data centers, there won’t be any option. You’re not going to be running optical cables between racks in space.

The most interesting thing is how the market has responded by asking us for use cases we would never have thought of.

For example, high-frequency trading firms have reached out to us about helping them trade faster. I’d assumed fiber optics already transmitted at the speed of light, but light slows down in glass (its refractive index is roughly 1.5), so signals in fiber travel at only 60-70% of light speed. A phased array transmitting through free space goes at actual light speed. Over the distance between New York and Chicago, that difference could be enough to make a lot of money.
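The claim checks out on a napkin. A rough sketch, assuming a ~1,200 km straight-line New York-Chicago path and light in fiber at about 0.68c (real fiber routes are longer, so the real-world gap is even bigger):

```python
C = 299_792_458          # speed of light in vacuum, m/s
DISTANCE_M = 1_200_000   # ~1,200 km, rough straight-line NY -> Chicago

t_free_space_ms = DISTANCE_M / C * 1e3
t_fiber_ms = DISTANCE_M / (0.68 * C) * 1e3

print(f"free space: {t_free_space_ms:.2f} ms one way")
print(f"fiber:      {t_fiber_ms:.2f} ms one way")
print(f"edge:       {t_fiber_ms - t_free_space_ms:.2f} ms per one-way trip")
```

Roughly two milliseconds, one way, which is an eternity in high-frequency trading.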

If you look at what we’ve just described, there are really two different things happening. I think we’re going to see a K-shaped future for hardware.

The lower leg of the K is making commodity devices dramatically cheaper and more accessible: the Navy acquiring drone-detection radar without writing a billion-dollar check to Raytheon, satellite startups designing their own phased arrays instead of outsourcing to a prime, backpack-mounted counter-drone radar for every warfighter, data centers deploying faster with wireless rack-to-rack links. These capabilities exist today, but only in expensive, bottlenecked forms. We want to remove these bottlenecks and make the capabilities cheaper and more abundant.

The upper leg of the K is making exquisite devices, at the frontier of what’s physically possible, newly achievable. This is what Bell Labs enabled: entirely new and more powerful capabilities than were possible before its research. There is a lot of excitement about cheap, attritable systems in the future of warfare, and rightly so. But in the conversations I’ve had with military leadership, they believe that to win (or deter) a conflict in the Indo-Pacific, we’re going to need some of the world’s most exquisite machinery. The F-117 was crucial to winning the Gulf War; it made up just 2.5% of coalition air power but destroyed 40% of all strategic targets. We want to make new types of exquisite hardware possible, for defense and beyond.

By making it easier, faster, and cheaper to design more capable RF components, we think we’ll help expand the market beyond current analyst estimates, which don’t anticipate the unimagined. Those estimates project a 12% CAGR for RF components, from $44.8 billion to $140.5 billion over the next decade. I think that’s wrong, almost certainly too low. One of the reasons they’re projecting relatively slow growth is that RF is just too hard today. But if you democratize the expertise and transform the cost structure, would everyone just add better all-weather sensing equipment onto their robot? Would everyone just add backpack-mounted radars to every soldier?
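As a sanity check on those analyst numbers (our arithmetic, not theirs):

```python
# $44.8B growing at a 12% CAGR for ten years
start, cagr, years = 44.8, 0.12, 10
print(f"${start * (1 + cagr) ** years:.1f}B")  # ~ $139B, close to the quoted $140.5B
```

The figures are internally consistent; the argument here is simply that 12% underestimates what happens once the expertise bottleneck disappears.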

What I’m most excited about is this open possibility space. I don’t even know what else people might think up now that they have the ability to manipulate this stuff. If we’re right, their ambitions won’t be limited by speed, economics, or even by the shapes required to deliver the capabilities that they need.

Alien Designs

A lot of the shapes coming out of our system already look nothing like what a human would design. Basically, we’re working with near-alien humans to create a system that will ultimately produce alien designs.

Human RF engineers have been trained on certain canonical structures: dipoles, patches, spirals, horns. They know these shapes work because they’ve been refined over decades. When they design something new, they start from these familiar forms and tweak them.

Our system doesn’t care about any of that. It starts from noise and evolves toward function. The results often look like QR codes, or random stippling, or structures that seem to follow no logic at all.

RF circuit with an alien geometry, designed by Arena Physica’s EM foundation model

When we show these designs to expert RF engineers, their first reaction is usually skepticism. “That doesn’t look like an antenna.” “I would never have come up with that.” “Are you sure this works?”

Then, we fabricate it.

Human-designed RF circuit (L) vs. AI-generated RF circuit (R)

Today, we’re fabricating at the PCB level. The goal I have for the team is that we’ll do our first silicon tapeout this year. As in, we’ll manufacture actual silicon chips. Analog silicon has the advantage that we don’t need TSMC’s bleeding-edge fabs; older, cheaper fabs like Samsung, GlobalFoundries, and some defense fabs can do it, because the node size is typically larger.

And it works. This is the AlphaGo moment. Remember the expert commentary on Move 37?

We’re seeing the same pattern. Engineers look at our designs and say, “That’s super unconventional. That’s unusual. That’s not in the textbook.” And then it works. And then they say it was “creative.”

Our goal is to do more than just match human performance autonomously. We want to exceed it and find all of the EM equivalents of Move 37: designs so counterintuitive that no human would have tried them, but so effective that they outperform anything we would have tried.

My hunch is that this will happen fairly quickly in EM because of what we discussed at the top: most humans didn’t evolve to intuit which shapes produce which EM waves. Therefore, we are terrible at intuiting the answer. On the other hand, computers can become superhuman — we can evolve them to get there in simulation.

There’s this great story in Skunk Works, Ben Rich’s personal memoir from his time atop the famed Lockheed division, about how the F-117 stealth fighter came to be.

A thirty-six-year-old mathematician and radar specialist named Denys Overholser happened to read a translation of a dense technical paper by Pyotr Ufimtsev, the chief scientist at the Moscow Institute of Radio Engineering, titled Method of Edge Waves in the Physical Theory of Diffraction. Ufimtsev had “revisited a century-old set of formulas derived by Scottish physicist James Clerk Maxwell… these calculations predicted the manner in which a given geometric configuration would reflect electromagnetic radiation” and took them a step further. “Ben,” Overholser told Ben Rich, “this guy has shown us how to accurately calculate radar cross sections across the surface of the wing and at the edge of the wing, and put together these calculations for an accurate total.”

With Ufimtsev’s work, the Skunk Works team could create computer software to calculate the radar cross section (how visible an object is to radar) as long as the shapes were in two dimensions. If they designed the aircraft as thousands of flat triangles, they could add them all up and get the radar cross section.

That’s exactly what Overholser did, and the design that emerged “was a diamond beveled in four directions, creating in essence four triangles,” which, viewed from above, “closely resembled an Indian Arrowhead.”

He called it the Hopeless Diamond, and calculated that it would be “one thousand times less visible than the least visible shape previously produced at the Skunk Works.” On a radar screen, it would appear to be the size of an eagle’s eyeball.

Kelly Johnson was the Skunk Works founder and boss, a man so magnificent at airplane design that his boss said of him, “That damn Swede can actually see air.”

Kelly Johnson

He was so unimpressed by the design that, when he saw it, he physically kicked Rich in the butt, crumpled the proposal, threw it at Rich’s feet, and stormed, “Ben Rich, you dumb shit. Have you lost your goddamned mind? This crap will never get off the ground.”

As it turned out, the model was right and even the great Kelly Johnson was wrong. The Hopeless Diamond became the F-117 Nighthawk, the stealthiest plane built to date by more than three orders of magnitude, and flew over 1,300 sorties during the 1991 Gulf War without a single combat loss.

F-117 Nighthawk

The F-117’s geometry was alien, and it gave the US otherworldly capabilities.

We’re attempting to do something similar, but at silicon scale (to start). And with AI doing the search instead of human engineers sketching on whiteboards, we’re planning to do it over much wider problem spaces and search spaces than currently exist.

The Shape of Things to Come

We have a lot of work ahead of us to solve the core problem: building a Large Field Model that can do for EM what LLMs did for language.

The simulation has to get faster. The generative model has to get smarter. The fabrication loop has to tighten, and we need to actually fab analog silicon, and then we need to do it for a lot of customers. We need to hire more of those rare humans who think like electrons and work with them to create training data.

It’s hard to see what our models can do today and not imagine what they might be able to do in the future, though.

If it scales across the entire EM spectrum like we think it will, things will get very interesting.

How interesting? Packy asked me whether our models might one day help finally produce a Grand Unified Theory, if one is possible.

I don’t know. But here’s how I think about discovering new physics more broadly.

All of our tools today — since the dawn of computers — have been about deduction. Solve, compute, calculate, predict. Here’s an input; tell me what happens. But a lot of really creative human reasoning is inductive. You postulate a question or an idea, and then you research it. That’s deeply creative and deeply human.

This connects to something personal for me. My undergraduate research advisor at Stanford, Hari Manoharan, did an experiment that made the cover of Nature in 2000.

He arranged 80 cobalt atoms in an ellipse on a copper surface. You know those whispering galleries where you stand in a corner and whisper and someone on the other side can hear you? That’s constructive interference of sound waves. Hari knew that electrons also behave like waves, and he suspected they should interfere in the same way.

And that’s exactly what he saw. A “quantum mirage.” A ghost atom appearing at one focus when a real atom sat at the other. What blew everyone’s mind was that there was no time delay. Physics predicted some tiny delay to account for information traveling at the speed of light. Instead, it was instantaneous. This kicked off a whole field of quantum communication research.

Manoharan et al., Nature 403, 2000

Hari knew the Schrödinger equation. He knew Maxwell’s equations. Everybody knew those equations. But he combined them in a way no one had thought to try, postulated what might happen, and built an experiment to test it. That’s how new physics happens.

How amazing would it be if we had machines that could actually understand these equations and help us? Currently, we have to wait decades for a genius to come along and push a field forward. What could we benefit from if those geniuses had a little help? How much closer could we pull the future? How much more of the universe could we understand in our lifetimes?

What if our foundation model could start making those leaps? Right now, it’s not yet breakthrough inductive, meaning it can’t create its own experiments. It’s like an applied physicist: we can tell it our engineering goal and it will come up with a design. But as it learns more, develops something like intuition, could the model start postulating? Could it notice patterns humans have missed and suggest experiments?

And maybe (this is where I let myself dream), maybe if we build foundation models for each of the fundamental forces, and they start talking to each other, we get closer to something bigger. There are four fundamental forces: strong, weak, gravity, electromagnetism. We’re just trying to make a dent in one of them. But physics is deeply interconnected. Hari’s quantum mirage happened because electromagnetism and quantum mechanics intersected in a way that nobody expected.

What happens when foundation models that understand multiple forces chain together, exploring the spaces between them? I can’t stop thinking about this.

Ultimately, that’s what this is about. It’s why I pursued my PhD in quantum electromagnetism for four years (then dropped out in true Silicon Valley fashion) and why I started Arena Physica.

I want to understand the nature of reality in order to manipulate it for the betterment of humanity. To do that, we need models that learn directly from physics, that develop intuitions we’ve never evolved to have.

One of the questions that came up in grad school, and I remember thinking, how do people even ask these questions, was: Why does physics work? Why does math work? Isn’t it strange that reality is so... describable? Why isn’t it much more random?

Maybe these models will help us find out.

Electromagnetism secretly runs the world. We’ve been manipulating it for a century with our hands tied behind our backs, limited by the rarity of humans who can see what we cannot see.

We’re creating something to understand the interplay between geometries and electromagnetic waves, and evolving them to develop a new intuition.

The universe is made of fields. Fields are shaped by geometry. Geometry, it turns out, is something computers can learn much better than we can. We should lean on them so that we can get to new problems.

If we learn to shape the waves, we might be able to shape the future.


Big thanks to Pratap and the whole Arena Physica team for sharing their knowledge, and to Badal for the cover art.


That’s all for today. We’ll be back in your inbox with a Weekly Dose on Friday.

Thanks for reading,

Packy

1

Ten is the number that keeps coming up in conversations with people at the companies where these engineers work today. The real number of world-class RF designers is probably in the low hundreds. They all seem to know each other by first name. The community is that small.

2

The programming term “bug” was popularized when engineers found an actual moth trapped in a relay of the Harvard Mark II computer; the word had been used for hardware gremlins, including insects crawling into equipment, even earlier.

3

Tuxedo Park: A Wall Street Tycoon and the Secret Palace of Science That Changed the Course of World War II by Jennet Conant is a great book for those interested in learning more.

4

Read more of the ARM story in The Electric Slide (the link goes right to the ARM section).

5

For those who want the technical details, we will be releasing a technical blog post next week.

6

The metric is MAE, but the value we’re measuring here is an S-parameter (scattering parameter), which is a complex number with real and imaginary parts; that’s why we separate out magnitude and phase.

Weekly Dose of Optimism #185

2026-03-20 20:46:59

Hi friends 👋 ,

Happy Friday and welcome back to our 185th Weekly Dose of Optimism. Fresh off the heels of my World Models primer with General Intuition’s Pim de Witte…

… I’m excited to bring you my favorite Dose in a long time. Kalanick, Bezos, copper, cancer drugs, meditation stimulation, and the return of Ulkar. I can’t wait any longer…

Let’s get to it.


Today’s Weekly Dose is brought to you by… Framer

Framer gives designers superpowers.

Framer is the design-first, no-code website builder that lets anyone ship a production-ready site in minutes. Whether you’re starting with a template or a blank canvas, Framer gives you total creative control with no coding required. Add animations, localize with one click, and collaborate in real-time with your whole team. You can even A/B test and track clicks with built-in analytics.

The weekend starts today, and the weekend is the perfect time for side-projects. Get to it.

Launch for free at Framer dot com. Use code NOTBORING for a free month on Framer Pro.

Just Publish it With Framer


(1) Travis Kalanick is Back with Atoms

Travis Kalanick &

There’s a version of the story where Travis Kalanick disappears into the wilderness after his investors ousted him from Uber. He’s rich. He doesn’t need to deal with this shit.

If you watched the Uber saga, though, you knew that wasn’t how the story was going to play out, and luckily, you were right. Travis. Kalanick. Is. Back.

Kalanick spent nearly eight years in what might be the most extreme version of stealth mode any modern founder has pulled off: he hired thousands of employees across 30 countries, bought and developed hard real estate assets, built a full-stack food infrastructure business, and did it all in layers, like lasagna. Employees weren’t even allowed to list the company’s name anywhere.

The company was called City Storage Systems. Its most visible subsidiary, CloudKitchens, operated ghost kitchens. It was valued at $15 billion in 2022. And it was just, to use a food term, a little amuse-bouche before the main course.

Last week, Kalanick unveiled Atoms: a robotics company spanning food, mining, and transport, built on everything CloudKitchens learned about physical automation. CloudKitchens is now Atoms Food, which includes Lab37's Bowl Builder robot (200 meals per hour, no humans), the Otter restaurant OS, and Picnic delivery.

With Atoms, TK is expanding into mining (via the acquisition of Pronto AI, an autonomous haulage startup for mines and quarries founded by Waymo founder Anthony Levandowski) and transport, where Atoms is building what it calls “a wheelbase for robots.”

Kalanick’s robotics bet is explicitly anti-humanoid. It’s the same bet that we wrote about in Many Small Steps for Robots, One Giant Leap for Mankind with Standard Bots’ Evan Beard and in yesterday’s World Models: Computing the Uncomputable with General Intuition’s Pim De Witte. “The recent humanoid Olympics in Beijing highlighted many advances in humanoid development,” Kalanick writes in the company’s Vision doc, “I watched the half-marathon and couldn’t help but think how much better it would be if they just had wheels.”

Uber was Kalanick’s first attempt to digitize the physical world. Atoms seems to be the rest of it.

It’s a bad idea to bet against Travis Kalanick. Watch the TBPN interview to get a taste of why. Which means it’s a good idea to bet on our autonomous everything future.

(2) Jeff Bezos Raising $100B AI Manufacturing Fund

The Wall Street Journal

THE BOYS ARE BACK IN TOWN. First Kalanick, now Bezos. Elon’s been living here.

Imagine going out to raise one of the largest funds in history, a SoftBank Vision Fund-sized whopper of an investment vehicle, one that exceeds the total amount of US venture capital funds raised in 2025… and that is less than 50% of your personal net worth. Jeff Bezos doesn’t have to imagine.

The WSJ reported that Bezos is looking to raise a $100B fund “to buy companies in major industrial sectors such as chipmaking, defense and aerospace.” The plan is to buy existing manufacturing companies, and then use Bezos’ new startup, Project Prometheus, which sounds like a World Models company (“building AI models that can understand and simulate the physical world”), to boost efficiency and profitability.

Basically, the fund is to AI rollup funds targeting accounting firms as new Jeff Bezos is to scrawny old Jeff Bezos. It’s the gigachad version of the AI rollup playbook.

Details are sparse, but if you told me a couple of years ago, in the depths of the bear market, when everyone said that America couldn’t build things anymore, that Jeff Bezos, Travis Kalanick, and Elon Musk were going to be competing to industrialize America harder, all while Chris Power launches a new Hadrian factory in Alabama, I would have told you… welcome to The American Millennium.

(3) Copper One: The World’s Only Autonomy-First Mine and Refinery

Mariana Minerals

Wait… and we’re mining and refining autonomously here, too?

Long time not boring readers will know that I am a big fan of mining startups, given how critical (get it) the industry is to the modern world and how critically underinvested-in it’s been. I think sell side analysts will write poems about Earth AI’s business model one day.

It’s also a unique industry in that it’s structurally not winner-take-all. Earth AI can build a very, very big business while Kobold, Durin, and Mariana Minerals do, too. If you have enough high-grade metal in your deposit and can get to it economically, you’ll make money.

So I was pumped to see Mariana announce Copper One, its first autonomy-first copper mine and refinery in Utah. The blog post in which they announced the project is one of the best company announcement blog posts I’ve read in a while, with clear explanations and great graphics, so I encourage you to go read it. Here, I’ll just share the plan they laid out to turbocharge the existing copper mine they acquired last year:

  1. Deploy PlantOS at scale to maximize copper recovery and reduce copper refining costs throughout heap leaching, solvent extraction, and electrowinning.

  2. Restart mining operations with autonomous equipment and orchestration via MineOS.

  3. Integrate copper scrap processing into the refining circuit, leveraging PlantOS to manage feedstock variability and to put a meaningful dent in US copper scrap exports.

  4. Scale combined output at the site to 50,000 metric tonnes per year from both geologic and scrap feedstocks (leveraging CapitalProjectOS to accelerate capital project delivery).

We’re going to need to find a lot of metals and minerals, and we’re going to need to get them out of the ground and refine them more efficiently, quickly, and cleanly. I’m excited to see what improvements Mariana can dig up in Utah.

(4) Scientists Create Cancer-Fighting Immune Cells Right in the Body

University of California San Francisco

Long time not boring readers will also know that we have long taken a bold but important staunchly anti-cancer stance here in the Dose. We just don’t like cancer, and we want to see it gone.

The piece on Sid Sijbrandij’s extraordinary care journey, Going Founder Mode on Cancer, remains our favorite in the anti-cancer canon. It tells the story of one person’s against-the-odds battle to bend the medical system and cure his own cancer. But it shouldn’t have to be that hard. Enter this week’s entry.

CAR-T therapy is one of the most powerful weapons against cancer. It works by pulling a patient's T cells out of their body, genetically reprogramming them to hunt cancer, growing them up in a lab, and infusing them back in. Seven CAR-T therapies are now FDA-approved for blood cancers. The problem is, the process takes weeks, costs over $400,000, requires specialized manufacturing facilities, and demands lymphodepleting chemotherapy just to make room for the new cells. Most cancer patients in the world will never have access to it.

This week, a team at UCSF led by Justin Eyquem published a paper in Nature showing they can skip almost all of that. Instead of the extract-engineer-expand-reinfuse pipeline, they designed a two-particle injection system that reprograms T cells inside the body. One particle delivers CRISPR-Cas9 to make a precise cut in the T cell’s genome. The second delivers the DNA template for a chimeric antigen receptor (the cancer-targeting weapon). The whole thing is designed so only T cells get edited, and only at a specific, safe genomic location, avoiding the random integration that can, in rare cases, cause secondary cancers.

In mice with humanized immune systems, a single injection cleared detectable leukemia in nearly all animals within two weeks. The engineered cells made up as much as 40% of T cells in organs like bone marrow and spleen. The approach also worked against multiple myeloma and sarcoma, a solid tumor that is historically much harder for CAR-T to crack. Best of all, the in vivo cells actually outperformed lab-manufactured ones, because cells that never leave the body retain their “stemness” and ability to keep dividing.

As always, it’s important to remember that mice are not people, and we need to see this thing work in humans before we start popping the champagne. To that end, Eyquem and his collaborators founded a company, Azalea Therapeutics, to push toward human trials. If it translates, this could turn CAR-T from a last-resort therapy available at a handful of elite cancer centers into something closer to a vaccine, a single injection that any hospital could administer.

Just a quick shot, a follow-up, and back at it. Say it with us… get fucked, cancer.

(5) Facilitating Mindfulness Training with Ultrasonic Neuromodulation

Brian Lord, Erica N. Lord, Jessica Schachtner, Laura Beaman, Shinzen Young, John J. B. Allen, Joseph L. Sanguinetti

A little while back, I read Shinzen Young’s The Science of Enlightenment. It was one of the best books I’ve read on meditation. Shinzen is a Jewish-American who trained as a Shingon monk in Japan, did deep practice in all three major Buddhist traditions (Vajrayana, Zen, Vipassana), and then came back to the West to fuse contemplative practice with scientific rigor.

Most relevantly, near the end of the book, he makes a wild prediction about Maitreya, the prophesied “future Buddha” of Buddhist tradition. Maitreya won’t be a person, Shinzen argues. It will be a collective of scientists, technologists, and contemplatives working together to make liberation accessible at scale through tools and systematic methods, rather than requiring every individual to spend decades in a monastery.

This week, Shinzen and a team of researchers dropped a preprint on bioRxiv that might be a step in that direction.

After dozens or hundreds of hours of sitting, experienced meditators have brains that look different from everyone else's. Their default mode network (DMN, the brain's self-referential chatter machine, the thing running rumination loops, and the thing that psychedelics seem to quiet) decouples from the central executive network, the system that handles focused attention. That decoupling is the neural signature of equanimity: letting experiences arise and pass without getting caught up in them. It's associated with reduced stress, lower depression, and generally being a calmer human. But it takes hundreds of hours to get there, and most people quit long before anything rewires, because the early stages are so uncertain and even boring.

What if you could use technology to do the same thing, much faster?

Shinzen and the team at the University of Arizona ran a randomized controlled trial with 24 meditation-naïve participants who did a two-week mindfulness program. Half received transcranial focused ultrasound targeting the posterior cingulate cortex (the hub of the DMN) during four in-person sessions. Half got sham stimulation.

After two weeks, the active group's brains showed the decoupled network pattern that normally takes experienced meditators hundreds of hours to develop (p < 0.001). The sham group's connectivity actually increased; they got more tangled up, not less, which is exactly what frustrated novices do. Within the active group, greater decoupling also predicted bigger increases in self-reported acceptance and longer voluntary meditation sessions. The ultrasound made the practice better and gave novices experienced meditator brains.

It's a small study and a preprint, so it needs replication, but as someone who has spent a lot of hours trying to go deep, it also seems remarkable.

I love this because I’ve written that the worst outcome would be for us to get all of the technological wonders we could have asked for and still be unhappy, and I think meditation, and the associated ability to pay attention, can help there.

It’s also just very cool to see someone who spent his entire career arguing that the cross-fertilization of Eastern contemplative technology and Western science would eventually produce something neither could produce alone being proven right.

“The next Buddha is a sangha,” indeed.

EXTRA DOSE:

  1. Scientific Breakthroughs with Ulkar Aghayeva

  2. My Favorite Essay of the Week

Read more

World Models: Computing the Uncomputable

2026-03-19 20:55:52

Welcome to the 458 newly Not Boring people who have joined us since our last essay! Join 260,170 smart, curious folks by subscribing here:

Subscribe now


Hi friends 👋 ,

Happy Wednesday!

A few months ago, Pim De Witte and Kent Rollins invited me to their office right here in New York City to show me what they’ve been cooking up at General Intuition. I’d heard about the company, from the announcement of their leet $133.7 million Seed round, and I’d heard about the class of product they were building, World Models, but I didn’t know much beyond that.

What they showed me that day, models that learn to predict the near future from action-labeled gaming clips, and what I’ve learned from many conversations and dozens of hours of research since, has changed my perception of what models can do. I am on the record as being skeptical that LLMs will take us to superintelligence, but I think there is a real shot that World Models will drive superhuman, complementary machines that do things that we can’t, or don’t want to, do.

Since that first meeting, the World Models space has heated up. Fei-Fei Li’s World Labs raised $1 billion. Yann LeCun’s AMI raised $1.03 billion. World Models were one of the stars of this week’s NVIDIA GTC. But the field is so nascent and there is so much going on, so many geniuses pursuing competing and collaborative approaches, that it’s hard to make sense of it all.

So I asked Pim to team up with me on a co-written essay about the history, theory, progress, and potential of World Models. He agreed, and both he and the General Intuition team have been incredibly generous with their time and human intelligence in helping me get up to speed, so that I can help you get up to speed.

I have the coolest job in the world. Over the past couple of months, I’ve gotten a front row seat to the future of embodied AI, of Models and Agents, trained in dreams, that direct machines to do things for us in the physical world.

I’m thrilled to share the fruit of that exploration, what I think is the most comprehensive guide to World Models that exists. Obviously, Pim and the GI team have a perspective on the best way to build World Models, but I was impressed with how careful they were to present the pros and cons to every approach, including theirs, and with their admission that the future is not yet determined.

The space continues to change and progress incredibly fast. I hope this will help you navigate and make sense of all of the exciting news that continues to drop.

Let’s get to it.


Today’s Not Boring is brought to you by… Framer

Framer gives designers superpowers.

Framer is the design-first, no-code website builder that lets anyone ship a production-ready site in minutes. Whether you’re starting with a template or a blank canvas, Framer gives you total creative control with no coding required. Add animations, localize with one click, and collaborate in real-time with your whole team. You can even A/B test and track clicks with built-in analytics.

Launch for free at Framer dot com. Use code NOTBORING for a free month on Framer Pro.

Just Publish it With Framer


World Models: Computing the Uncomputable

A Co-Written Essay with Pim De Witte

“I wanted to fall asleep last night. Instead, I started imagining all of the scenarios I might run into the next day, and how I might react to them.”

This is a common experience. As humans, we imagine easily, whether it’s complex sports stadiums, potential romance, or heated discussions. We don’t have to work harder to imagine ourselves at the next Manchester United game than we do to imagine talking to a friend we’ve known for years, even though imagining a Manchester game includes simulating and modeling the behavior of thousands of people, something that would take years for traditional computers and game engines today1.

Think about writing the code to describe the Man U match: at any moment, a fan might bring a random, home-crafted flag. The entire stadium starts singing a song related to it. Only some will sing, though; others will jump with their kids, while an old couple sits still, wondering if this is their last game together, soaking in every second in silence.

The world is a place where unexpected futures unfold, but in somewhat predictable ways. As humans, we can envision almost all of them with roughly the same amount of effort with a very similar amount of time given to each thought. Computers can’t.

It’s no wonder traditional computing struggles with this complexity. Imagine anticipating and coding each and every action, as well as the interactions between all of those actions. Mathematically, in a traditional engine, simulating N fans is at least an O(N) or O(N²) problem. Each person, flag, chair, and ball must be explicitly calculated — and really, the interactions between them need to be calculated, too.

In robotics, machines must respond to situations in the real world in the same amount of time, regardless of their complexity, even though, in traditional computing, different situations can take wildly different amounts of time to simulate. This has been a major bottleneck for robotics and embodied AI progress.

World Models are a solution to that problem.

World Models learn to predict those dynamics from video and, often, the actions taken in them. They reduce situations that are dynamic and computationally difficult to simulate at scale — including stochastic, action-dependent group behavior like soccer games — into a single fixed cost operation in a neural network.

In a World Model, the entire stadium is simulated as a fixed cost forward pass through the neural network. The complexity of the scene doesn’t exponentially slow down the ‘engine’ during inference because the weights have already absorbed the patterns of the world in training.
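A toy way to see the scaling argument, with made-up numbers rather than a real engine or a real World Model: pairwise simulation does work that grows with the square of the number of agents, while a neural network forward pass with a fixed input size does the same amount of work no matter how crowded the scene is.

```python
import numpy as np

def pairwise_updates(n_agents: int) -> int:
    """Naive engine: every agent reacts to every other agent."""
    return n_agents * (n_agents - 1)

HIDDEN = 1024
W1 = np.random.randn(HIDDEN, HIDDEN)
W2 = np.random.randn(HIDDEN, HIDDEN)

def forward_pass(latent: np.ndarray) -> np.ndarray:
    """World-Model-style step: fixed cost, independent of scene complexity."""
    return W2 @ np.tanh(W1 @ latent)

_ = forward_pass(np.random.randn(HIDDEN))  # same cost for an empty street or a full stadium

for n in (1_000, 10_000, 80_000):          # up to a packed stadium
    print(f"{n:>6} agents -> {pairwise_updates(n):>13,} pairwise updates")
print(f"forward pass    -> {2 * HIDDEN * HIDDEN:>13,} multiply-accumulates, every time")
```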

How? Actions.

Actions act as a form of compression to predict unfolding dynamics: they hold the information to unroll future states in an environment, until more actions take place and add new inputs into the environment. Each action carries enough information to predict what happens next, until the next action updates the picture.

This action-conditioned approach allows models to learn and plan interactively. Today, this is intractable in even the best simulation engines, and definitely not at predictable compute costs. Actions help models interact with the world like we do.

Over and over again, every single day, you observe, you compute, you decide what to do, you act. This is life. At any point, all gathered information about space and time collapses into the action you take.

For computers, actions are a cheat code around the costs of simulation. If human brains are much more efficient than best-in-class LLMs, then we can get all of that computation practically for free by observing how humans respond to the countless variables in their environments. This gives us a way to do non-deterministic computing efficiently and create simulations that shouldn’t be possible under traditional compute constraints.

This ability to compute the uncomputable is why we believe World Models will unlock progress in embodied AI in a way that current model architectures can’t.

Think about models like dreams.

Have you ever had a dream where you simply stood and watched what was happening without the ability to intervene? That’s a video model.

The real world is different. It responds to what you do or what you instruct to be done, and it holds the full range of things that could happen as a result, not just the single most likely or most entertaining next frame.

Have you ever had a lucid dream in which you were able to shape the story inside the mind-generated dreamscape? That’s a World Model.

I coded up a comparison that you can play with here.

More formally, while a standard video model predicts the next frame based on probability, P(xₜ₊₁ | xₜ), a World Model predicts the next state based on intervention, P(sₜ₊₁ | sₜ, aₜ).

That aₜ, the action at time t, is the magic.
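A minimal sketch of that distinction, with made-up class names: a video model maps a frame to the next frame, while a World Model maps a state and an action to the next state. The action argument is the only difference in the signature, and it is the whole point.

```python
import numpy as np

class VideoModel:
    def predict(self, frame: np.ndarray) -> np.ndarray:
        """P(x_t+1 | x_t): what is most likely to appear next."""
        return frame  # placeholder; a real model would generate the next frame

class WorldModel:
    def predict(self, state: np.ndarray, action: int) -> np.ndarray:
        """P(s_t+1 | s_t, a_t): what happens next, given what you do."""
        return np.roll(state, action)  # placeholder; the action changes the outcome

frame = np.zeros((4, 4))
print(VideoModel().predict(frame).shape)                      # (4, 4)
print(WorldModel().predict(frame.flatten(), action=1).shape)  # (16,)
```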

At General Intuition, we believe (and are seeing early signs) that World Models are a new and potentially more powerful class of foundation model than LLMs for environments that require deep spatial and temporal reasoning. Environments like our real world.

World models — these systems that learn from watching the world and the actions taken in it — are a fundamentally new kind of foundation model. They can compute what was previously uncomputable.

They will matter far more than anyone currently realizes, because they offer a path to general intelligence that language and code alone cannot. Being human, after all, is spending a lifetime taking actions based on what we experience, observe, and learn.

Pause. You might be confused by that claim, that World Models offer a path to general intelligence that LLMs cannot. Understandably so.

World Models are getting a lot of attention as of late. Yann LeCun, who has been skeptical that LLMs are the path to general intelligence, just announced that he raised $1.03 billion for AMI. Fei-Fei Li’s World Labs has also raised more than $1 billion to pursue World Models. Google DeepMind, which has the closest thing to an infinite money printer in tech, is betting money on World Models too. But what we’ve seen so far from that investment are cool videos and 3D worlds.

LLMs can quote Shakespeare and solve Erdős Problems. World Models, on the other hand, still seem more like a path to the Metaverse than a path to general intelligence.

But part of the reason World Models don’t yet have the hype of LLMs is that their definitions are still shaky.

What are World Models? We’ve already said that video models don’t fit the definition. 3D space models don’t, either. That said, both may be paths to World Models. Are the models that animate robots today World Models? Not really, although some are, and even the ones that aren’t share features with World Model architectures.

As always, hype adds to confusion. “My prediction is that ‘World Models’ will be the next buzzword,” Alexandre LeBrun, the CEO of AMI Labs (which is definitely a World Model company) told TechCrunch. “In six months, every company will call itself a World Model to raise funding.”

Hype is a small part of it. What we — and everyone else building in this space — believe is that World Models are the path to controlling machines in the physical world. There are differences in what we believe this path will look like. But all of us believe that the future runs through World Models.

“...very few understand how far-reaching this shift is…,” NVIDIA Director of Robotics and Distinguished Scientist Jim Fan said recently. “Unfortunately, the most hyped use case of World Models right now is AI video slop (and coming up, game slop). I bet with full confidence that 2026 will mark the first year that Large World Models lay real foundations for robotics, and for multimodal AI more broadly.”

Today, we’d like to welcome you into the group of the “very few” who “understand how far-reaching this shift is.” We are going to share the history of World Models, the state of the field as it stands today, broad explanations of the approaches each major lab is taking, and the convictions that drive General Intuition’s directions.

Whether you come with us is up to you. You take the blue pill, the story ends. You wake up in your bed and believe whatever you want to believe. You take the red pill... you stay in Wonderland, and we show you how deep the rabbit hole goes.

For example…. how can you be sure that you’re not an Agent operating inside of a World Model yourself?

Can Agents Learn Inside of Their Own Dreams?

Wake up, Neo.

World models aren’t a new idea. They are one of our oldest. Since humans gained the ability to think about our place in the universe, to ask why we are here, we have pondered whether our reality is just a simulation.

In 380 BC, Plato, via Socrates, offered The Allegory of the Cave. Imagine human beings who live underground in a cave, necks chained, forced to look ahead at the shadows on the wall. Those humans would believe those shadows to be reality, when in fact they are mere shadows of reality. This was Plato’s metaphor. He suggests that we are all stuck in the cave, necks chained, mistaking our perception for true reality.

Eighty years later, Chinese Daoist philosopher Zhuangzi contemplated similar questions in a passage of his Butterfly Dream:

Once Zhuang Zhou dreamt he was a butterfly, a butterfly flitting and fluttering around, happy with himself and doing as he pleased. He didn’t know he was Zhuang Zhou. Suddenly, he woke up and there he was, solid and unmistakable Zhuang Zhou. But he didn’t know if he was Zhuang Zhou who had dreamt he was a butterfly, or a butterfly dreaming he was Zhuang Zhou. Between Zhuang Zhou and a butterfly there must be some distinction! This is called the Transformation of Things.

As the centuries passed and our technological capabilities evolved, sci-fi writers joined the long lineage of thinkers inquiring about the true nature of reality. Frederik Pohl’s 1955 The Tunnel Under the World. Daniel F. Galouye’s Simulacron-3. Stanislaw Lem’s Non Serviam. Vernor Vinge’s True Names. William Gibson’s Neuromancer. Neal Stephenson’s Snow Crash. All painted textual pictures of simulated worlds.

During a 1977 speech in Metz, France, sci-fi legend Philip K. Dick confidently told the audience: “We are living in a computer-programmed reality, and the only clue we have to it is when some variable is changed2, and some alteration in our reality occurs.”

Your first interaction with the simulation was probably The Matrix. Ours was. In the original script for The Matrix, the Wachowskis conceived of the Matrix as a simulation collectively produced by human brains chained into a neural network.

Ignorance is Bliss

The studio thought humans-as-computers was too confusing a concept for mass-market audiences, so they made the thermodynamically questionable decision to turn humans into batteries that powered the simulation. That was probably the right commercial call. The Matrix franchise has done nearly $2 billion in worldwide gross. More impactfully, it introduced the masses to the idea of a simulated world indistinguishable from the "real" one.

It’s no wonder that this idea has taken hold of our collective imagination. It’s certainly the right kind of weird but it’s also surprisingly hard to disprove. If the observations are the same, and the actions are the same, then the computation is the same. If what you see is the same and what you do is the same, it doesn’t matter whether you’re in a simulation or reality. It doesn’t matter whether you’re walking down a real street or a simulated one. Your brain processes both identically. Neo had no idea he was in the Matrix until Morpheus woke him up.

Christopher Nolan, throwing audience confusion to the wind — savoring it, even — released Inception3 in 2010. Dreams within dreams within dreams.

Nolan’s central premise is that the dream is a controllable space from which information can be extracted or, more importantly, into which information can be implanted.

But it’s all just sci-fi, right?

In 1990, Jürgen Schmidhuber, a young researcher at the Technical University of Munich, published Making the World Differentiable.

The paper proposed a system built around a recurrent neural network (RNN) with two jobs: first, learn to predict what happens next in a simulated world, and second, use that learned model of the world to train an Agent to act in it.

The Agent wouldn’t need to interact with a “real” environment at all. It could learn inside the model. Inside a dream.

The following year, Richard Sutton, of Bitter Lesson fame, dreamt up a similar idea. In Dyna, an Integrated Architecture for Learning, Planning, and Reacting, he argued that learning, planning, and reacting shouldn’t be separate systems. They should be unified in a single architecture. Which would mean that it’s technically possible to build a model of the world, practice inside it, and transfer what you learn back to reality.

Both papers were visionary. They would have a lasting impact as progress in the field enabled the researchers’ visions to become reality. But coming when they did, both papers may as well have been sci-fi.

In 1990, the world had something like 100 trillion to 1 quadrillion times less compute than we have today. Back then, the entire world had maybe 10-100 gigaFLOPS of total capacity. Tens of zettaflops (10^22 FLOPS) of computing power were sold in 2024 alone. In 1990, the global digital datasphere was approximately 10 petabytes, a volume so small it could barely hold 0.005% of the video data we now use for a single training run. By 2026, that volume has exploded by a factor of 22 million to 221 zettabytes.

But technology improves, and the most powerful dreams do not die.

Nearly three decades later, in March 2018, David Ha (then at Google Brain) and Schmidhuber published a paper titled World Models.4

The paper asked: Can agents learn inside of their own dreams?

To answer their own question, Ha and Schmidhuber built a system with three components: a vision model (V) that compressed raw pixel observations into a compact representation; a memory model (M), a recurrent neural network that learned to predict what happens next; and a tiny controller (C) that decided what to do based only on V's and M's outputs.

The World Model was V + M: it could take in observations and imagine plausible futures. The controller was the Agent or policy: it chose which actions to take.

World Model + Agent
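
If you want to see the shape of the system in code, here is a minimal sketch with toy dimensions. It is our own simplification, not the paper's code: in the paper, V is a convolutional variational autoencoder, M is an MDN-RNN, and C is trained with an evolution strategy, so treat the linear layers below as placeholders for the structure.

```python
# A minimal sketch of the V + M + C layout, with toy dimensions.
import torch
import torch.nn as nn

class Vision(nn.Module):                     # V: compress a frame into a small latent
    def __init__(self, obs_dim=64 * 64 * 3, z_dim=32):
        super().__init__()
        self.encode = nn.Linear(obs_dim, z_dim)
    def forward(self, obs):
        return self.encode(obs)

class Memory(nn.Module):                     # M: predict the next latent given (z, action)
    def __init__(self, z_dim=32, act_dim=3, h_dim=256):
        super().__init__()
        self.rnn = nn.GRUCell(z_dim + act_dim, h_dim)
        self.next_z = nn.Linear(h_dim, z_dim)
    def forward(self, z, action, h):
        h = self.rnn(torch.cat([z, action], dim=-1), h)
        return self.next_z(h), h

class Controller(nn.Module):                 # C: tiny policy over V and M's outputs only
    def __init__(self, z_dim=32, h_dim=256, act_dim=3):
        super().__init__()
        self.act = nn.Linear(z_dim + h_dim, act_dim)
    def forward(self, z, h):
        return torch.tanh(self.act(torch.cat([z, h], dim=-1)))

# "Dreaming": roll M forward without touching the real environment at all.
V, M, C = Vision(), Memory(), Controller()
z, h = torch.zeros(1, 32), torch.zeros(1, 256)
for _ in range(10):
    a = C(z, h)                              # the Agent chooses an action...
    z, h = M(z, a, h)                        # ...and the dream, not reality, responds
```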

The paper joined in conversation with those centuries of thought experiments, novels, and movies. A dream might be reality, reality might be dreams. But what if we could actually act in our dreams? What would that do to reality?

Ha and Schmidhuber trained their World Model on observations from a car racing game and a first-person shooter game. The World Model generated new digital worlds. Then, they let the Agent practice entirely inside the World Model’s hallucinated dreams. Afterwards, they transferred the learned policy back to the actual environment.

And... it worked. The Agent could solve tasks it had never encountered in reality. The dream was real enough.

It was shocking, from a computer science perspective. But was it really so surprising? Isn’t this how humans navigate the world?

Ha and Schmidhuber noted that humans constantly run World Models in their heads. A baseball player facing a 100 mph fastball has to decide how to swing before the visual signal of the ball's position even reaches their brain. The reason that every at-bat doesn't result in a strikeout is that batters don't react to reality, but to their internal World Model's prediction of where the ball will be.

Donald Hoffman, Professor of Cognitive Sciences at University of California, Irvine, takes that idea a million steps further. He believes that we all walk around wearing “reality headsets” that simplify the staggering complexity of the quantum world into a user-friendly interface. Reality is too rich, so we navigate it via a sort of persistent waking dream.

This rabbit hole goes as deep as you want it to. But it’s World Models all the way down.

Ha and Schmidhuber showed that computers might be able to approach the world like we do: creating simulations to predict future states based on actions, acting based on those predictions, updating, and looping.

Actions, not words.

Language is Not Enough (Neither is Code)

Let’s play a game.

Clap your hands five times.

Now, instead of physically clapping your hands, I want you to describe clapping your hands using just words.

Where they are positioned in space, where they are relative to each other, by the picosecond. The points of contact. The sounds. What your hands look like as they move closer to each other, make contact, and pull apart. How they squish each other. What happens to the air between your two palms. What you see while your hands clap. Don’t forget your arms. How do they bend to facilitate the claps? Remember to do this by the picosecond, too. How does the fabric on your sleeve respond? What is happening in the background? Did the person next to you notice you clapping? How did they respond? Did you get fired for clapping in the middle of the meeting, following the instructions of an essay you shouldn’t have been reading while you should have been paying attention to work? Describe to me the vein on your boss’ forehead. Is it popping?

You can’t, can you? OK, stop. The point is made.

Language is an incredibly lossy compression of reality.

Language is important, of course. It is how we communicate and coordinate. The game Charades illustrates that, when it comes to communicating ideas, language can be much more efficient than actions. LLMs are important in that capacity. But language alone is not enough.

What about code? Code is a form of very precise language that makes machines do things.

I asked Claude to “code me a simulation of hands clapping five times in a realistic environment.” It built me this. Which looks very painful.

Hand-Clapping Simulation Generated by Claude

There is a belief that, with scale, language and code will be able to solve all spatial-temporal intelligence challenges and produce Artificial General Intelligence (AGI) or Artificial Superintelligence (ASI).

Some argue that code is the key to solving many real-world intelligence challenges because it can perfectly instruct all physical form-factors with precision.

We do not share that belief. A code-based simulation is a poor version of a dream. It is rule-bound and unable to handle the stochastic messiness of reality.

To know the world, you must interact with it.

In The Glass Bead Game (Das Glasperlenspiel), a novel by Hermann Hesse that won him the Nobel Prize for Literature in 1946, readers are introduced to Castalia, a future intellectual utopia devoted to pure thought. At Castalia's center is an elaborate game, the titular Glass Bead Game, that synthesizes all human knowledge into a single formal language. Players compose "games" the way one might compose a fugue. A move might link a Bach cantata to a mathematical proof to a passage from Confucius. The game is the ultimate abstraction: all of human culture compressed into symbolic manipulation.

The protagonist, Joseph Knecht, rises to become Magister Ludi, Master of the Game, the highest position in Castalia. But he grows disillusioned. The game, for all its beauty, is sterile. Castalia’s intellectuals have retreated so far into abstraction that they’ve lost touch with the world. They can represent reality with extraordinary elegance, but they cannot act in it.

Knecht ultimately decides he must leave Castalia, and becomes a simple tutor. He chooses the messy, embodied, unpredictable world over the perfect symbolic one. He has dedicated his life to the Game, the mastery of which involves operating on a level of abstraction beyond words, something closer to world modeling. But it isn't enough. Symbols alone, without contact with reality, eventually run dry.

Large Language Models are our Castalians. They are exquisite manipulators of symbols, capable of drawing connections across the entirety of human textual knowledge. They can discuss physics, compose poetry, write code, and explain the rules of baseball. They are, genuinely, one of the great intellectual achievements in human history.

But they operate entirely in the realm of representation. They can describe clapping, but they cannot clap. They can talk about gravity, but they do not know gravity the way a toddler knows gravity. They do not learn, the way a body learns, through thousands of falls and stumbles, what “down” means.

Language models predict the next token extraordinarily well. The only problem is that tokens are like shadows on Plato’s cave wall. And you cannot code your way to a realistic stadium crowd any more than you can describe your way there.

The real world is — or was — uncomputable.

If language and code, two of mankind’s most powerful inventions, are inadequate to represent our world, what do we have left?

The Answer is World Models

World Models offer another approach on the path to AGI. They offer a path to compute the things that are, today, uncomputable. They learn from the messy contact with reality that Knecht sought.

World Models offer a way to do non-deterministic compute efficiently, and to run simulations that shouldn’t be possible under traditional compute constraints.

World models are not a replacement for LLMs. Language remains essential; text can be used to condition World Models: to tell them what scenario to imagine, what goal to pursue, what long-term objective to hold onto. The thinking and the doing work together. But the doing has to come from somewhere other than text.

Joseph Knecht must come down from Castalia.

Real intelligence must come from observation of the world; from understanding actions and their consequences; from the things that language can only point at.

The Dao that can be told is not the eternal Dao.

In the beginning was the Word. Then came humans, to act imperfectly and unpredictably.

Maybe this is the way of things. In the beginning were LLMs. Then came World Models.

What Are World Models?

A World Model simulates environments and responds when you act inside them.

More formally, a World Model is an interactive predictive model that simulates spatial-temporal environments in response to actions.

While LLMs predict the next word in a sentence, World Models predict the next state (as in, the immediate future), conditioned on the current state and control input.

More succinctly: LLMs learn the structure of language. World Models learn the structure of causality.

This is a simple definition of World Models. It is accurate, but it’s not enough to understand how World Models work. For that, you’ll need to know four things:

  1. What World Models do,

  2. How they’re built,

  3. Why “action” is so important, and

  4. The relationship between World Models and policies.

What World Models Do

Think about what happens when you catch a ball. Your eyes take in a scene: the thrower’s arm, the ball in flight, the wind, the sun in your eyes, all of it. From that flood of sensory data, your brain builds a compressed model of what’s happening and, crucially, what’s about to happen. It predicts the ball’s trajectory a few hundred milliseconds into the future. Then it sends a motor command to your hand. You catch the ball. The whole loop — observe, predict, act — takes a fraction of a second and involves no language or “thinking” whatsoever.

A World Model does the same thing, computationally. It takes in observations (often video frames, though it can use any sensory data), builds a compressed internal representation of the environment’s state, and predicts how that state will change in response to actions.

It is, in essence, a learned physics engine, but one that doesn’t rely on hand-written equations. Instead of calculating gravity, collision, and friction from first principles, it has watched gravity, collision, and friction billions of times and learned the patterns.
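
To make "learned physics engine" concrete, here is a toy example of our own (not from any paper): fit a falling ball's next state from observed transitions instead of hard-coding the gravity equation.

```python
# A toy version of "watch the pattern, learn the physics": fit the next-state
# function of a falling ball from observed transitions, with no equation written in.
import numpy as np

dt, g = 0.05, -9.8
h, v, states = 10.0, 0.0, []
for _ in range(100):                         # "watch" gravity happen (no floor in this toy)
    states.append((h, v))
    v += g * dt
    h += v * dt

X = np.array(states[:-1])                    # (height, velocity) at time t
Y = np.array(states[1:])                     # (height, velocity) at time t+1
X1 = np.hstack([X, np.ones((len(X), 1))])    # add a bias column
W, *_ = np.linalg.lstsq(X1, Y, rcond=None)   # learn the transition purely from data

print(X1[:1] @ W)                            # learned prediction for the first step
print(Y[:1])                                 # what actually happened
```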

This makes World Models a powerful tool for building Agents, AI systems that act in environments. World Models help Agents in three ways:

  1. They serve as surrogate training grounds. An Agent can practice inside the World Model (basically, inside a dream) and transfer what it learns back to reality. This is important for safety (some things should not be tested or trained in the real world) and for sample efficiency (real-world data is expensive to gather, sometimes unavailable, and you need a lot of it).

  2. They enable planning over longer time horizons. An Agent can "imagine" the consequences of different actions before committing to one, the way a chess player thinks several moves ahead, except here, the board can be any environment, or the real world (see the sketch after this list).

  3. They provide rich representations of the world for Agents to learn behaviors from. An Agent trained on a World Model’s internal representations learns to “see” the world in terms of the features that matter for acting in it, rather than raw pixels.
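
Here is the sketch promised in point (2): a toy random-shooting planner that rolls candidate action sequences forward inside a stand-in World Model and commits only to the best first action. The dynamics function is a hypothetical placeholder for a learned model.

```python
# Random-shooting planning inside a (stand-in) World Model.
import numpy as np

def toy_world_model(state, action):
    """Pretend learned dynamics: a 1D position pushed by a bounded action."""
    return state + np.clip(action, -1.0, 1.0)

def plan(state, goal, horizon=5, candidates=256, seed=0):
    rng = np.random.default_rng(seed)
    action_seqs = rng.uniform(-1, 1, size=(candidates, horizon))
    best_score, best_first_action = -np.inf, 0.0
    for seq in action_seqs:
        s = state
        for a in seq:                        # roll the whole sequence out in imagination
            s = toy_world_model(s, a)
        score = -abs(goal - s)               # closer to the goal is better
        if score > best_score:
            best_score, best_first_action = score, seq[0]
    return best_first_action                 # commit only to the best imagined first move

print(plan(state=0.0, goal=3.0))             # a positive push, as you'd expect
```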

For these three reasons, the promise of World Models is that they are a path towards generalization. If you can create worlds that respond to actions the way the real world does, you can use them to safely, economically, and efficiently train embodied agents that can act in any virtual world, or the real one.

To be clear, this is the massive question in World Models: whether the simulated environments are faithful enough to reality that you can train on them and have that training transfer to the real world, or, more generally, whether you can "pre-train in sim." Increasingly, the answer seems to be yes.

Ai2, the Allen Institute for AI, is a non-profit founded and funded by the late Microsoft co-founder, Paul Allen. It does great open source research and tooling, including its recent release of MolmoBot, an “open model suite for robotics, trained entirely in simulation.”

“Our results show that sim-to-real zero shot transfer for manipulation is possible,” they tweeted.

Dhruv Shah, a Princeton professor and Google DeepMind researcher who worked on the project, shared: “Within the scope of easily simulate-able tasks, a purely sim-trained policy outperforms SOTA VLAs trained on thousands of hours of real data!”

Ai2, MolmoBot

It is a pretty astonishing finding. A big focus of ours, and of the broader World Models field, is to expand the scope of tasks that are easy to simulate.

This is how it works. First, World Models imagine realistic environments and future states that, ideally, respond to actions or instructions the way the real and virtual worlds they've been trained on do. Next, the Agents are let loose inside of the generated worlds to train. Then, the Agents are brought back into real environments and are tested on what they've learned.

This is what Ha and Schmidhuber demonstrated in 2018. It remains the central promise of the field.

How World Models Are Built

World Models are fairly young. No single approach or combination thereof has proved superior, which means that the final architecture for general World Models is still an open question. There are, however, repeatable ingredients for training.

Start with data: massive quantities of observation data. Often, observations are paired with the actions taken to produce them. This pairing can come about in several ways. Observations (typically video) are collected in advance, and actions are either recorded alongside them or inferred via another model after the fact. Alternatively, the model learns by taking actions itself, generating its own observation and action data through direct interaction with an environment.

When the training data is observations or videos, the raw frames serve as observations of an environment unfolding over time. These videos are ideally labeled with the actions that produced them (either because they were recorded together or inferred with a separate AI model). The actions provide the causal link: what someone did that made the environment change. A gameplay clip where a player turns left and the camera pans to reveal a hallway. A driving recording where the wheel turns and the car follows a curve. A teleoperation session where a robotic arm reaches and a cup moves. In each case, the model sees a before, an action, and an after.

When the model learns through interaction, the same structure applies — before, action, after — but the data is generated on the fly rather than collected in advance, and the actions come from the model’s own developing policy rather than from an external source.

The World Model's core objective remains the same: given the current state and an action or instruction, predict the next state. It sees frame t and action a, and tries to produce frame t+1.
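
In code, the naive pixel-space version of that objective looks something like the sketch below (illustrative dimensions and architecture; the next paragraphs explain why practical systems compress before predicting):

```python
# A hedged sketch of the naive objective: map (frame t, action) to frame t+1
# and minimize prediction error.
import torch
import torch.nn as nn

class NextFramePredictor(nn.Module):
    def __init__(self, obs_dim=1024, act_dim=8, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, obs_dim),
        )
    def forward(self, obs_t, action_t):
        return self.net(torch.cat([obs_t, action_t], dim=-1))

model = NextFramePredictor()
obs_t = torch.randn(32, 1024)       # a batch of current observations (flattened frames)
action_t = torch.randn(32, 8)       # the actions taken at time t
obs_t1 = torch.randn(32, 1024)      # the observations that actually followed
loss = nn.functional.mse_loss(model(obs_t, action_t), obs_t1)
loss.backward()                     # learn: (state, action) -> next state
```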

But predicting raw pixel worlds for everything can be expensive and often wasteful. Most of what's in a video frame doesn't change from one moment to the next; the walls stay where they are, the sky remains the sky. And most of the details within a frame are redundant: the color of the sky, the texture of a wall. They could be described in a more compact form.

So modern World Models involve a latent space: a compressed, learned representation where only the most essential information is retained.

The visual encoder compresses each frame down to a compact vector (a mathematical fingerprint of the scene) and the model learns to predict the next fingerprint — not every pixel in the 4K frame — in response to actions. This is where the computational efficiency comes from.
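
Some rough arithmetic on why that matters, with illustrative sizes:

```python
# Back-of-the-envelope arithmetic: predicting a compact latent instead of every
# pixel of a 4K frame shrinks the prediction target dramatically.
pixels_per_4k_frame = 3840 * 2160 * 3     # RGB values in a single 4K frame
latent_dim = 512                          # a typical compact "fingerprint" size
print(pixels_per_4k_frame / latent_dim)   # ~48,600x fewer values to predict per step
```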

To accurately model the evolution of the world, World Models must also learn to represent the full set of possible outcomes. This uncertainty in outcomes is usually referred to as the stochasticity of the environment.

World Models have to learn to navigate what they don’t know yet (epistemic uncertainty: for example, a model that has never seen a traffic light will not know that red follows after yellow) and the inherently unknowable (aleatoric uncertainty: the randomness, like rolling dice5).

Even when the model has learned all that’s possible to know about the behavior of the environment (it has reduced its “epistemic” uncertainty to a minimum), there will almost always be some inherent uncertainty (“aleatoric” uncertainty) in what happens next. This is in contrast to pure entertainment video models, which only need to be able to predict a common evolution of the world state to perform well.

If you use a straightforward prediction approach (for example, a model naively trained with Mean Squared Error, or MSE) to predict a car turning a corner, the model can become ‘blurry’ because it averages every possible outcome. The car could turn and stay in the left lane, or it could merge into the right lane. The trajectory that actually minimizes the error is the implausible one where the car stays in the middle of the two lanes. That’s the blurriness, and different models handle it differently.
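
You can see the averaging problem in a few lines of arithmetic:

```python
# If the car ends up at -1 (left lane) half the time and +1 (right lane) the other
# half, the single prediction that minimizes mean squared error is 0: the
# implausible middle of the road.
import numpy as np

outcomes = np.array([-1.0, +1.0])                       # two equally likely futures
candidates = np.linspace(-1.5, 1.5, 301)
mse = [np.mean((outcomes - c) ** 2) for c in candidates]
print(candidates[int(np.argmin(mse))])                  # ~0.0: the average, not a lane
```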

Diffusion models avoid this problem by gradually denoising toward a specific outcome, enabling the model to commit to a single mode of the outcome distribution and sample a sharp, plausible future rather than the average of all possibilities.

Autoregressive models with multiple tokens per outcome also handle multimodality; by sampling one token after the other, they ensure that future token predictions are consistent with previous ones.

JEPA-style architectures, by contrast, address blurriness by simply sidestepping it. JEPA largely avoids having to model that distribution explicitly by never decoding back to pixel space at all. It operates in a space where averaging is less catastrophic, because we don’t expect these models to predict frames, but rather to develop representations that are useful for downstream tasks.

What comes out of this process depends on what you need. If you’re building a visual world simulator — something you can watch or explore — you decode the latent predictions back into pixels through a visual decoder, producing imagined video of plausible futures. This is what makes the demos from Google DeepMind and World Labs look realistic and impressive.

There are a number of approaches used to train World Models. We will cover them and how they evolved and built on each other through the lens of the brief eight-year modern history of the field shortly.

For now, keep this in mind: observation data, paired with the actions that caused what happens in those observations, trains World Models to predict the next state; Agents then train inside those Worlds to predict the next action.

Why Actions are the Ultimate Form of Compression

Here is a key insight behind World Models: actions are the ultimate form of compression.

Consider what happens when you decide to step left to avoid a puddle. Your brain processes the visual scene (the sidewalk, the puddle, the people around you, the curb, the approaching bus), predicts the immediate future (the puddle won’t move, the bus will pass, the person behind you will keep walking), evaluates options (step left, step right, jump, accept wet shoes), and selects one.

An outside observer can’t see inside your head, can’t know exactly what you were thinking, can’t know what you’re processing subconsciously. They don’t know if you’re tired or if you’re in a rush. They don’t know your moral code, how you, specifically, would answer the Trolley Problem. They don’t need to. They see the output of all of that near-instantaneous calculation: step left.

That, to me, is magic.

Of course, not everyone makes the right decisions. Play the video forward and you are able to learn the consequences, too. Step left, into an even bigger puddle. Step left, and get clipped by a car. Step left, and knock a baby out of its stroller. Over billions and billions of observations and instructions and actions, we learn not just how humans decide to respond based on inputs, but the consequences of those decisions. The collective World Model learns to act smarter than any individual.

Zoom back into the individual. If you could perfectly reconstruct someone’s stream of observations and actions, you would have a nearly complete record of their interaction with reality. You would know what they saw and what they did about it. The World Model learns exactly this mapping. It compresses space and time into a compact representation, and then uses actions to unroll what happens next. That’s what makes World Models so computationally efficient.

It’s also the same reason why World Models can handle stochasticity that traditional simulation cannot. To understand why, let’s revisit our Man U match with our new understanding of how World Models work.

In a traditional simulation engine, every possible behavior must be coded. If you want a thousand soccer fans to react realistically to a goal, you need to write rules for each type of reaction. The computational cost scales with the number of Agents and the complexity of their interactions.

In a World Model, the cost is fixed to one neural network pass. The stochastic, messy, human reality is already baked into the learned weights and absorbed from the millions of hours of video the model was trained on. The model doesn’t calculate what a crowd should do. It has seen what crowds actually do and it uses this information to make probable predictions.

This is what I mean when I call World Models compute for the uncomputable. Traditional computing is deterministic: known inputs, known rules, known outputs. The real world is not deterministic, so World Models don’t even try to code these things in. They watch, learn, and do, at a fixed computational cost, regardless of how complex the scenario gets.

World Models and Policies

There is one more distinction to make before we go further, one that gets muddled in typical conversations about World Models.

A World Model is a simulation of the environment; it takes in actions and produces predicted observations; it shows you what will happen if you do something.

A Policy is the brains of the Agent that acts within that environment. It takes in observations (and often instructions) and produces actions; it decides what to do.

The World Model is the dream. The Policy is the dreamer. The dreamer acts, and the dream responds. The dream responds, and the dreamer acts.
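
In code, the distinction is just two interfaces and a loop. The classes below are toy stand-ins, not anyone's production system:

```python
# The World Model maps (state, action) to the next state; the Policy maps an
# observation to an action; together they close the dream/dreamer loop.
import numpy as np

class WorldModel:                            # the dream: responds to actions
    def step(self, state, action):
        return state + 0.1 * action          # placeholder for a learned transition

class Policy:                                # the dreamer: decides what to do
    def act(self, observation):
        return -observation                  # placeholder for a learned controller

wm, policy = WorldModel(), Policy()
state = np.array([1.0])
for _ in range(20):
    action = policy.act(state)               # the dreamer acts...
    state = wm.step(state, action)           # ...and the dream responds
print(state)                                 # the loop steers the state toward 0
```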

In practice, the relationship between the two turns out to be even more intimate and intertwined than that distinction suggests. Recent research has investigated training policies on top of World Model foundations, or building them together from the get-go. Start with the weights of a World Model — a system that has learned how to predict what happens next — and then, instead of training it to predict future frames or states, train it to predict future actions.

A system that learns to predict the world can also learn much faster how to act in it. Understanding and doing aren’t two separate skills bolted together. They are the same skill, seen from different angles. At least this is what our research, and that of other labs, is starting to suggest.

That means that if you build a good enough World Model, you can also more effectively train a policy to act in the worlds it generates.

This is one of many important things the field has learned in a very short amount of time. Turns out intuition and imagination are two sides of the same coin.

A (Very Brief) History of World Models

On one hand, it should be very easy to summarize the modern history of World Models. It has only been eight years since Ha and Schmidhuber published World Models.

On the other hand, an awful lot has happened in just eight years. In that time, the field has gone through four waves: major periods when the field shifted its focus to new questions. We highlight some of the most important papers here, and not boring world subscribers can find a full downloadable list of key papers at the end of the essay.

Wave 0, in 1990-1991, was the pre-deep learning era. Researchers first articulated the idea that Agents could learn internal models of the world and use them for prediction and planning. They asked, and answered, the question: what would a World Model do?

This is Richard Sutton and Dyna. This is Jürgen Schmidhuber and Making the World Differentiable. Before we had the compute, the data, or the architecture, we had the dream, waiting in dreamspace for reality to catch up.

Wave 1, in 2018-2019, asked: “Can this even work?”

Based on Ha and Schmidhuber's work, the first paradigm used Variational Auto-Encoders (VAEs) to compress frames, Recurrent Neural Networks (RNNs) to model dynamics, and trained policies inside the resulting dreams. So: compress what you see, predict what comes next, and train Agents to act inside that simulation.

At the time, the question was whether learning in imagination — dreams — was feasible. Researchers attempted to answer it using small models and simple environments to generate proof-of-concept results. Quite literally, the next big thing started out looking like a toy. Model Based Reinforcement Learning for Atari introduced the Atari 100k benchmark, testing whether the SimPLe algorithm could learn Atari games with only 100,000 real environment steps, or about two hours of gameplay.

The World Model Inside of SimPLe

The answer was yes. SimPLe learned how to play 26 Atari games and beat a competitor model on sample efficiency, or how many steps it took to reach a given score.

But could it play as well as humans?

That was the question that drove Wave 2 (2020-2022): “Can the World Model match human performance?”

DreamerV2, developed by Danijar Hafner at Google DeepMind, reached an answer quickly. It used a Recurrent State-Space Model (RSSM) with discrete latent representations — a system that maintains a compressed, running memory of the world and updates it with each observation. DreamerV2 became the first World Model Agent to achieve human-level performance across the 55-game Atari benchmark6. It was trained entirely in imagination, on a single GPU.

That same year, another DeepMind team published Mastering Atari, Go, chess and shogi by planning with a learned model in Nature. The paper described its MuZero model, which also beat Atari games (and others like Go), but did so by taking almost the exact opposite philosophical approach.

Comparison From the DreamerV2 Paper

Whereas DreamerV2 generated observable dream environments and trained inside of them, MuZero never generated anything observable at all, planning entirely in abstract latent representations it invented for itself, and it did well.

It did so well, in fact, that it leapfrogged the Go-specific models. In 2016, DeepMind's AlphaGo beat human Go Champion Lee Sedol 4-1. It had been trained on a large database of human expert games plus self-play, with the rules of the game hard-coded in. The next year, AlphaGoZero beat AlphaGo 100-0 after being trained entirely from self-play with no human game data at all, just the rules. That same paper season, AlphaZero generalized AlphaGoZero's approach to other games, like chess and shogi, both of which it came to dominate within hours. Then in 2019 (pre-print), MuZero learned everything, including the rules, the game dynamics, and the value function, from scratch, purely from observation and outcome. It matched AlphaZero on Go, chess, and shogi (where AlphaZero knew the rules) while also generalizing to 57 Atari games (where "rules" aren't even a well-defined concept).

MuZero

With each new model, something that humans had previously hard-coded — the rules, the strategy, the value of a position — was removed. The model learned each from scratch instead. MuZero was the terminus of that progression, entirely learned.

And MuZero did this without imagining future board states at all. It imagined hidden states, or abstract vectors it invented for itself during training that have no guaranteed correspondence to anything human-observable or interpretable. A human looking at MuZero’s internal representation of “three moves from now” would have absolutely no idea what it was thinking. And yet… it outperformed all previous models.

With MuZero’s success, the field now had two opposing schools of thought: generative World Models that produce observable futures, and latent World Models that predict in abstract space, even if they weren’t called “latent” yet.

From then on, progress in World Models has happened in both directions, generative and latent.

On the latent side, in 2022, Yann LeCun published a sweeping position paper from his dual positions at Meta and NYU Courant proposing a fundamentally different philosophy from generative models, one that looked more like MuZero: A Path Towards Autonomous Machine Intelligence. His new World Models company, AMI, is named after this paper.

LeCun’s Joint Embedding Predictive Architecture (JEPA) argued against generating pixels entirely. Similar to MuZero, instead of predicting what the world will look like, JEPA predicts what it will mean. It forecasts abstract representations of future states, deliberately discarding unpredictable visual details.

That same year, on the generative side, IRIS (2022), developed by Vincent Micheli and Eloi Alonso, two of General Intuition’s future co-founders, reframed World Modeling as language modeling over a learned vocabulary of image tokens. Instead of recurrent state-space models, IRIS used a GPT-style autoregressive transformer over discrete visual tokens. Basically, IRIS borrowed the machinery of language models and applied it to World Modeling.

In doing so, IRIS filled a number of previous gaps. The IRIS World Model was, in effect, a language model, but its vocabulary was images and actions instead of words. This brought the scaling properties of LLMs directly into World Modeling: efficient attention, scaling laws, and all the engineering infrastructure that had been built for large language models could now be applied to learning about the physical world.

Where Dreamer was missing the ability to model the joint distribution of the next latent state (for example, to handle multimodality), IRIS represented the next latent state as a series of discrete tokens to predict autoregressively, which meant that it could now predict multiple distinct outcomes. And while Dreamer beat humans by using much more data than they do, IRIS was the first learning-in-imagination approach to beat humans with the same amount of available gameplay data (two hours).

IRIS Results

JEPA aside, practically all of the work up to this point in World Models happened within games, and it’s worth floating between Wave 2 and Wave 3 for a second to appreciate the special relationship between AI and games.

Games have always played an important role in the development of AI. Claude Shannon’s 1950 paper Programming a Computer for Playing Chess is one of the founding documents of AI. In 1959, Arthur Samuel’s checkers program introduced the concept of machine learning itself. The first time the world woke up to the idea that intelligent machines could beat humans at anything was when IBM’s Deep Blue beat Garry Kasparov in chess.

Garry Kasparov (l), dejected

Before DeepMind was an AI lab, Demis Hassabis was a game designer. At 17, he designed the commercially successful Theme Park. DeepMind's founding breakthrough is detailed in the DQN paper, published in Nature in 2015, which demonstrated that Atari games could be played from raw pixels using deep reinforcement learning. Then came AlphaGo in 2016, which beat the world champion at Go, a game once believed to require a kind of intuition that was uniquely human, with more possible board positions than there are atoms in the universe.

The path from AlphaGo to AlphaFold ran through exactly the insight that World Models formalize. As Hassabis put it:

Wouldn’t it be incredible if we could mimic the intuition of these gamers, who are, by the way, only amateur biologists?

General Intuition is named after this quote from Demis, which points towards a future where our models power research far beyond the dynamics of what pixels can describe today, beyond games themselves, and into our bodies.

And then DeepMind taught machines how to fold proteins. AlphaFold won Hassabis and his DeepMind teammate John Jumper the 2024 Nobel Prize in Chemistry.

Games are fun, of course. But the reason games keep showing up is that games are the only domain where you get massive amounts of labeled spatial-temporal data with clear action-outcome pairs, consistent physics, unambiguous reward signals, and a controlled environment where you can run millions of experiments. The real world has none of these properties.

Early World Models, like a human child, spent most of their time watching and playing games. The Atari 100k benchmark became the standard arena for World Model research, DreamerV3 played Minecraft, and many current World Model companies retain a connection to games, with many World Models being “playable.”

Games are the lab bench of embodied AI. But they are only a small fraction of the ambition.

For World Models to be truly useful, they need to interact with the world.

That’s Wave 3 (2023-2024). It asked: “Can World Models be truly interactive?”

We got the first answer from driving. GAIA-1 (2023), developed at Wayve, scaled the sequence-modeling approach pioneered by IRIS to 9 billion parameters and trained on real-world driving video. It could generate driving scenarios in response to actions (steer the car), text prompts (“rainy day, highway”), or both. Anthony Hu, who led this research, now leads World Modeling at General Intuition.

GAIA-1 confirmed that the scaling laws everyone had observed in LLMs also held for visual World Models. More data and more parameters yield predictably better performance for World Models, too. This was not a given. It meant that the path forward was clear even if it was expensive: scale up and the models get better.

The following year, DIAMOND (2024), developed by future General Intuition co-founders Eloi Alonso, Adam Jelley, and Vincent Micheli, opened a new architectural frontier. Rather than compressing observations into discrete tokens and predicting them autoregressively, as researchers had been doing since IRIS, DIAMOND used diffusion models to predict future frames directly.

The visual fidelity was meaningfully richer, and that richness translated directly into better Agent performance. The subtle visual details that discrete tokens discarded (the little clues that tell you a surface is slippery, a door is ajar, a person is about to change direction) turned out to matter for decision-making, which is unsurprising when you think about it.

As a brief aside, it's worth noting that many of the open source advancements that have been made in World Modeling were built on top of the DIAMOND architecture. Multiverse, the first AI-generated multiplayer game, is DIAMOND-based, as is Alakazam, the "1st 'World Model game engine'." DIAMOND is essentially the DeepSeek or Llama of Generative World Models.

DIAMOND itself set a new best on Atari 100k and demonstrated something that captured the public imagination: trained on Counter-Strike gameplay, it produced a fully interactive, playable neural game engine from roughly 87 hours of footage on a single GPU.

It showed that it was possible to run an interactive 3D World Model in real time too.

Counter-Strike environment generated by DIAMOND with just 87 hours of footage

DIAMOND got very good at playing Atari. The Agent plays a real game and gathers real data there, with which it trains the World Model. Then it tests itself inside the World Model’s synthetic environment, gets better in there, and goes back out for more real interaction, to test itself in the wild. This loop between ground truth and synthetic, back and forth, is how World Models improve, almost like working problems out in a lucid dream then testing them in reality upon waking. This is the Dyna paradigm mentioned earlier.
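
Schematically, the loop looks like the toy below: a stand-in "game," a tabular model, and a policy that improves in imagination between rounds of real interaction. It only shows the structure, not any lab's actual training code.

```python
# A toy rendering of the Dyna-style loop: act in the "real" game, fit the World
# Model on what happened, improve the policy in imagination, repeat.
import random

REAL_REWARDS = {a: -abs(a - 7) for a in range(10)}    # the "real" game: action 7 is best

class TabularWorldModel:
    def __init__(self):
        self.reward = {}
    def fit(self, transitions):                        # learn from real interaction
        for action, reward in transitions:
            self.reward[action] = reward
    def imagine(self, action):                         # answer queries without the real game
        return self.reward.get(action, float("-inf"))

world_model, replay, best_action = TabularWorldModel(), [], 0
for _ in range(50):
    action = random.randrange(10)                      # gather real experience
    replay.append((action, REAL_REWARDS[action]))
    world_model.fit(replay)                            # update the dream with real data
    best_action = max(range(10), key=world_model.imagine)  # improve in imagination

print(best_action)                                     # settles on 7 once the model has seen it
```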

Would that loop work in real-world conditions?

It turns out that the answer is yes, too. And that it would work beautifully.

GAIA-2 (March 2025) pushed the diffusion approach to its most ambitious application yet: multi-camera autonomous driving simulation. Using latent diffusion with flow matching and space-time factorized transformers, the model could generate high-resolution surround-view driving video conditioned on ego-vehicle dynamics, other Agents’ trajectories, weather, time of day, road structure. In short, it could reproduce the full complexity of real driving. It could simulate scenarios that were too dangerous or too rare to collect from real roads: sudden cut-ins, emergency braking, pedestrians stepping off curbs.

GAIAs 1 and 2, and DIAMOND, like IRIS, were the products of researchers we now get to work with at General Intuition. Diffusion or flow-matching models like GAIA-2 were the starting point of our team’s current research efforts.

But they are not the only approach.

Google DeepMind is one of the central players in this space. Their World Model, Genie (2024), is an 11-billion-parameter model trained on unlabeled internet video of 2D platformer games. It learned an action space entirely from scratch; no one ever told the model what the controls were. Give it any image and it can generate a playable world from it.

Genie: A Whole New World

OpenAI’s Sora (2024, with Sora 2 following in 2025) and Google’s Veo 3 (2025) pushed video generation to extraordinary visual quality and framed these systems explicitly as “world simulators.”

The field’s vocabulary can get muddied. Let’s make it clear.

Video generation models produce beautiful visual sequences, but they aren’t quite World Models in the sense that we’ve been describing them. In these videos, you can’t take an action and watch the environment respond live to your intervention. They predict what a scene will look like over time; they don’t model what happens because of what you do.

Think of the difference between watching a movie of someone driving and actually steering a car. The visual output might look similar, but the underlying computation is fundamentally different. Interactivity, the ability to take actions and observe their consequences, is what separates a World Model from a very impressive video.

And interactivity is what it takes to impact the real world.

This is the central question of Wave 4, the wave we’re in right now: “Can models act in the real world?”

As in: Can Agents trained in World Models work outside of research settings, in real vehicles, real robots, real deployments? We are now getting awfully close to sci-fi’s predictions.

This is where the current frontier is being pushed. Right now. As you read this.

Comma.ai took the most direct path in driving from World Model to product: Learning to Drive from a World Model. They trained a driving policy entirely inside a learned World Model — inside the dream — and deployed it in openpilot, their open-source driver assistance system running on production vehicles driven by real people. The World-Model-trained policy outperformed both traditional imitation learning and policies trained in conventional simulators. This is arguably the first consumer product powered by a World-Model-trained Agent.

In robotics, Meta’s V-JEPA 2 animated LeCun’s latent prediction philosophy. The model is the clearest large-scale proof point so far. It’s a 1.2B-parameter model pre-trained on over a million hours of video via self-supervised masked prediction: no labels, no text. In the second stage, it fine-tunes on just 62 hours of robot data from the Droid dataset. It turns out that is enough to produce an action-conditioned World Model that supports zero-shot planning. V-JEPA 2 was deployed zero-shot on real Franka robot arms in new environments to perform pick-and-place tasks. It planned all of this entirely in latent space, without pixel generation, task-specific training, or hand-crafted rewards. And it was fast; where pixel-space approaches took minutes to plan a single action, V-JEPA 2 did it in seconds.

Google DeepMind’s SIMA 2 took an entirely different approach. Rather than build a dedicated World Model, it fine-tuned Gemini, its large foundation model, to act directly as an Agent in 3D game environments. SIMA 2 can reason about high-level goals, follow complex multi-step instructions, converse with users, and generalize to unseen environments.

It represents an alternative paradigm: instead of building a specialized World Model, leverage the implicit world knowledge already embedded in a model trained on the breadth of human knowledge.

This is one of the field’s open questions. Will this path, using a large foundation model or a video model as the basis for an Agent, rather than training an Agent from scratch in a World Model, win out?

In fact, there are many open questions. And nearly as many World Model startups trying to answer them.

The State of the World (Models)

That brings us to the present moment.

What has become clear is that talented researchers and investors alike are excited by World Models’ potential, as evidenced by the massive funding rounds to support companies led by legends in the field.

In February 2026, World Labs, the company founded by legendary researcher Fei-Fei Li, announced that it had raised a fresh $1 billion from investors at a $5.4 billion post-money valuation.

Not to be outdone, Yann LeCun, who launched AMI Labs in late 2025, announced last week that it had raised $1.03 billion at a $3.5 billion valuation.

In October 2025, our company, General Intuition, announced $133.7 million in a very large seed round. Last summer, Decart raised $100 million at a $3.1 billion valuation. In November, Physical Intelligence raised $600 million at a $5.6 billion valuation for its robot foundation models. And just this past February, Wayve, the UK-based self-driving startup whose researchers built GAIA-1 and GAIA-2, raised $1.2 billion at an $8.6 billion valuation.

Google DeepMind, which doesn't need to fundraise because it's fueled by history's greatest business machine, is pouring resources into SIMA, Genie, and Veo, and using them to power initiatives like Waymo. Demis has publicly stated that he believes World Models will become an important part of Gemini's planning capabilities. GDM is also merging many of these capabilities into a "Video Thinking" team, with the reasoning described best by Shane Gu and Jack Parker-Holder from GDM.

What is less clear, but even more interesting, is that we are at the point in this technology’s development where we know that something big is happening, but it’s still unclear exactly which approach, or combination of approaches, will win out. We are seeing breakthroughs almost every day at General Intuition, and we hear rumors of leaps happening in other labs too.

Below is a framework in which to fit any news you see coming out on World Models. We won’t cover everything, and we apologize in advance if we miss your embodied AI of choice. A fun exercise for the reader will be to fit what we’ve missed into what we’ve laid out.

The framework has three main categories: Current Foundation Models, World Models, and Embodied Agents.

The thing to keep in mind here is that, despite different World Model approaches, we all share the same end goal. The end goal is to produce Agents that generalize and do things in various environments, including the real world. Some of the Agent approaches get there using LLMs as their stepping stone, others start with video models. Other agent approaches use World Models as their training environments. And some Agents learn directly from experience.

With us? Good, then here we go!

Current Foundation Models

Current Foundation Models are the ones that learned to make sense of the world’s data without being able to simulate the stochastic world environment itself. They are models that process inputs — text, images, video — and learn to predict, generate, or reconstruct. But they don’t yet give an Agent a place to act. They are not action-conditioned. They don’t respond or interact. They are potential substrates on which World Models can be built, or even, in some cases, on which Agents are pre-trained.

Three categories of stepping stone models we’ll focus on here are Large Language Models, Video Models, and 3D Reconstruction Models.

Large Language Models

LLMs learned from staggering quantities of text that the world has structure. They know that a glass falls when pushed, that fire is hot, that if you leave the house without an umbrella in a rainstorm you will get wet. They encoded an enormous amount of causal and physical knowledge. But none of this was from experience. Like digital Castalians, they read about the world rather than perceiving it. This makes them extraordinarily useful as a backbone for reasoning and planning, which is why you'll find LLMs embedded in many agent architectures we'll discuss later. But a language model alone cannot simulate what happens when a robot arm reaches for a cup.

In our context, LLMs are particularly relevant when we discuss VLAs, or Vision-Language-Action models, which take advantage of the enormous amount of research, capital, tooling, and infrastructure that has gone into developing LLMs in order to bootstrap robots that can do things in the physical world.

Video Models

Sora. Veo 3. Kling. Seedance 2.0. Runway. Pika. Moonvalley. Haiper. Luma AI.

No one mistakes an LLM for a World Model, but plenty of people conflate Video Models and World Models.

These models are trained on the enormous amount of video data on the internet, and produce extraordinary videos themselves. Sora can generate a convincing shot of a woman walking through a neon-lit Tokyo street. Veo 3 can render photorealistic scenes with synchronized dialogue.

But you can’t interact with them. You can’t take an action inside of them and watch the environment respond instantly. They predict what a scene will look like over time but they don’t try to model what happens because of what you do.

Of course, the lines get blurry.

Odyssey, founded by self-driving heavyweights Oliver Cameron (ex-Cruise) and Jeff Hawke (ex-Wayve), is building “a world simulator that dreams in video.” Currently, they don’t let you take an action and watch the environment respond, but they do let you prompt the video mid-stream to steer it in real-time. Where do you draw the line?

Wherever the line is, these video models are getting good, and really funny.

Really, really funny.

Video models aren’t quite World Models in the sense we define them; they are a stepping stone. Runway began as a video generation company – its Gen 4.5 is among the best on the market – but has concluded that physics-aware video generation is a path toward something bigger. This thinking led to GWM-1, their explicitly labeled “General World Model, built to simulate reality in real time,” which is interactive, controllable, and general-purpose. The real value, financially and societally, won’t come from video for its own sake, but from models that use video as a training environment on the way to controlling embodied systems.

3D Reconstruction and Generation Models

Take it a step further. What if you could navigate through the scenes depicted in video generation models? That feels like a world, right?

World Labs, led by the legendary Fei-Fei Li, the “Godmother of AI” who created ImageNet, is the most interesting example in this category. While the company is the one most people would associate with “World Models,” World Labs is not currently building what I would define as World Models.

Instead, in its early days, World Labs has focused on immersive virtual worlds, but not action-conditioned ones. Its first product, Marble, generates and edits persistent 3D environments from text, images, video, or 3D layouts. They call it a "Multimodal World Model."

World Labs

Marble is thus far not interactive, beyond letting you move through the generated environments. They say this themselves. On the Marble product page, World Labs frames interactivity as a future opportunity:

Future World Models will let humans and agents alike interact with generated worlds in new ways, unlocking even more use cases in simulation, robotics, and beyond.

It is worth noting that World Labs has recently started exploring World Models that do generate frames directly, instead of the underlying splats of the entire world.

World Models

A World Model, as we define it, is an environment that an Agent can act in, and that responds in real time. It is a simulation, a dream, one learned from observation and actions data rather than hand-coded. The Agent takes an action, the world changes, and the Agent observes what happened. Repeat, millions of times, across an enormous variety of situations, and the hope is that you get an Agent that generalizes, that can do things that were not in the original training data.

This is the key distinction that everything else hinges on: a World Model is action-conditioned. It predicts what the world will look like next given whatever the Agent did.

The intuition is simple. A robot trained only on real-world data has seen a finite set of kitchens, a finite set of cups, a finite set of ways a cup can fall. Put it in a kitchen it hasn’t seen, with a cup it hasn’t encountered, and it struggles. A robot trained inside a World Model, on the other hand, has, in principle, encountered infinite kitchens because the World Model can generate them. Situations that would be rare, expensive, or dangerous to collect in the real world become routine in simulation. Out-of-distribution becomes in-distribution.

Within World Models, there are two main approaches: Latent World Models and Generative World Models.

I apologize for bringing you so far into the weeds here, but I want to clarify something that confuses people: both Generative World Models and Latent World Models rely on latent states. Generative World Models rely on latent states designed with reconstruction objectives (autoencoders), which enable frame predictions, whereas Latent World Models directly build self-predictive representations.

Latent World Models were born in the darkness and still live there; Generative World Models were merely born in the darkness.

Latent World Models

Latent World Models are the descendants of MuZero but let loose in open-ended, no-rules environments like the real world.

This is Yann LeCun's current world. Yann pioneered modern computer vision architectures with LeNet, which introduced the idea behind convolutional neural nets (CNNs) in the 1990s. In the 2010s, he championed self-supervised learning, arguing that having humans label millions of examples doesn't scale to real intelligence and that models should create their own signal from raw data. In the 2020s, he led the JEPA team. Yann is a GOAT.

The deep thread in Yann’s work is teaching models to learn useful representations of the world automatically from raw data. Latent World Models are the latest, and perhaps ultimate, strand in this thread.

The approach is philosophically the converse of Video Models or 3D Reconstruction Models, as mentioned earlier in the history section. While those approaches care about producing and understanding every pixel, latent World Models like JEPA say ne vous embêtez pas (don't bother). The French would rather speak English to you than listen to you butcher their language. JEPA is similarly impatient; rather than let the model stumble over every pixel of an unpredictable future, it doesn't predict pixels at all.

As LeCun puts it: “The world is unpredictable. If you try to build a generative model that predicts every detail of the future, it will fail. JEPA is not generative AI.”

Instead, JEPA learns to represent videos in abstract, compressed space and makes predictions there. It deliberately throws away unpredictable visual details. This makes JEPA potentially very efficient for planning and representation learning.
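
A rough sketch of that latent-prediction objective, with illustrative dimensions (real JEPA systems use vision transformers and an EMA-updated target encoder; the linear layers and simple stop-gradient here are placeholders):

```python
# Encode past and future, predict the future *embedding* rather than the future
# pixels, and keep the loss entirely in latent space.
import torch
import torch.nn as nn

encoder = nn.Linear(1024, 128)                 # online encoder (toy stand-in)
target_encoder = nn.Linear(1024, 128)          # target encoder: not updated by this loss
predictor = nn.Linear(128, 128)                # predicts the future representation

past, future = torch.randn(32, 1024), torch.randn(32, 1024)
pred = predictor(encoder(past))
with torch.no_grad():                          # stop-gradient on the target branch
    target = target_encoder(future)
loss = nn.functional.mse_loss(pred, target)    # loss lives in latent space, not pixel space
loss.backward()
```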

AMI Labs is LeCun’s bet that this approach is the path to real intelligence, and investors recently backed him with $1.03 billion.

AMI Launch Post

There are trade-offs to the latent approach, as there are trade-offs to generative approaches.

LeCun argues that the thing that seems like the biggest trade-off, a loss of fidelity in exchange for speed, is not actually a trade-off. His position is that the detail you lose is detail you should lose, that trying to predict every pixel is not just expensive but actively counterproductive — the model wastes capacity on inherently unpredictable visual details instead of learning the abstract causal structure that actually matters for reasoning and planning. Imagine if you needed to simulate every photon when imagining catching a ball. Your brain might explode. There is some optimal level of detail, and it is well short of "every single detail." LeCun's argument is that, for World Models, the optimal level requires fewer details than many people, including us, think.

There are other trade-offs to keep in mind, however, that LeCun hasn’t mentioned.

One is that latent models are trickier to evaluate. You can’t look at the output and see if it makes sense intuitively the way you can with generated video, and they can’t serve as training grounds for human-in-the-loop systems, because humans can’t operate inside latent space. We need to see the world to act in it.

Another related downside is that your iteration speed slows down when you can’t visualize predictions or interpret the loss. Humans are very good at noticing when something is visually off; we did not evolve to spot discrepancies in predicted latent encodings of the future ([0.13, -1.02, 0.44, 0.07, …], MSE = 0.0187). And iteration speed is what matters most in modern ML, because modern ML progress mostly comes from empirical search, not from knowing the right design ahead of time.

Latent models are also more challenging to train for similar reasons. Additionally, the lack of strong supervision in the learning objective leads to collapse issues which require a bunch of tricks to fix. Why? The JEPA objective is to predict the encoding of the future based on the encoding of the past, but you can satisfy that objective with trivial encodings (e.g. set everything to 0, there is 0 loss), so we need to make sure representations don’t collapse.
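To make the collapse problem concrete, here is a minimal sketch of a JEPA-style latent prediction objective in PyTorch. It is illustrative only: the shapes, the tiny encoder, and the VICReg-style variance penalty are assumptions for demonstration, not AMI Labs’ actual recipe.

```python
# Minimal sketch of a JEPA-style latent prediction objective and its collapse
# failure mode. Illustrative only; shapes and the variance penalty are assumptions.
import torch
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self, obs_dim=128, latent_dim=32):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, latent_dim))
    def forward(self, x):
        return self.net(x)

encoder = Encoder()
predictor = nn.Linear(32, 32)  # predicts the future latent from the past latent

past_obs, future_obs = torch.randn(16, 128), torch.randn(16, 128)

# JEPA-style objective: predict the *encoding* of the future, never its pixels.
z_past, z_future = encoder(past_obs), encoder(future_obs)
prediction_loss = ((predictor(z_past) - z_future.detach()) ** 2).mean()  # stop-gradient on the target

# The trivial solution: if the encoder maps every observation to the same constant
# vector, the predictor only has to copy it and the loss is exactly 0, even though
# nothing about the world has been learned. This is representation collapse.
z_constant = torch.zeros(16, 32)
print(((z_constant - z_constant) ** 2).mean())  # tensor(0.)

# One family of fixes (VICReg-style): penalize latent dimensions whose batch
# standard deviation falls below a threshold, so "encode everything identically"
# is punished instead of rewarded.
std = z_future.std(dim=0)
variance_penalty = torch.relu(1.0 - std).mean()
loss = prediction_loss + variance_penalty
```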

There is a spectrum of environments in which agents can train. On one side is what is practical today; on the other is the platonic ideal. Latent World Models sit near the opposite end of the Practical ←→ Platonic spectrum from VLAs, which we will cover below.

They are closer to what researchers believe to be the technical platonic ideal, but they face real challenges in practice today. That said, new methods like LeJEPA are closing the gap, and talent is flooding into the field.

Chris Manning, Ian Goodfellow, and Fan-Yun Sun have also joined the latent camp, starting the latent lab Moonlake. Their entrance on the latent side is notable. Manning helped pioneer neural natural language processing and co-created GloVe, which was the dominant word-embedding model before transformers. And Goodfellow invented GANs (generative adversarial networks), which were the first widely successful way to train neural networks to generate realistic synthetic data.

In a recent X post, the Moonlake co-founders explained their approach to building efficient World Models. It is an interesting hybrid.

The plan is to generate full game environments to attract human players and collect action-labeled data. Afterwards, they model the world in semantic/symbolic space rather than pixels. As in, they use beautiful game environments to attract real human players, because they need humans to generate action-labeled data. But once they have that data, they discard the pixels entirely and train on abstract representations instead, betting that the underlying patterns matter more than the visual detail.

Ultimately, we don’t view latent and general models as opposed to each other. Moonlake’s hybrid approach is evidence of that. They just serve different goals. Latent World Models are generally more computationally efficient since they discard some information, with advantages for representation learning and planning. Generative World Models should be more general, since in theory they capture all visual information, with advantages for interpretability and generalization. Both can be used for many different purposes, including training agents with reinforcement learning.

Now, let’s turn to Generative World Models.

Generative World Models

Generative World Models are the closest thing to simulating human-perceived reality that we’re aware of. If our world is a simulation, it’s probably a Generative World Model of some sort.

This is the paradigm we at General Intuition predominantly focus on to build the World Models in which our policies learn. It’s also the one that recently blew the world’s minds when Google DeepMind released Genie 3.

The video — and, Genie 3 itself, if you get a chance to play with it — gives you a felt sense for what makes Generative World Models different. They’re interactive. They respond.

They generate human-observable, interactive futures that you can see, act in, and learn from. You can see what the model thinks will happen next. The model takes in a state and an action and produces a plausible next state, which you can act in again. Based on the updated state and new action, it produces the next plausible next state, and so on and so forth. A human can look at the output and say, “That’s wrong, walls don’t bend like that” or “Yes, that’s exactly what happens when you turn a steering wheel at speed.”

Generative World Models predict the observations themselves in pixels, video, or 3D scenes, allowing agents and humans to interact with the simulated environment. The dream is visible and playable.
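Here is a minimal sketch of that loop. `world_model.predict_next` and `policy` are hypothetical stand-ins, not Genie 3’s or any particular company’s API; the point is just the shape of the interaction: state and action in, plausible next state out, repeated.

```python
def rollout(world_model, policy, initial_frame, horizon=100):
    """Roll a policy forward inside a generative World Model's dream."""
    frame = initial_frame
    trajectory = []
    for _ in range(horizon):
        action = policy(frame)                           # agent (or human) picks an action
        frame = world_model.predict_next(frame, action)  # the model dreams the next frame
        trajectory.append((action, frame))               # humans can watch and sanity-check this
    return trajectory
```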

This is what improves the training loop in many cases. Both generative and latent models can learn in imagination. Yet when visual details matter, or when the downstream task is not yet known, Generative World Model learning, with all its pixel-level detail, tends to outperform.

This only works if the generated environment is rich enough to learn from. The further the generated world is from reality, the worse the lessons the agent learns, and the less successful it is when it goes back to real games. This is what DIAMOND showed: when there is more detail in the generated world, agents are smarter.

At General Intuition, we are building on this diffusion and flow-matching architecture. It was developed, in part, by researchers who are now our co-founders and who built IRIS, DIAMOND, and GAIA-2.

Wayve, the birthplace of GAIA-1 and GAIA-2, is the leader in Generative World Models for autonomous driving. By using a large latent diffusion World Model offboard, they aim to dream up edge cases that would take millions of driving miles to find in reality, train driving policies on them, score the driving policies on their performance in simulation, then distill that dreamed experience into a smaller onboard policy that can reason through those same scenarios in real-time. The tweet below shows Wayve zero-shotting a drive on Japanese roads in the latest installment in a series doing this around the world.

Decart is applying Generative World Models to real-time generative simulation, producing playable worlds that respond to the users’ actions. It’s the playable version of the Generative Video Models or 3D Reconstruction Models. On the Oasis landing page, it calls the model a “video model,” but follows up with this distinction: “Every step you take will reshape the environment around you in real-time.”

Interestingly, Decart currently runs on Nvidia GPUs but plans to use Etched Sohu chips. Etched chips are custom ASICs designed to run transformers and would allow Decart to improve latency and run continuous inference, both of which are much more important when generating responsive worlds in real-time than when generating a video or 3D rendering upfront.

Runway, too, is blurring the lines between video generation and world generation, as mentioned in the Video Model section. During its Research Demo Day 2025, Runway co-founder and CTO Anastasis Germanidis explained the company’s evolution, which started with “generative AI models [as] viable tools for creative expression” and has since moved towards World Models (while still making incredible progress in video models).

“To build a World Model,” Germanidis explained, “We first needed to build a really great video model. We believe that’s the right path to building World Models, that teaching models to predict pixels directly is the best way to achieve general purpose simulation.”

Google DeepMind took a similar approach; Genie 3 was built on top of Veo.

These World Models are extremely important. But remember that they are only half of the equation. From the very start, whether it was Schmidhuber in 1990 or Sutton with Dyna in 1991, the plan was to use World Models to train Agents to act inside of the world, and then to transfer those learnings out into the real world.

Embodied Agents

We want to share a few of the main Embodied Agent examples out there today, and their respective approaches: Physical Intelligence and other robotics companies’ VLAs (Vision-Language-Action Models), DreamerV4’s Latent World Model Agents, Google DeepMind’s SIMA 2 Generalist Embodied Agent, and General Intuition’s General Agent approach.

Physical Intelligence - Vision-Language-Action Models (VLAs)

Modern multimodal LLMs are VLMs, or Vision-Language Models: models that can see and read. Feed one an image and a question, like “What objects are on the table?” or “Is this door open or closed?”, and it produces a coherent, grounded answer.

GPT-5, Gemini, and Claude are all VLMs in this sense; they can see and reason. When you send one a picture of a mountain and ask it to geolocate it, it’s using its VLM. VLMs are also the perceptual and reasoning backbone of most modern Agent systems designed to operate in physical or interactive environments, like PaLM-E or SpatialVLM.

VLMs are not exactly Agents, but they are core components of most of them. We bring them up because a VLA is a VLM that has learned to act, and it is the pragmatist’s answer to the Agent problem.

In 2023, Google DeepMind published a paper called RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control to propose a solution.

Take a VLM that understands a scene and what to do in it, and then bolt on an action head that translates human language instructions into commands the robot understands, like changing a position or rotating.

Google DeepMind, RT-2
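For intuition, here is a toy sketch of the VLA pattern: a vision-language backbone fused into one representation, with an action head bolted on top. It is not RT-2’s or Physical Intelligence’s actual code; the dimensions, the 7-DoF continuous action space, and the fusion layer are assumptions for illustration (RT-2 itself emits discrete action tokens rather than continuous values).

```python
# Toy sketch of the VLA pattern: VLM-style backbone + bolted-on action head.
# Illustrative only; dimensions and the 7-DoF action space are assumptions.
import torch
import torch.nn as nn

class TinyVLA(nn.Module):
    def __init__(self, vision_dim=512, text_dim=512, action_dim=7):
        super().__init__()
        self.fuse = nn.Linear(vision_dim + text_dim, 256)  # stands in for the VLM backbone
        self.action_head = nn.Linear(256, action_dim)       # e.g. end-effector deltas + gripper

    def forward(self, image_features, instruction_features):
        fused = torch.relu(self.fuse(torch.cat([image_features, instruction_features], dim=-1)))
        return self.action_head(fused)  # continuous low-level commands

policy = TinyVLA()
action = policy(torch.randn(1, 512), torch.randn(1, 512))  # "pick up the cup" -> motor command
```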

Since then, VLAs have become the dominant paradigm in robotics, and they’ve worked surprisingly well.

Every other paradigm we’re discussing says something like: “Images, videos, spaces, and actions are fundamentally different than words. We need to train and architect the models that generate them differently than the models that generate words.”

Vision-Language-Action Models (VLAs) say: “That may be true! Those approaches might be platonically better. But that doesn’t matter in practice, because the vision-language model infrastructure and data are so far ahead.”

In his not boring primer on robots, Standard Bots’ Evan Beard wrote a thorough explanation of VLAs for robotics that included what he called a “spicy take”:

We’re not working with language-model infrastructure because it’s the perfect architecture for robotics. It’s because we, as a species, have poured trillions of dollars and countless engineering hours into building LLM infrastructure. It’s incredibly tempting to reuse that machine.

So, despite its imperfections, taking an LLM and sticking on an action head to predict robot motions (all together known as a VLA) is the best way for us to train the base models that learn many skills from demonstrations across many different customers and tasks.

It’s pretty ingenious. Of course, there are challenges to this approach, which Evan highlighted:

  • Robotics success so far has leaned heavily on diffusion-style control

  • LLMs are autoregressive and token-based, with less room for error

  • Physical actions don’t map cleanly to tokens

Additionally, compared to World Models, VLAs require collecting a large amount of real-world robotics data; they don’t seem to generalize out-of-distribution particularly well.

That said, Physical Intelligence, known as π or Pi, has gotten incredibly far with its VLA bet.

Pi’s first generalist policy, π₀: Our First Generalist Policy, inherits semantic knowledge and visual understanding from internet-scale pretraining and trains on data from seven different robotic platforms across 68 unique tasks, including folding laundry, bussing dishes, routing cables, assembling boxes, and packing groceries, all of which require dexterity in the real world on real hardware. Their follow-up, π₀.5: a VLA with Open-World Generalization, performs better in new environments like cleaning up a kitchen or bedroom in a home the model has never seen before.

OK, but can it actually learn and get better over time as it works and makes mistakes in the real world?

November 2025’s π*0.6: a VLA that Learns from Experience suggests that it’s possible, with demonstrations in tasks like making espresso, folding boxes, and folding laundry.

But those are simple, repetitive tasks. Most of what the robot sees is in distribution. Can it actually do more complex, multi-step tasks that take a long time to complete?

Earlier this month, Pi released VLAs with Long and Short-Term Memory and showed that robots using MEM (Multi-scale Embodied Memory) can clean up an entire kitchen, set up the ingredients for a recipe, and grill a grilled cheese sandwich. They can also learn from their mistakes.

A robot tries to pick up a chopstick or open a refrigerator door. Without memory, it fails the same way repeatedly. Each attempt is a clean slate with no knowledge of what just went wrong. With memory, it tries a different approach after the first failure. And it succeeds.

MEM doesn’t change the underlying architecture, which is still sub-optimal for embodied systems. Most of the parameters still live in the language backbone. The action head is still downstream of reasoning. But Physical Intelligence’s existence raises a fascinating question. Do these architectural limitations actually matter in practice?

If Latent World Models are on one side of the Practical ←→ Platonic spectrum, VLAs are on the other.

To date, Pi has been able to engineer their way around architectural limitations to make increasingly capable robots. Their progress is not slowing down. It seems to be accelerating.

Theirs is a bet with historical precedent. The ideal technology — the solution that is technically superior — does not always win. This is the key takeaway from W. Brian Arthur’s 1989 paper, Competing Technologies, Increasing Returns, and Lock-In by Historical Events. Markets often converge on the technology that gets adopted first, because adoption creates increasing returns: better early product means more users and more capital, which means better data, more internal talent, and more developers, which means better products which means more users and capital, and so on.

This is also the point of Sara Hooker’s 2020 paper, The Hardware Lottery: “This essay introduces the term hardware lottery to describe when a research idea wins because it is suited to the available software and hardware and not because the idea is superior to alternative research directions.”

From the outside, it seems that Pi’s strategy is to ride the transformer architecture’s increasing returns while trying to generate its own, to create path dependence with VLAs before World Model-specific architectures gain traction, in an attempt to win its own hardware lottery.

They’re not the only company making this bet. Skild, the closest direct competitor, is building on VLAs. A number of robotics companies incorporate VLAs and VLMs in one way or another. And now, it looks as if the approach is spreading to the whole factory.

Recently, the WSJ reported that former OpenAI Chief Research Officer Bob McGrew is raising $70 million at a $700 million valuation for his new startup, Arda, in a round led by Founders Fund and Accel, with participation from Khosla and XYZ. Details are light, but the WSJ’s description sounds like it’s going to at least involve VLMs and VLAs in some way: “Arda is developing an AI and software platform, including a video model that can analyze footage from factory floors and use it to train robots to run factories autonomously.”

The more well-funded and talented companies that move in this direction, the more deeply grooved the path becomes.

Personally, I don’t think VLAs and World Models are really competing. They’re trying to reach acting in the physical world from different directions. VLAs are language-first, while World Models are video-and-action-first. My guess is that they’ll converge and both be part of the solution.

Dreamer V4 - Latent World Model Agents

Latent World Model Agents are Agents trained inside of Latent World Models. The latent approach has a natural elegance for Agent training specifically.

Because a Latent World Model operates in compressed abstract space, the Agent’s planning and policy learning can happen very efficiently, with no pixel generation required. The agent basically practices by thinking, in the same way a chess grandmaster runs through variations in their head without moving the pieces, or the way a lucid dreamer trains inside the dream.

The canonical example is Dreamer, from Danijar Hafner, now at Google DeepMind. Dreamer’s insight is elegant: if you have a good enough Latent World Model, you don’t need to touch the real environment during training at all. The Agent imagines sequences of actions and their consequences entirely in latent space, receives a reward signal, and updates its policy, all without a single real-world interaction. When it finally goes into the real environment, it already knows what to do.
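A rough sketch of what “training in imagination” looks like, with a simplified REINFORCE-style policy update. The `imagine_step` and `predict_reward` calls are hypothetical interfaces standing in for the learned latent dynamics and reward heads; this is not Hafner’s actual Dreamer implementation.

```python
# Sketch of Dreamer-style training in imagination: roll the policy forward inside
# the learned latent World Model, score imagined rewards, and update the policy
# without touching the real environment. Simplified for illustration.
import torch

def train_in_imagination(world_model, policy, optimizer, start_latent, horizon=15):
    z = start_latent
    log_probs, rewards = [], []
    for _ in range(horizon):
        dist = policy(z)                          # action distribution from the latent state
        action = dist.sample()
        log_probs.append(dist.log_prob(action).sum(-1))
        z = world_model.imagine_step(z, action)   # next latent state, dreamed (hypothetical API)
        rewards.append(world_model.predict_reward(z))
    imagined_return = torch.stack(rewards).sum(0)
    loss = -(torch.stack(log_probs).sum(0) * imagined_return.detach()).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```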

Dreamer has achieved remarkable results across a wide range of tasks, from games to continuous control to robotics, all from this purely imagined training. It is the research proof of concept that World Model training works, that an Agent can learn to act in the real world by dreaming. It seems as if Hafner is taking his research proof commercial. Earlier this month, The Information reported that he and Wilson Yan are raising $100 million to build a World Model company in this paradigm called Embo, which suggests they’re going after embodied systems.

The challenge, as with Latent World Models generally, is that the Agent’s learned behavior is only as good as the latent representation. If the World Model’s abstract encoding misses something causally important, like the exact texture of the floor that determines whether the robot slips or the precise angle of an object that determines whether it can be grasped, the Agent won’t know to care about it, because the model didn’t encode it. Garbage in, garbage out, but the garbage is invisible.

Moonlake’s hybrid approach, which we discussed earlier, is an attempt to thread this needle: attract humans with beautiful generative environments to collect action-labeled data, then discard the pixels and train the Agent in abstract space. Use the generative world to get the data. Use the latent world to do the learning. It’s an interesting bet that the two approaches are more complementary than competing, and it may prove correct.

Notably, we haven’t yet seen the JEPA Agents. JEPA is a World Model architecture, not an Agent architecture, but we expect AMI Labs will close this loop. AMI is still building its World Model, and the Agents that train inside it haven’t yet been publicly demonstrated, but we’re watching closely.

General Embodied Agents

SIMA2 - Generalist Embodied Agents from VLM Backbone

In November 2025, Google DeepMind released SIMA 2: An Agent That Plays, Reasons, and Learns with You in Virtual Worlds.

SIMA 2 combines a Gemini backbone with a World Model trained on 3D game environments, giving the Agent an understanding of language that allows it to receive and reason about goals, as well as the spatial-temporal understanding to execute on them. In this architecture, Gemini fills the role that we mentioned VLMs play in our system.

What makes this a different paradigm from VLAs is the direction of citizenship. In a VLA, language is first class, images are second class. Beyond the ordering of modalities, there is also the training data, mostly static images interleaved with text. In an Agent equipped with a World Model, video is first class, actions are introduced from the beginning, and the training data is directly aligned with the downstream behaviors we’re looking for. The Agent’s fundamental competence is spatial-temporal. If you tell it what it needs to do, it knows how to move through the world to do it.

SIMA 2 can play games on its own. It can learn, reason, and improve. The more it plays, the better it becomes, not just in the games it’s played, but in any game it plays. It can even play in whichever generated world it gets thrown into, including worlds it has never seen before. This, Google DeepMind believes, is a “step towards creating AI that can help with any task anywhere, including one day in the real world.”

Google DeepMind has put out an extensive amount of research. They have pushed World Models and embodied AI forward on multiple fronts. They coined the term “VLA.” They released Genie 3. They developed SIMA 2. The way they trained AlphaGo, letting the Agent play against itself over and over and over again, informed how World Models are trained to this day.

General Intuition - Generalist Agents From Actions & World Models

Similarly to Google DeepMind, we also believe that Generalist Agents will play a major role in how embodied systems operate to do useful things.

First, create the dream. Then, let Agents run around inside of it. Let them play and mess up and learn and win. Then, transfer those learnings into other dreams, and even into the real world.

Recall the Matrix. When Neo needed to learn Kung Fu, he plugged into a virtual dojo where he trained against Morpheus in a training environment superior to the “real world.” After that? “I know Kung Fu.” World Models are the virtual dojo. Neo is the Agent.

I Know Kung Fu

This is the question that Ha and Schmidhuber asked eight years ago: can Agents learn inside of their own dreams?

In a remarkably short amount of time, the space has come to an answer: yes.

Yes… If You Have Action-Labeled Data (Or Can Get It)

Today, I want to share a little bit more about our approach and the results we’re starting to see.

Every approach I’ve written about so far runs into the same wall, eventually: It needs better data. Video is abundant, but it lacks depth. It has no action labels. And without knowing what actions caused what we’re seeing, video data is like shadows, the shadows on Plato’s cave wall.

And Yann may be right that you can infer action, but anyone using inferred action has separate scaling laws to consider: inferring the actions themselves. Inferring actions takes compute, time, and attention away from doing the things you can do once you understand actions, and while inferred actions might look good on benchmarks, they struggle deeply on edge cases. Even well-inferred actions are approximations of what someone actually did: some things just aren’t visible in video, like a pilot working the rudder pedals during a landing recorded from the cockpit.

Hint: if you don’t do it, you crash. That’s why ground truth is crucial.
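For a sense of what “inferring actions” involves mechanically, here is a sketch of an inverse dynamics model: a network that guesses which action connects two consecutive frames, producing pseudo-labels for otherwise unlabeled video. The shapes and the 18-action discrete space are assumptions for illustration; the key point is that this is an extra model, with its own data and compute requirements, sitting upstream of everything else.

```python
# Sketch of the extra machinery "inferred actions" requires: an inverse dynamics
# model that guesses the action connecting two consecutive frames. It still needs
# some action-labeled data to train, and its outputs are only approximations.
import torch
import torch.nn as nn

class InverseDynamicsModel(nn.Module):
    def __init__(self, frame_dim=256, num_actions=18):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(2 * frame_dim, 128), nn.ReLU(), nn.Linear(128, num_actions))
    def forward(self, frame_t, frame_t1):
        return self.net(torch.cat([frame_t, frame_t1], dim=-1))  # logits over possible actions

idm = InverseDynamicsModel()
pseudo_labels = idm(torch.randn(32, 256), torch.randn(32, 256)).argmax(-1)
# Pseudo-labels like these miss anything with no visible effect on the frame,
# such as the rudder pedals in a cockpit video.
```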

You need to find a way to get action-labeled data. The closer to ground truth, the better. Luckily, we have a fantastic starting point, thanks to Medal.

Before there was General Intuition, there was Medal

Earlier, we talked about the importance of games in the development of AI. AlphaGo. Deep Blue. These are intentional uses of games in AI.

There is an even richer history of accidental links between games and AI, lucky breaks.

Nvidia is the example you likely know. Jensen founded Nvidia to make chips for real-time graphics in games in 1993. Six years later, in 1999, Nvidia released its first “Graphics Processing Unit” (GPU), the GeForce 256.

A few years later, around 2005, researchers began experimenting with GPUs for neural nets. In 2007, Nvidia released CUDA, to make ML on GPUs practical. In 2009, three Stanford researchers — Rajat Raina, Anand Madhavan, and Andrew Ng — showed that GPUs could accelerate deep learning by 70-100x for unsupervised learning.

Three years later, in 2012, the AlexNet team7 decimated their ImageNet competition using GPUs. Within a year, everyone in deep learning had switched to GPUs. “Everyone in deep learning” was still a small community at the time, but by then, Bitcoin miners had already been using GPUs. They were 50-100x more efficient for Bitcoin’s SHA-256 hashing than CPUs.

They soon switched to ASICs, but in 2015, Vitalik Buterin and his team released Ethereum, whose memory-heavy workloads were harder to optimize with ASICs. Ethereum mining ran on GPUs from 2015, through the GPU shortage it caused during the 2020-2022 crypto boom, right up until Ethereum switched from Proof-of-Work to Proof-of-Stake and left a GPU glut in its wake. Crypto tanked anyway, and Nvidia’s stock peaked the same month crypto did, then tumbled 66% over the next year, until OpenAI released ChatGPT. Since then, Nvidia’s market cap has grown 10x to the $4.4 trillion behemoth we know today.

Google Finance as of 3/9/2026

I mean, who could have predicted all of that?

When I taught myself to reverse engineer things and learned how to code to build a private RuneScape server at 13, I couldn’t have predicted that it would lead me to where I am today, either. Reverse engineering is the ultimate form of deductive reasoning, and spending a lot of time doing it as a kid is very good for your brain. It lends itself well to figuring out complex systems in a rapidly changing world.

RuneScape developers took the wilderness and free trade out of the game. I wanted to put it back, so I learned how to reverse engineer. The business that grew out of that did well for a teenager — we were making ~$1.5 million per year by the time I had to shut it down in 2015 at age 18, when I became an adult and would have been liable for the stuff I built. But I’d made enough money for my age that I could do whatever I was passionate about. I joined Doctors Without Borders (MSF) at 19 years old, and stayed for three years to work on Ebola & Humanitarian Mapping. I spent some time at Google Crisis Response before the gaming itch got me again.

At the time, we worked in London, very close to the DeepMind team. It was 2014, and I did not think what they were doing was that interesting or that likely to work. Demis deserves so much credit and respect for his vision. Few understand how hard it was for them to get here.

Working on Ebola with MSF at 19. We almost got the Google London office evacuated that day, and had to label the PPE with “fake test” from then on

In 2018, I teamed up with my previous colleagues from building RuneScape servers. We built a game called Get Wrecked, which got a lot of signups. But it lost players quickly because we didn’t have enough player liquidity; it was a competitive game, and we needed to have enough people of all skill levels that people would always be able to find someone at their level to play against, which is very hard to bootstrap. To fix that problem, we built a way for people to watch game clips on the platform. A couple of times a day, we’d send a push notification that the game was live to get enough players playing at once.

The clips platform, Medal, went viral on the Rocket League subreddit. It was getting so many downloads that it became almost immediately clear that that was the bigger opportunity. We decided to focus on Medal.

We never ended up releasing the game. Medal just kept growing. Today, players from around the world upload 1B+ gaming clips to Medal each year.

We couldn’t have planned a better dataset with which to build World Models and policies.

Medal’s upload volume puts it on par with YouTube. Gamers upload millions of clips per day, across tens of thousands of environments, already hand-selected by players for highlights and adverse events. In other words, they share the content they think is worth sharing: their best performance, wildest encounters, closest calls.

Medal data has something that YouTube data does not. It comes enriched with metadata from our social network (views, likes, comments) and, most importantly, in-game actions. We only record game actions on the local machine, storing just the in-game action names (e.g. Move Forward) and never the keys that were pressed to achieve them. Beyond its value as training data, this has enabled us to ship Medal’s most requested feature: keyboard and controller overlays. These overlays let our gamers showcase the precise actions they took behind every amazing moment.

Each clip has exactly what the player saw, next to the exact player actions that followed, using many of the same systems we use to control robots today. Frames from games also have the benefit of being information-complete. Real-world video requires pose estimation (estimating what the human sees, which is itself a lossy process), and in the real world you may see things the camera does not. In games, what is recorded and what the player saw are always identical, which we think makes it better training material.
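To make that concrete, here is roughly the shape an action-labeled clip might take. The field names are hypothetical, not Medal’s actual schema; the structure just illustrates the pairing of exactly what the player saw with the named actions that followed.

```python
# Illustrative shape of an action-labeled clip: what the player saw, paired with
# the named in-game actions that followed. Hypothetical field names, not Medal's
# actual schema; actions are stored as names, never raw key presses.
from dataclasses import dataclass

@dataclass
class LabeledFrame:
    frame_index: int
    pixels: bytes            # exactly what the player saw -- information-complete
    actions: list[str]       # e.g. ["Move Forward", "Jump"]

@dataclass
class Clip:
    game: str
    frames: list[LabeledFrame]
    views: int               # social metadata: views, likes, comments
```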

This gives us trillions of examples of players running the loop of observe, predict, and act. This is the foundation of intelligence, and there is no loss of information throughout.

On Data

To understand what we’re doing, you need to grok the difference between game data and synthetic data.

The confusion is that people associate “digital” with “synthetic,” but the real distinction isn’t the environment in which the data was generated, but the data itself.

There can be synthetic (i.e., generated) data created in the physical world, like in the human-constructed environments Boston Dynamics and other robotics companies train their robots in, just as there can be human ground truth data in the digital world. Data breaks down into a quadrant like this.

What makes our gaming data “ground truth human data in a digital environment” is that we are capturing real humans running that observe → predict → act loop.

The closest comparison to our approach is GitHub data. It captured the history of human engineers’ coding and was used to train machines that can code better than humans. The question is whether that same idea works outside of the computer. We believe (and are seeing signs) that learning from gaming data transfers to the physical world.

Games turn out to be the perfect training ground for learning intelligence. They contain thousands of simulated worlds with physics, strategy, cooperation, text, interface use, competition, and long-horizon planning. They are complex enough to require intuition, but structured enough to learn from at a massive scale.

Physical-world data alone cannot reach the diversity or scale required to learn general intelligence. LLMs lack data on dynamics and atoms. But games act as the ideal intermediary: a bridge between the digital world of bits and the physical world of atoms.

Still, there is a threat to the ground truth stance. As mentioned earlier regarding Yann LeCun’s take, every video is action-labeled data if you’re good enough at inferring action. While this may be true in the long run, it’s also probably extremely impractical today. That’s why you gotta love Yann — nobody else would think to do it this way. Yann and I talked about this dilemma in Paris in December if you want to go deeper into the weeds.

Everything is a trade-off, right?

The optimal path forward is likely somewhere between where VLAs are today — the most practical but least elegant solution — and where AMI might be one day if everything goes well. It all comes down to your approach to data.

Data is the problem for any company that wants to solve embodied AI. Evan and Packy wrote about it in Many Small Steps for Robots, and it is the thing we are focused on at GI.

We believe our dataset is the most elegant answer to the data problem for general models. It is one that paves a path towards a general intelligence that feels familiar, the same way Tesla FSD feels like a familiar driver, but scales far beyond games or driving.

For general models, models that can power embodied AI intuitively and spontaneously in almost any imaginable real-world situation, the question is not simply how much data you can get.

Before throwing data at the problem, you need to understand your transfer curves.

Small Steps, Giant Leaps, and Transfer Curves

In their Robot essay, Packy and Evan wrote that there are two approaches to building economically viable embodied AI: Small Step or Giant Leap.

Evan and his company Standard Bots are pursuing the small step approach: getting paid to learn in the field, one use case at a time. They are collecting real-world data for a growing number of economically viable use cases across a wide variety of domains.

Their strategy is a fascinating one. By getting customers across many different industries and tasks to pay them to deploy, they collect diverse real-world data across wide distributions. Instead of hoping that more data in a narrow domain will generalize to tasks outside of the distribution, their goal is to put a ton of useful tasks in-distribution by going broad in the real world, not deep in one niche.

General Intuition and Standard Bots are coming at the same problem from opposite sides of the spectrum. General Intuition is attempting to solve generalization from the digital side: our bet is that gaming data will lead to broad priors about physics and actions. Standard Bots is attempting to solve generalization from the physical side: their bet is that real-world deployments will lead to broad priors about manipulation and industrial tasks.

These are complementary approaches to the data diversity problem. There’s potential for a GI World Model to be the starting point for Standard Bots’ post-training. We provide a base model trained on observed data in digital environments, which is scalable and economical to collect, and they post-train using the use case-specific data they get paid to collect to bring those use cases in-distribution and get to many 9s of reliability more quickly.

The approach that we find more challenging is the one that general models seem to be taking, which is collecting a ton of data and hoping it generalizes to out-of-distribution tasks. General models need too much data across too many situations to collect it all by paying people to demonstrate tasks.

Additionally, more data in the same domain doesn’t automatically teach a model to handle situations it’s never seen before. At pre-training, not all data is created equal, and I have not met someone building a general model for robotics who has been able to point me to the scaling laws that show that they can solve out-of-distribution use cases (things they weren’t trained on) simply by adding more data. More data in a narrow domain does not automatically buy you generalization to a new one.

The scaling laws don’t exist.

There are, as best we can tell, three distinct transfer curves that govern whether a World Model generalizes to new physical environments. They are not well understood — we are just starting to understand them. We can, however, give them names: input modality transfer, sensor transfer, and environment transfer.

The first is input modality transfer: How well does a policy generalize across degrees of freedom in the physical system it’s controlling? This curve is steep for a humanoid robot with somewhere between twenty and sixty degrees of freedom, each continuous and often mechanically dependent on the others. A finger movement is not independent of the arm. Training on a game controller and expecting it to transfer cleanly to a twenty-DOF humanoid hand is, in research terms, a bet without scaling laws to back it up.

The second is sensor transfer: If the workload requires specialized physical sensors (tactile feedback, proprioception, depth), there is a separate scaling law for how much of that sensor-specific data you need before the model can reliably reason about it. Tesla explicitly worked on this problem. They spent years figuring out exactly how much LiDAR data they needed before they could entirely drop the chips. Most robotics companies are working on it implicitly, hoping the answer reveals itself in deployment.

The third is environment transfer: How does performance degrade as the environment gets more complex, more stochastic, or more populated? Predicting the right action in a sports stadium with a thousand people around you is a fundamentally harder problem than predicting it on an empty field.

As we explained earlier, complexity doesn’t scale linearly.

These three curves interact. Until you can map them, you can’t know how much data of which type you actually need, which means you cannot justify the capital expense of going to collect it at scale. Companies that are collecting a hundred thousand hours of physical data today may find that a good World Model only needed ten thousand, or that they did need a hundred thousand, but ninety thousand of the hours they got were in the wrong distribution entirely.

Our bet, certainly conditioned on our starting position, is to collapse the problem.

By focusing on game controller inputs, we reduce input modality transfer to a curve we have already solved. We know we have enough data for game controllers, because we have billions of clips of humans using them. That eliminates one unknown. By focusing on vision-based inputs rather than specialized sensors, we eliminate the second.

Almost every physical system comes with a game-controller-like input modality: steering wheels, keyboards and mice, and actual game controllers. Most are straightforward. Even humanoids ship with them. The challenge is that when a system’s degrees of freedom exceed what the controller can express, transfer gets worse. So humanoids are further down our roadmap, but we see no physical limit suggesting we couldn’t build around the interface constraint.

In short: If you can control almost any physical system with a game controller, and we have more data than anyone else in the world of what happens when a player uses a controller to take action, our Agents should be able to control almost any physical system.
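As a toy illustration of that bet, here is how a single gamepad-style action space might map onto very different physical systems. The mappings and scaling factors are made-up assumptions, not General Intuition’s actual control stack; the point is that many machines already expose a low-degree-of-freedom, controller-like interface.

```python
# Toy illustration of a shared gamepad-style action space mapped to different
# physical systems. Mappings and scaling factors are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class GamepadAction:
    left_stick_x: float   # -1..1
    left_stick_y: float   # -1..1
    right_trigger: float  #  0..1

def to_car_controls(a: GamepadAction) -> dict:
    # Steering wheel + throttle: two degrees of freedom.
    return {"steering_angle_deg": a.left_stick_x * 30.0, "throttle": a.right_trigger}

def to_excavator_controls(a: GamepadAction) -> dict:
    # Boom and swing rates: same controller, different machine.
    return {"boom_rate": a.left_stick_y, "swing_rate": a.left_stick_x}
```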

The only remaining question is about environment transfer: can Agents trained inside of the dream operate in reality?

The Superhuman Future of the World (Models)

It has been a wild few weeks at General Intuition’s offices in New York and Geneva. Everything that we’ve written about here has been working better than we expected. Like others, we’re gaining conviction that Agents trained inside of the dream can operate within reality.

Why do World Models transfer?

The observe-predict-act loop is an abstraction for how causally structured systems work in general. Once a World Model has seen N variations of the world via a diverse set of games, it only takes a small amount of fine-tuning to understand the dynamics of the N+1 variant that corresponds to the real world.

World Models learn to model the cause-and-effect of reality. If this cause-and-effect is understood at a fundamental enough level, this should enable World Models to generalize to new scenarios.

What might that mean? What are the implications of World Models that generalize?

Our goal is to enable embodied AI to understand the world, with our models controlling machines in any environment, including the real world. We aim to deliver a breakthrough moment for robotics, where out of nowhere, the progress is obvious and the models are easy to use.

That breakthrough won’t look like the breakthroughs in LLMs, which went mainstream when they started talking to us like humans. We don’t want machines that simply do what humans do. In fact, the point of machines is to do things that humans can’t, to give us superpowers.

Robots don’t need to look like us to work for us. Humanoids as a form factor were largely chosen due to the assumption that they had the most data to learn from on the internet, because so many of humanity’s videos feature humans. If you don’t need those videos, if you can learn directly from the actions in video games across embodiments, and need a lot less to transfer to reality, that assumption doesn’t hold. We believe the future of robotics should be shaped by simpler, cheaper systems: machines with only the degrees of freedom that match the actual jobs to be done.

The human body is an incredible general-purpose platform, but it’s rarely the optimal (or most cost-effective) form for any specific task. Instead of copying our anatomy, we should mirror the interfaces we already use instinctively: joysticks, wheels, gamepads, and keyboards. These tools are the product of decades of iteration, compressing human intent into a clean, universal action space, much like language does for thought. Robots can learn from the actions transmitted through these interfaces and specialize around them in a very general way, making broad deployment far more practical than chasing full human embodiment.

If you get rid of the assumption that our machines don’t need to, and probably shouldn’t, mimic us or take our place in any way, a whole world of possibilities opens up.

At General Intuition, we’re actively working on simulations that will eventually allow our systems to go beyond everything that’s currently described in pixels, to everything governed by cause and effect. The methods we use are very general. This is a long way out, but a necessary step.

To really understand our world, it turns out, poetically, we may need World Models; compute for the uncomputable.

The implications of all of this are cosmic. If we can model three-dimensionality, physics, and time, and their interaction, then the ability to manipulate these arenas at superhuman macro and micro scales is on the horizon.

There is a tremendous amount of work ahead. Today, nobody is capable of simulating a biological cell, let alone an ecosystem made up of 10³⁰ of them. However, what captivates me is that we don’t need to map all of reality’s details. We just need to observe how those details manifest in actions, and use those actions to predict what comes next, over and over again.

There is also a tremendous amount of responsibility that comes with building these models, and it’s something I take very seriously and personally.

I’m from the generation most at risk of AI displacement; half my childhood friends can’t find jobs. I’m spending a lot of time exploring how we bring our community and my generation along in this shift.

For example, like Tesla, Medal sits on over $10 billion of global hardware infrastructure — GPUs, CPUs, plugged into power, with cooling — powered by over 15 million users. We’re actively exploring ways to let our community share in what’s coming, for example by generating income serving inference from their GPUs, or tele-operating from their gaming rigs. If the demand for general intelligence is anywhere close to what we think, that could be the largest economic tailwind my generation has ever seen.

These are just my dreams for now. But one day, they won’t be. One day, we leave the boring problems to superintelligence, so we can explore the stars or the deep sea from our gaming rigs, and dream up the next uniquely human, most interesting, not boring things to do.


If our work interests you, I’m recruiting people to work on simulators that generate new environments, and we continue to be excited to talk to researchers and engineers who are at the top of their field and want to join General Intuition.

A special thanks to Eloi Alonso, Adam Jelley, Vincent Micheli, and Paula Wehmeyer, my co-founders and colleagues who spent many hours discussing the ideas behind this article. - Pim


Big thanks to Pim, Paula, Adam, Eloi, Vincent, Kent, and the whole General Intuition team for sharing their knowledge, and to Badal for the cover art. - Packy


That’s all for today.

For not boring world members, I played around with Claude to produce some extra goodies. We made a World Model Research Archive, with links to more than 30 key papers that have defined the space. Members can also ask Pim questions in the subscriber chat today and tomorrow.

Join us in not boring world for all of this and more by subscribing below

Thanks for reading,

Packy

Subscribe now

not boring world members can download the World Model Research Archive here.

1

Unless, of course, we made it wildly unrealistic; forcing everyone to jump once a goal was scored, regardless of which team scored. Or rendering the audience as a 2d flat image of an oval (looking at you, Rocket League).

2

This idea was behind his short story “Adjustment Team,” which became the Matt Damon movie The Adjustment Bureau.

3

Fun Fact: the team working on the Agent inside of Google DeepMind’s World Model, Genie, is named Inception.

4

World Models is one of the best-presented papers in ML history. It has an interactive web version — you should try it out. Its concepts feel both philosophical and technical. There are also fun, retrospective Easter eggs, like the fact Ha and Schmidhuber could only swap between real environments and the dream because of Gym, a library of benchmarks and API built by a young non-profit called OpenAI.

5

The word Aleatoric even stems from the Latin word for dice, alea.

6

It was superhuman on average, although it underperformed humans and some other models in specific games. It had a particularly hard time with Video Pinball.

7

The ML community was so small back then, or Geoffrey Hinton was so prominent within it, or both, that Hinton was both the first researcher mentioned in the 2009 Stanford GPUs-for-deep-learning paper and one of the three on the team that popularized GPUs in deep learning.

Weekly Dose of Optimism #184

2026-03-13 20:54:26

Hi friends 👋,

Happy Friday! Here in New York City, it was the sunniest of times, it was the snowiest of times, but most of all, it was the incredibly hard to keep up with all of the incredible stuff getting done and funded of times.

Big week in robotics, solar, brains, world models, manufacturing, flight, AI, and even affordable luxury.

Let’s get to it.


Today’s Weekly Dose is brought to you by… Ramp Sheets

A few weeks ago, I wrote that “a Ramp engineer with an AI will build something better than a CFO with an AI.” Ramp Sheets is exactly what I’m talking about.

See, a surprising amount of my high-leverage work ends up in a spreadsheet.

A deck comes in. A CSV export lands in my inbox. I want to benchmark a market, pressure-test a company, or run an analysis for a Deep Dive. Usually, that means a lot of manual cleanup and model-building before I get to the actual insight.

Ramp Sheets skips a lot of that.

It’s an AI-powered spreadsheet from Ramp that lets you upload a PDF, pitch deck, CSV, or Excel file and ask it to do the work: build a model, research competitors, clean messy data, and help you get to an answer faster. For founders, investors, and operators, that means less spreadsheet mechanics and more actual thinking.

Best part: anyone can use Ramp Sheets today for free. You don’t need to be a Ramp customer to try it (but why wouldn’t you be?).

Try Ramp Sheets Free


(1) Swift Solar Acquires Meyer Burger to Build Gigawatt-Scale Solar Factory


“If your technology is so good, why aren’t you using it to compete?”

In Power in the Age of Intelligence, I wrote that that’s a question I’d like to see more investors ask. If the technology you’re building with is so much better, why in the world are you just selling it to someone else as a component? Why not just make a better end product?

Joel Jean at Swift Solar said, “Ok bet.”

I met Joel a few years back on an introduction from his brother Neal, the co-founder and CEO of Beacons. Joel, like Neal, is a genius, with a PhD in electrical engineering from MIT, and he was putting that genius to work by making silicon-perovskite tandem solar cells work.

The promise of perovskites is that they’re much more efficient than conventional silicon panels, meaning they can convert more of the incoming sunlight into electricity. Current commercial silicon panels are 22% efficient, with a lab record of 26.8% and a theoretical limit of 29%. Swift Solar’s tandem panels are already at 28% efficiency. China’s Longi, the world’s largest manufacturer of monocrystalline silicon wafers, recently hit 34.8% with a silicon-perovskite tandem solar cell, and the theoretical limit, for which Swift is gunning, is 45%.

More efficiency means more watts per panel, which means more watts per acre and a lower balance-of-system cost; 30–40% efficiency modules could cut solar system costs 20–40% while doubling power density.

This all sounds great, but the problem is that tandems tend to degrade quickly outside, where the sun is. Swift thinks that they’ve cracked that, and that China hasn’t yet, which gives the west an opportunity (certainly not guaranteed, probably not even likely, but an opportunity!) to leapfrog back over China in solar production.

So if you have the magic, degradation-resistant tandem cells, what do you do? Use it to compete.

Swift announced that it acquired the core assets of Europe’s leading solar manufacturer, the German company Meyer Burger, to “build the next generation vertically integrated US solar manufacturer.” Meyer Burger’s former CEO joined Swift to lead the manufacturing effort.

Solar panel manufacturing is one of the most cutthroat, competitive industries in the world. China keeps driving costs lower and lower and lower. It’s been all about scale. Meyer Burger itself went bankrupt after having sold solar cell manufacturing equipment to China, only for China to reverse engineer and mimic the technology.

But I love it. Pressing a technological advantage into vertical integration and competing directly is the kind of thing that gets my heart racing. And this echoes a deal that hits close to home. A few weeks into Puja’s time at Harry’s, she told me the 10-month-old startup was acquiring one of the world’s oldest and best blade manufacturers, Germany’s Feintechnik. That has been an incredibly successful partnership.

There’s something about the US innovation / German manufacturing tandem.

(2) Reflect Orbital Built a Space Mirror

Ben Nowack for Reflect Orbital

Imagine you are a photon that does not want to get turned into usable electricity on earth. It’s been a really bad week for you.

First, Swift’s tandem cells. Before, there was a 78% chance you’d escape once you hit the panel, but that looks like it’s going to fall. But at least you have nighttime, right?

At night, you can just fly off into space at the speed of… well, you know. Run away from Earth as fast as is physically possible.

Not so fast, says Reflect Orbital. CEO Ben Nowack just announced that the company had made a solar mirror, one of four huge guys that will go up on each satellite, stop the photons’ getaway, and redirect them down to Earth.

Sometimes, they’ll be used to light up a concert or a remote worksite in the evening (imagine how much less creepy True Detective: Night Country would have been if Reflect existed then), sometimes, they’ll shine on crops to help them grow faster, and mostly, they’ll bounce into solar panels that capture them and turn them into electricity.

Efficiency is one of solar’s challenges, but a bigger one seemed almost insurmountable: the sun only shines strongly enough for max solar output for 4-6 hours a day. What if it could shine on solar farms - out in the desert, away from homes (where it wouldn’t impact our sleep, Huberman) - round the clock?

Solar farm economics would get ridiculous, solar would get much cheaper, we’d get a lot more of it, and we’d be able to power all sorts of new, better electric things.

Mostly, though, I’m sharing because that video is a bright spot in a sea of sameness.

(3) The First Multi-Behavior Brain Upload

Dr. Alex Wissner-Gross for Eon Systems

As big a week as it was for solar, it may have been an even bigger week for brains.

A couple of weeks ago, an Australian company, Cortical Labs, grew brain cells on a microchip and taught it to do the first thing hackers try to do on any new computer, from TI-83s to ATMs to pregnancy tests to vape screens: play Doom. And yup, it could play Doom.

On Tuesday, Dr. Alex Wissner-Gross shared the video above, made by Eon Systems, a company he works with. It shows a virtual fruit fly being controlled by a brain with 140,000 neurons and 50 million synaptic connections running on a computer, mapped from a real fly’s actual connectome.

The Eon team took the adult Drosophila brain connectome, the complete wiring diagram of a real fly brain painstakingly mapped by the broader neuroscience community, combined it with a physics-simulated fly body built from an X-ray scan of an actual fruit fly, and got the two to talk to each other. Sensory information from the virtual world enters the brain model, the brain’s neurons issue commands, and the body moves. It does all of the things a fruit fly would do in response to stimuli. It navigates toward food by “taste.” It grooms itself when virtual dust activates its “antennal mechanosensory circuits.” It “eats.”

The team is careful not to oversell the accomplishment, and Dr. Doris Tsao at Astera Neuro has a good list of thoughts on connectomics and uploads here, which basically says, “This is awesome, but we’ll get further by understanding instead of blindly mapping, and anyway, it’s going to be really expensive to map bigger and more complex brains.”

But still… this is sci-fi stuff. I don’t even know if it’s good. But it’s sci-fi!

Meanwhile, in regular old science, two very cool brain papers:

  1. Isotonic and minimally invasive optical clearing media for live cell imaging ex vivo and in vivo. A team of Japanese scientists basically figured out a way to “make brains see-through without killing them.” Kording Lab explained: “The mechanism of this is so obvious and simple and yet brilliant. Match the refractive index inside and outside of cells - and tissue becomes transparent. Because the scaling of scattering with object size, small things don't matter much. So cool!”

  2. Scientists revive activity in frozen mouse brains for the first time. A team in Germany made progress on cryopreserving and thawing mouse brains that leaves some of the processes necessary for brain functioning like neuronal firing, cell metabolism, and brain plasticity intact. A researcher at another lab cautioned that we’re still a long way away from freezing ourselves and waking up in the future, but he also said, “This kind of progress is what gradually turns science fiction into scientific possibility.”

OK, so maybe all of this is just very sci-fi.

(4) Emergent Quantization from a dynamic vacuum, or Zero Point Energy

Harold “Sonny” White et al, published in APS Physical Review Research, via Andrew Côté

Maybe my favorite rabbit hole I’ve fallen down over the past few years is the Zero-Point Energy Rabbit Hole, which I discovered via .

The idea is that at absolute zero, classical physics says everything should stop moving, but the Heisenberg uncertainty principle prevents any particle from simultaneously having a precise position and zero momentum, so it has to keep jiggling. That constant jitter is ZPE.

The vacuum, which we think of as empty, is a seething foam of energy, with virtual particles constantly popping into and out of existence. This sounds like woo, but it’s been experimentally confirmed via the Casimir effect: two uncharged metal plates placed extremely close together in a vacuum attract each other, because ZPE fluctuations are slightly suppressed between the plates compared to outside, creating a net inward pressure. This has been measured in real labs like Los Alamos.

Quantum theory predicts, and experiments verify, that empty space contains an enormous amount of ZPE. The amount of energy supposedly packed into the vacuum would constitute a seemingly ubiquitous energy supply, about as close to a “Holy Grail” energy source as we can get. You can do a lot with unlimited energy.

Stuff gets wilder from there in a chain of logic that leads from ZPE to mass, inertia, and gravity as engineerable properties of the vacuum to the ability to actually engineer the vacuum to create phenomena like antigravity, which is what some believe Thomas Townsend Brown did.

Anyway, a team led by Dr. Harold “Sonny” White, a physicist and aerospace engineer with experience at Boeing, Lockheed Martin, and NASA who is now the CEO of ZPE company Casimir, showed that “Instead of random noise, the quantum vacuum can have structure, meaning waves, dispersion, resonances,” per Andrew Côté, who added that “their analytical model reproduces all the 'larger scale' predictions and measurements of quantum mechanics exactly.”

Andrew said that while the difference sounds subtle, “it means the vacuum can have local structure, that it is in effect a dynamic medium,” which means that, potentially, we can pull energy out of it. That is what Casimir is trying to do.

Now look, Casey Handmer replied, betting $1 million that it wouldn’t work, and who am I to bet against Casey Handmer?

But 1) there are a lot of smart people on the other side of that bet, like Andrew, Sonny, Hal Puthoff, etc… and 2) even if there’s a 1% chance there’s something there, it’s hard to think of anything bigger.

(5) Replit Launches Agent 4

Amjad Masad for Replit

Look, I’m biased. I’m an investor in Replit. I wrote about them in 2021 and predicted they’d be a $100 billion company. This week, they raised at a $9 billion valuation, so we’re getting there.

More importantly, though, they launched their new product, Agent 4. While I love Replit, and think Amjad is one of the best CEOs out there, I’ve been a little worried about vibe coding as a category. It’s so competitive, and won’t the big labs just do it? Won’t Claude Code kill everyone else?

Maybe. Certainly a possibility. But you should watch this video. By focusing maniacally on one thing, making software creation easy, from the infrastructure up, for over a decade, since before AI was even a thing, Replit has built what I think is a truly differentiated product, and one that feels like one of the first good AI-native products I’ve seen.

You can collaborate in teams, set multiple agents to work at once, workshop designs, give feedback on specific ones, tweak them, iterate until you get something good, even doodle to show the AI what you want. When you’ve found a design you like, you can keep it consistent across website, mobile app, presentations, docs, and more. It feels like a workspace for teams, where some of the teammates, the ones doing most of the grunt work at the humans’ direction, happen to be agents.

I’m going to be playing with it this weekend. If you have some time, build something.

EXTRA DOSE: Enormous Week of Fundraising + Nietzsche as Mystic for not boring world subscribers (cmon, join us)

Read more

Weekly Dose of Optimism #183

2026-03-06 22:02:45

Hi friends 👋,

Happy Friday and welcome back to our 183rd Weekly Dose of Optimism.

I was at a16z’s American Dynamism event in DC this week, where I got to meet a bunch of people who read not boring, including one who told me she never opens the Weekly Dose because the title doesn’t tell her anything.

If you’re reading this: thanks for the feedback, I guess it worked. I’m running a title test; I’ll check open rates this week and decide what to do going forward.

Those of you who opened are in for a treat. What a week for the optimists.

Let’s get to it.


Today’s Weekly Dose is brought to you by… Guru

Guru is the best way to make sure that your employees get accurate company knowledge from whatever AI you use.

Companies like Spotify and Brex use Guru because it’s the only AI verification system that automatically validates company knowledge before your AI agents use it, like quality control for your AI’s brain.

Like theirs, your company’s systems weren’t built to power automated systems. Your people send Slack messages, write docs, and even quasi-maintain wikis for other people to consume. Guru translates all of it to speak AI so that AI doesn’t give your people incorrect answers ~40% of the time.

Put your messy company data to clean use in a couple of clicks. Get a Guru to do it for you.

Try Guru Today


(1) a16z American Dynamism Summit

On Tuesday, I Joe Bidened myself1 down to Washington, DC for a16z’s American Dynamism Summit.

Complete with its very own Jason Carman hype video (see above), it was like a living, breathing version of the Weekly Dose of Optimism, and full of the people behind the stories we cover here each week.

Drew Baglino, whose Heron Power led off the Dose last week, was there, as was Noah Smith, whose Decade of the Battery I linked in that story. Cameron McCord, whose fundraise for Nominal you’ll read about today, was there, too. I spent time with probably a dozen people whose work has appeared in the Dose, and got to watch others on stage, like NASA Administrator Jared Isaacman.

My job is to spend time with these people, write about them, and invest in some of their companies, but there was something about seeing so many of them in one place that brought home the fact that a small group of sufficiently talented and motivated people can change how the future plays out.

Power generation, transmission, and storage, motors, power electronics, autonomous boats, supersonic planes, educational choice, modern manufacturing (terrestrial and in space), new weapons systems, cheap and secure communication everywhere, autonomous everything, safer neighborhoods, new cities altogether. If the people in that room succeed, America’s next 250 years will be better than its first 250.

(2) Physical Intelligence Models Get Memory So Robots Can Do Complex Tasks

Last week, I was talking to a Physical Intelligence investor who told me the progress the company had made in just the first couple of months of 2026 has blown his mind. Now, we know why.

Pi gave its models a new memory system with both short-term visual memory, which recalls what the robot did recently in fine-grained detail, and long-term semantic memory, which lets it remember what it did for up to 15 minutes.

With better memory, robots using Pi’s model can do long, multi-step things, like clean up an entire kitchen, set up the ingredients for a recipe, and make a grilled cheese sandwich. The model also learns from its past mistakes.
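
Pi hasn’t published implementation details, but the idea of pairing detailed-but-short memory with summarized-but-long memory is easy to picture. A toy sketch, every name hypothetical and unrelated to Pi’s actual code:

```python
# Toy two-tier robot memory: a small buffer of recent, detailed observations plus
# longer-lived text summaries with a rolling horizon. Hypothetical sketch only;
# this is not Physical Intelligence's implementation.
from collections import deque
from dataclasses import dataclass
import time

@dataclass
class Observation:
    timestamp: float
    frame_embedding: list[float]  # stand-in for a visual feature vector
    action: str

class TwoTierMemory:
    def __init__(self, short_term_size: int = 64, long_term_horizon_s: float = 15 * 60):
        self.short_term = deque(maxlen=short_term_size)   # recent steps, fine-grained
        self.long_term: list[tuple[float, str]] = []      # older steps, summarized
        self.horizon = long_term_horizon_s

    def record(self, obs: Observation) -> None:
        self.short_term.append(obs)

    def consolidate(self, summarize) -> None:
        """Compress recent detail into a summary and drop anything past the horizon."""
        if self.short_term:
            summary = summarize(list(self.short_term))    # e.g. "wiped counter, loaded dishwasher"
            self.long_term.append((time.time(), summary))
        cutoff = time.time() - self.horizon
        self.long_term = [(t, s) for t, s in self.long_term if t >= cutoff]

    def context(self) -> dict:
        """What a policy would condition on at each step."""
        return {"recent": list(self.short_term), "summaries": [s for _, s in self.long_term]}
```

The point is just that a robot needs both kinds of recall, detailed-and-recent plus coarse-and-long, to keep a 15-minute, multi-step task on the rails.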

Yesterday, I wrote that LLMs are producing a lot of dead text that we’re probably better without. Robotics is way more interesting. I would love for the robots to spend billions of tokens doing my laundry and making me healthy gourmet meals while I kick back and read human-written classics. This is one more small step in that direction.

For a full primer on robotics, check out my cossay with Evan Beard:

(3) Arc Institute Releases Evo 2, Fully Open-Source Biological Foundation Model

A year ago, the Arc Institute released Evo 2, the largest fully open biological AI model ever built, as a preprint. This week it was published in Nature, and Arc dropped a one-year recap showing what the thing has actually done in the wild.

Trained on 9.3 trillion nucleotides from over 128,000 whole genomes, Evo 2 can read and write across all three domains of life: bacteria, archaea, and complex eukaryotes including us. The model, built on an architecture called StripedHyena 2, trained on 30 times more data than its predecessor Evo 1 and can reason over 8 times as many nucleotides at a time. Greg Brockman, the OpenAI co-founder and ex-Stripe CTO (Stripe co-founder Patrick Collison is a founder of Arc), helped build it during a sabbatical.

Evo 2 generalizes “across biological prediction and design tasks, across all modalities of the central dogma, across molecular to genome scale, and across all domains of life.”

It can predict which genetic variants cause disease and which don’t, without being specifically trained on that task. In tests on the breast cancer gene BRCA1, Evo 2 achieved greater than 90% accuracy in predicting which mutations are benign versus disease-causing.
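
For a sense of how “predict disease variants without training on that task” works mechanically: genomic language models are typically scored zero-shot by comparing the model’s likelihood of the reference sequence to that of the mutated one. A rough sketch; `log_likelihood` is a hypothetical stand-in, not the actual Evo 2 API.

```python
# Zero-shot variant effect scoring with a genomic language model, in sketch form.
# `model.log_likelihood` is a hypothetical stand-in for "sum of per-nucleotide log-probs";
# it is not the actual Evo 2 interface.

def variant_effect_score(model, ref_seq: str, pos: int, alt_base: str) -> float:
    """More negative = the model 'dislikes' the mutation more = more likely deleterious."""
    alt_seq = ref_seq[:pos] + alt_base + ref_seq[pos + 1:]
    return model.log_likelihood(alt_seq) - model.log_likelihood(ref_seq)

# A benign-vs-pathogenic classifier can then be as simple as thresholding this score,
# which is roughly how model likelihoods get turned into accuracy numbers like the
# >90% BRCA1 result above.
```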

It can generate new genomes. Entire genomes. The team demonstrated the first AI-designed and experimentally validated bacteriophage: 16 of 285 tested designs successfully propagated and inhibited growth of the target bacteria, with no impact on unrelated strains. For those who, like me, don’t fully speak the language of biology, that’s a whole new, functional virus designed from scratch by a model.

And in a stunt that’s both scientifically real and deeply fun: the team guided Evo 2 to design DNA sequences with controllable chromatin accessibility patterns, then wrote Morse code messages—”EVO2,” “LO,” and “ARC”—into the epigenome of mouse embryonic stem cells, experimentally validated with ATAC-seq.

The whole thing is fully open: model weights, training code, inference code, and the OpenGenome2 dataset. So more scientists will be able to experiment with a model that understands the grammar of life as the cost of reading and writing biology keeps falling. The design space for medicine, agriculture, and materials is going to keep opening up in ways that are genuinely hard to imagine.

(4) Injectable “satellite livers” could offer an alternative to liver transplantation

Sangeeta Bhatia Lab / MIT / Cell Biomaterials

More than 10,000 Americans are on a waitlist for a liver transplant. There aren’t enough donated organs. And many patients with liver failure aren’t even eligible for the surgery because they’re too sick to survive it. For those people, the best current answer is: sorry.

MIT engineer Sangeeta Bhatia’s answer is: what if we just injected a second liver?

Her lab has developed “satellite livers,” tiny engineered tissue grafts made of hepatocytes (the liver's workhorse cells) mixed with hydrogel microspheres and supportive fibroblasts, all deliverable through a syringe. Once injected into fatty tissue in the abdomen, the spheres act like a liquid going in and then solidify, giving the hepatocytes a scaffold to cluster around. Blood vessels grow in and the mini liver takes root. In mouse studies, the grafts remained viable and kept producing the right proteins for at least eight weeks (the full length of the study).

The insight that makes this work is a clever one: the hydrogel microspheres create an “engineered niche” for the cells. Without them, injected hepatocytes disperse and fail to integrate. With them, the cells localize, connect to the host circulation faster, and behave the way liver cells are supposed to.

The vision is a twofer. Short-term, these grafts could act as a bridge to transplantation, buying time for patients waiting on the donor list. Long-term, they could be the treatment: repeatedly administered, minimally invasive, titrated to how much liver function a patient needs. It could replace surgery altogether with routine injections.

The liver does about 500 essential things for the human body. The transplant chain is broken. People die. The fact that we might soon be able to restore meaningful liver function with a syringe, a handful of cells, and some hydrogel spheres is miraculous.

(5) Ginkgo Bioworks Launches Ginkgo Cloud Lab

In early February, we shared Ginkgo’s collaboration with OpenAI in which GPT-5 integrated with Ginkgo’s autonomous lab and autonomously designed, executed, and analyzed over 36,000 cell-free protein synthesis experiments. GPT-5 reduced costs by 40% compared to benchmarks (from $698 to $422 per gram). Good robot.

Now, Ginkgo is making its Cloud Labs available to anyone in a push to move all of its R&D services away from traditional benches and to these futuristic ones. Building out a lab can cost a bunch of money (Ginkgo CEO Jason Kelly tweeted that it can run $10 million), and Ginkgo’s strategy has long been to offer lab capacity to startups, academic labs, and pharma companies as a service.

Ginkgo’s hope is that Cloud Labs will someday allow anyone to be a scientist with their own lab, just like personal computers and cloud data centers democratized programming and the web.

With this move, presumably the service gets better as Ginkgo’s margins do, too. The starting price for Ginkgo’s first three protocols, two for cell-free protein expression and one for bacterial pixel art, is just $39.

Awesome news for the democratization of science, although I can’t imagine this is going to help my friend sleep any better at night.

Noahpinion, “Updated thoughts on AI risk”: “So the other day I wrote a post about how humanity is inevitably going to be disempowered by the existence of AI…”

EXTRA DOSE: Ulkar is out for the week, so in our Extra Dose, I’m rounding up a big week in fundraising news for software for hardware and sight restoration, plus recommending my favorite sci-fi series of all time.

Read more