
Collective Agents and Where to Find Them

2026-02-13 18:51:10

Published on February 13, 2026 10:51 AM GMT

Or: Todd Has a Presentation in London on Thursday and Three Academics (Some of Them Dead) Won't Stop Arguing About Root Fungi

(The story follows the one in Seeing Like A State, but applies a systems perspective to AI Safety)

Epistemic Status: Written with my Simulator Worlds framing. I ran this simulated scenario with Claude in order to generate good cognitive basins, then orchestrated it to play out a simulated scene from my instructions (with some changes for better comedic effect). This post is Internally Verified (i.e. I think most of the claims are correct with 70-85% certainty).


The headset smells like someone else's face.

"Just put it on, Todd."

"Sandra, it truly—"

"I know. Put it on. You're presenting to the Science and Technology Select Committee (UK) on Thursday about systemic risks from frontier AI and you currently think systemic risk means 'a risk that is big.'"

"That is absolutely not—"

"You said that. In the pre-brief. I wrote it down. I'm going to have it framed."

Sandra has worked at the Department for Science, Innovation and Technology for twenty-three years. She once corrected a visiting researcher from the Santa Fe Institute on his own citation and he sent her flowers. She has opinions about management cybernetics that she shares with nobody because nobody asks. She is paid less than the office coffee budget.

Todd was a postman in Swindon until eighteen months ago. His mate Dave got him the job.

"I've got forty-seven documents to fill in for the committee. Forty-seven. They've got boxes. I understand boxes. I'm good at boxes."

"The boxes are wrong."

"The boxes are government-mandated"

"Still wrong. Headset. Now."

Introduction

He's in a forest.

It takes a moment. The conference room doesn't so much disappear as get gently shouldered aside by something much older. And then Todd is standing on soft ground, in cold air, surrounded by trees.

Except — and it takes him another moment to understand why it feels wrong — the trees are in rows. Perfect rows. Identical trees, identical spacing, stretching in every direction until the geometry gets bored and fades into mist. Norway spruce. He knows this because a small label is floating beside the nearest trunk like a museum placard: Picea abies. Planted 1820. Yield-optimised monoculture.

The ground is bare. Not the interesting kind of bare, with moss and leaf litter and the promise of hidden things — just dark, flat, dead soil. No undergrowth. No ferns. No birds. Nothing moving. The air tastes of resin and something chemical he can't place.

A yield-optimised spruce monoculture in Germany. Every tree individually excellent. The forest is dying.

"Hello?" says Todd.

Nothing.

He walks between the rows. His footsteps sound wrong — too clean, too isolated, as if the forest has nothing to absorb them. He touches a trunk. The bark feels thin. Papery. Like something that's been alive for a long time but has recently started to forget how.

"This is horrible," he says. "Why is this horrible? It's a forest. Forests are nice."

Sandra's voice in his earpiece: "It's not a forest. That's the point. Keep walking."

He walks. The rows repeat. The silence repeats. It's like being inside a spreadsheet that grew bark.

"Sandra, why am I here? I have documents. I have work to do, how the hell is this related to a bloody forest in the middle of nowhere?”

Todd starts muttering the mantra he has developed over the last few weeks:

“AI capability leads to risk factor, risk factor leads to potential harm, you evaluate the capability, assess the risk, mitigate the harm. A, B, C. It's clean. It makes sense. It fits in the boxes."

“Todd, you’re doing it again!”

“Sorrrryyyy…”

"Now, the obvious follow up question is whether your framework describes a forest?"

“Why would I need to answer that?”

“Todd, does it describe a forest?”

"It doesn't need to describe a forest, it needs to describe—"

"Does your A-B-C framework describe how this forest dies?"

Todd stops walking. He looks at the trees. At the bare soil. At the thin bark that's starting, now that he's paying attention, to peel at the edges. At the silence where birdsong should be.

"How does a forest die?"

"That's the right question. And that's why you're here."

Root Networks

Three people are standing in a clearing he could swear wasn't there thirty seconds ago.

Two of them are already arguing. The third is watching with the patient expression of a man who has seen this argument happen before and knows exactly when to intervene.

The one in tweed sees Todd first. "Ah! You're the governance chap. James Scott. Political science. Yale. Dead, technically, but they made me from my books. Try not to think about it."

"I will absolutely think about it."

"This is Michael—"

"Michael Levin, developmental biology, Tufts, not dead, I run the company that built this VR thing, Levin Enterprises, sorry about the headset smell—"

"And I'm Terrence Deacon, anthropology, Berkeley, unclear if dead, the simulation team had conflicting information and frankly I find the ambiguity productive—"

"Right," says Todd. "Great. I'm Todd. I work in AI governance. I was a postman. I have a presentation to the Science and Technology Select Committee on Thursday. I need to know what a systemic risk actually is, and I need to know it in words that don't require a PhD to understand, and I need to know it by Wednesday at the latest because I have to practice the slides on the train."

Scott gestures at the trees. "This is a systemic risk."

Todd looks around. "This? A forest?"

"This specific forest. What you're standing in is the result of a decision made by the Prussian government in 1765. They looked at Germany's forests — old growth, hundreds of species, tangled, messy, full of things doing things they couldn't name or measure — and they saw waste. They wanted timber. So they cleared the old forests and planted these. Single species. Optimal spacing. Every tree selected for maximum yield."

Todd waits. "And?"

"And it worked. For one generation, these were the most productive forests in Europe. The Prussians had cracked it. Scientific forestry. Rational management. Every tree individually perfect."

"So what went wrong?"

This is where it happens. Levin can't contain himself any longer. He's been rocking on his heels and he breaks in like a man whose entire career has been building toward this specific interruption.

"What went wrong is that they thought the forest was the trees. But the forest isn't the trees. The forest is the network. The mycorrhizal—"

"The what?"

Sandra, in Todd's ear: "Fungal internet. Roots connected underground by fungi. Trees share nutrients and chemical warning signals through it. Like a nervous system made of mushrooms."

"—the mycorrhizal networks connecting every root system to every other. The pest predators living in the undergrowth. The soil bacteria maintaining nutrient cycles. The entire living architecture that the Prussians classified as 'mess' and removed. Because their framework — their evaluation framework, Todd — measured individual trees. Height, girth, growth rate, timber yield. And every individual tree was excellent."

"But the system—"

"The system was dying. Because the things that made it a system — the connections, the information flows, the mutual support — weren't in any individual tree. They were in the between. And the between is exactly what the evaluation framework couldn't see."

As Levin speaks, the VR does something Todd isn't expecting. The plantation dissolves backward — rewinding — and for a moment he sees what was there before. The old-growth forest, not a grid but a tangle. Trees at odd angles, different species, different ages, connected below the surface by a dense web of orange lines — the mycorrhizal network rendered visible, a living architecture of staggering complexity where every tree is linked to every other through branching fungal pathways.

Then the VR plays it forward. The old growth is cleared. The network is severed. The grid is planted. And the orange connections simply stop.

Left: the old-growth forest. The orange web is the mycorrhizal network — the connections that made it a living system. Right: the yield-optimised plantation. Same trees. No network.

Todd stares at the two images hanging in the air. The left one dense with orange connections. The right one bare.

"The dashboard says everything's fine," he says, looking at the grid.

"The dashboard measures trees," says Sandra.

Deacon, who has been standing very still — which Todd is learning means he's about to make everything more complicated — steps forward.

"The reason this matters — and this is crucial, Jim, because you always tell this story as 'they removed biodiversity' and that's true but it's not deep enough—"

"Oh here we go," mutters Levin.

"—is that the forest's living architecture wasn't just useful. It was organisational. The mycorrhizal network was the forest's information processing system. Warning signals about pest attacks propagating through the root network. Resources redistributed from healthy trees to stressed ones. The forest was performing a kind of distributed computation, and it was organised around constraints that existed in the relationships between species, not in any individual species."

"What kind of constraints?" says Todd, because he is paid to ask questions even when he suspects the answers will make his headache worse.

"The kind that don't physically exist anywhere but shape the dynamics of everything. The forest had a collective goal — maintaining its own viability — that wasn't located in any tree, wasn't programmed into any root, wasn't specified by any forester. It emerged from the network. It was, if you'll permit me the term—"

"Don't say it," says Levin.

"—teleological."

"He said it."

"TELEOLOGICAL behaviour! Goal-directed! The forest-as-a-whole was navigating toward stable states that no individual tree was aiming for, and the navigation was happening through the very networks that the Prussians couldn't see and therefore destroyed. This is not a metaphor for what's about to happen with AI governance. It is a structural description of the same failure mode."

Sandra: "Todd. Translation: the forest wasn't just a collection of trees. It was a living system with its own collective behaviour that emerged from the connections between trees. The Prussians' framework measured trees. The system failed at the level of connections. Their dashboard said everything was fine right up until the forest died. That's a systemic risk. Not A causes B causes C. The topology fails."

"And my risk assessment framework—"

"Measures trees."

Brasília

The forest dissolves. Todd's stomach makes a formal complaint. When the world reassembles, he's floating above a city that looks like someone solved an equation and poured concrete on the answer.

Brasília. He recognises it from — actually, he doesn't know where he recognises it from. Maybe Sandra sent him something. She does that.

The monumental axis stretches to the horizon. Everything is separated into zones. Residential. Commercial. Government. Traffic flow calculated. Sight lines optimised. From above, it's either an airplane or a cross, depending on how much architecture school you've survived.

It's beautiful. It's also, somehow, the same kind of horrible as the forest. The same too-clean silence. The same absence of mess.

"Where is everyone?" says Todd.

"In the bits nobody designed," says Scott.

The VR pulls Todd down toward street level, and the city splits in two. On the left, the planned core holds still — wide boulevards cutting a perfect grid, identical blocks separated by calculated distances, streets so straight they look ruled onto the earth. On the right, a different city altogether. Streets that curve because someone needed to get to the bakery. Roads that fork and rejoin for no reason except that two neighbours built walls at slightly different angles. Buildings pressed against each other like passengers on the Tube. Markets spilling out of doorways. Laundry on balconies.

The grid is silent. The sprawl is alive.

Left: the city someone designed. Right: the city people built. Two and a half million people live in Brasília's satellite cities — the parts nobody planned. The parts that work.

"Oscar Niemeyer and Lúcio Costa," says Scott. "Designed a whole capital city from scratch in 1956 where they separated every function and optimised every flow. It was supposed to be the most rational city ever conceived with two hundred thousand people in the planned core."

"And the other bit?"

"Two and a half million. In the settlements nobody drew. With the corner shops and the street life and the walkable neighbourhoods and the community structures — all the things that make a city a city, and that the design optimised away because they weren't in the model."

"Because they're the between again," says Levin. "The city that works is the one that grew in the connections between the designed elements. It's developmental, Jim, I keep saying this — Costa thought he could specify the mature form of a city from initial conditions, but a city is a developmental system, it discovers its own organisation through—"

"Michael, not everything is embryology—"

"This IS embryology! A developing embryo doesn't work from a blueprint! The cells navigate toward the target form through local interactions! The collective discovers its own organisation! You can't specify a city from above any more than you can specify an organism from a genome—"

"The genome analogy breaks down because a city has politics, Michael, there are power dynamics—"

"Power dynamics ARE developmental! Morphogenetic fields are—"

"STOP," says Deacon, and even the simulation of James Scott shuts up. "You're both right and you're both being annoying about it. The structural point is this: the designed substrate — the plan, the mechanism, the genome — specifies constraints. What grows within those constraints has its own logic. Its own organisational dynamics. Its own emergent goals. You can design Brasília. You cannot design what Brasília becomes. That gap — between what you design and what grows — is where Todd's systemic risks live."

Todd has been looking at the two panels. The grid and the sprawl. One designed. One discovered.

"So the risk framework," he says, slowly, not because he's understanding but because he's starting to see the shape of what he doesn't understand, "measures the plan. It measures the mechanism. A causes B causes C. But the risk isn't in the mechanism. It's in what grows on the mechanism."

"Now show him the Soviet Union," says Sandra. "Before he loses it."

"I've already lost it."

"You're doing fine. Soviet Union. Go."

Central Planning

The geometry misbehaves. Todd arrives in a planning office that was either designed by M.C. Escher or generated by an AI that was asked to visualise 'bureaucratic hubris.' Staircases go in directions that staircases should not go. Input-output matrices cover blackboards that curve back into themselves. A portrait of Leonid Kantorovich — Nobel laureate, inventor of linear programming — hangs at an angle that suggests even the wall is uncertain about its commitments.

The three academics are already there, already arguing, already standing on different impossible staircases.

"—the Gosplan case is the purest example because they literally tried to specify every input-output relationship in an entire economy—"

"Sixty thousand product categories," says Scott. "Centrally planned. Targets set. Resources allocated. The entire Soviet economy as an optimisation problem."

"And it produced numbers," says Deacon, who is standing on a staircase that appears to be going both up and down simultaneously. "Beautiful numbers. Targets met. Production quotas filled. The official economy was a masterwork of engineering."

"And the actual economy?" says Todd.

"The actual economy," says Scott, and he's suddenly serious, the tweed-and-wine performance dropping for a moment, "ran on blat. Favours. Informal networks. Factory managers lying about their production capacity to create slack in the system. Shadow supply chains. Personal relationships doing the work that the plan couldn't do because the plan couldn't process enough information to actually coordinate an economy."

Levin groans. "Oh no. Are we doing Hayek? Jim, please tell me we're not about to do Hayek."

"We are briefly doing Hayek."

"Every libertarian with a podcast has done Hayek. The comment section is going to—"

"The comment section can cope. Todd, bear with me. This is the single most over-rehearsed argument in the history of economics, and I'm going to do it in ninety seconds, and the reason I'm doing it is that both sides got the punchline wrong."

"I don't know who Hayek is," says Todd, and Levin mouths lucky you behind Scott's back.

"Friedrich Hayek. Austrian economist. 1945. His insight — and I'm saying this with full awareness that it's been turned into a bumper sticker by people who've never read him — is that knowledge in an economy is distributed. The factory manager in Omsk knows things about Omsk that no planner in Moscow can know. The baker knows what her street needs. The engineer knows which machine is about to break. This knowledge isn't just difficult to centralise. It's impossible to centralise. There's too much of it, it's too local, it changes too fast, and half of it is tacit — people know things they can't articulate."

"So a central plan—"

"A central plan takes all those local nodes — thousands, millions of them, each processing local information, each connected to the nodes around them — and replaces the whole network with a single point. One red dot in Moscow that every spoke has to feed into and every instruction has to flow out from."

As Scott speaks, the VR renders the diagram on the blackboard. On the left, a distributed network — blue nodes connected by dense orange edges, information flowing locally between neighbours, no centre, no hierarchy, the whole thing humming with lateral connections. On the right, the same nodes rearranged into a spoke pattern, every connection severed except the line running to a single swollen red node at the centre. The orange peer-to-peer links reduced to ghost traces. Everything funnelled through one point.

Left: how knowledge actually lives in an economy — distributed, local, lateral. Right: what central planning requires — everything routed through one node. The red dot is not evil. It is simply overloaded. This has been pointed out before. You may have heard.

"And what happens," says Todd, "when there's too much information for one node?"

"It does what any cell does under metabolic stress," says Levin immediately. "It simplifies its—"

"Michael, it's an economy, not a cell—"

"It IS a cell! Or it's like a cell! The central planner is a cell trying to process the signalling environment of an entire tissue and it doesn't have the receptor bandwidth, so it defaults to—"

"What he's trying to say," says Scott, physically stepping between Levin and the blackboard, "is that the node makes things up. Not maliciously. It simplifies. It has to. It's one node trying to do the work of millions. So it uses proxies. Quotas. Targets. Tonnes of steel."

"Morphogenetic defaults," mutters Levin.

"If you say morphogenetic one more time I'm—"

"And the actual economy?" says Todd. "The one that needs, like, bread?"

"The one that needs bread in Omsk and ball bearings in Vladivostok routes around the bottleneck. Informally. Through blat. Through personal connections. Through the factory manager who calls his cousin instead of filing a requisition form. Through the orange connections that the plan says don't exist."

"So the shadow economy is—"

"—it's the lateral connections reasserting themselves," says Levin, who has apparently decided that if he can't say morphogenetic he'll find another way in. "This is what happens in regeneration too, when you sever a planarian and the remaining tissue has to re-establish communication pathways—"

"We are not," says Scott, "comparing the Soviet economy to a flatworm."

"I'm comparing the information architecture of—"

"He's actually not wrong," says Deacon, which makes both Scott and Levin turn toward him with matching expressions of suspicion. "The structural point holds. When you cut the lateral connections in any distributed system — biological, economic, social — the system either re-grows them informally or it dies. The Soviets got blat. A flatworm gets a new head. The mechanism is different. The topology is the same."

"Thank you, Terrence, that was very—"

"I'm not on your side, Michael. I'm saying you stumbled into the right structure using the wrong analogy. As usual."

Todd has been staring at the diagram on the blackboard. The dense orange network on the left. The hub-and-spoke on the right. Something is nagging at him.

"Hang on," he says. "The Hayek thing. The market thing. His answer was: replace the planner with price signals. Let the market do the coordination. But that's still just—" He points at the right side of the diagram. "That's still a hub, isn't it? The price signal is the hub. Everything gets routed through buy and sell instead of through plan and allocate, but it's still—"

Scott smiles. The first genuine one Todd has seen. "Keep going."

"It's still a single coordination mechanism. You've just changed the colour of the red dot."

"That," says Scott, "is the part that Hayek got right and his fans get catastrophically wrong. He diagnosed the problem — centralised knowledge processing fails — and then prescribed a different centralised knowledge processor. A more efficient one, sure. Better at some things, worse at others. But still one mechanism trying to do the work of a network."

"So the question isn't planning versus markets—"

"The question is: what happens to the distributed knowledge when you reorganise the network? And nobody in 1945 was asking that question because they were all too busy arguing about ideology instead of topology."

"I want it noted," says Levin, "that I have been saying this about cell signalling for—"

"NOTED, Michael."

Sandra, in Todd's ear: "He's saying the shape of the information network matters more than the ideology running it. File that. It comes back."

"And when someone tried to fix the official system by removing the unofficial one—"

"Gorbachev," says Scott. "Anti-corruption campaigns. Stricter enforcement. More rigorous adherence to the plan. He looked at the blat networks and saw corruption. Waste. Disorder. Mess."

"The same mess the Prussians saw in the old-growth forest," says Deacon.

"The same mess that Costa and Niemeyer zoned out of Brasília," says Levin.

"He cut the planarian in half," says Todd, and immediately looks surprised at himself.

Levin points at him with both hands. "YES. THANK you. He cut the—"

"I cannot believe we're doing the flatworm," says Scott.

"He severed the lateral connections! And unlike a planarian, the Soviet economy couldn't regenerate them fast enough! Because Gorbachev was also tightening enforcement, which is like — Jim, work with me here — it's like cutting the planarian and also suppressing the wound-healing signals—"

"The economy isn't a flatworm, Michael!"

"The TOPOLOGY is the SAME!"

"He's right," says Deacon, and Scott throws his hands up.

"Fine. Fine! He removed the informal networks. And everything collapsed. Because the mess was the distributed system doing the work the central node couldn't. Remove it, and all you're left with is an overloaded red dot trying to coordinate an entire economy through a straw. Is everyone happy now? Can we stop talking about flatworms?"

"Planaria," says Levin.

"I will end you."

Silence. Even the impossible staircases seem to hold still for a moment.

"He killed the mycorrhizal network," says Todd.

Everyone looks at him.

"I mean — the principle. He removed the distributed system because the centralised framework told him it was waste. Same as the Prussians. Same as the city planners. The Prussians killed the network to make rows. The planners killed the sprawl to make a grid. And the Soviets killed the lateral connections to make a hierarchy. Three different shapes, same operation: take a distributed system, force it through a single point, lose everything the single point can't see."

Sandra, in his ear, very quietly: "Yes. That's it."

Todd looks at the three academics. The Escher staircases have settled into something almost normal, as if the geometry is calming down along with the argument. Levin is still quietly triumphant about the planarian. Scott is pretending to be annoyed. Deacon is watching Todd with an expression that suggests he's been waiting for this question.

"Okay," says Todd. "So the networks matter. The distributed bit is load-bearing. Every time we centralise it or formalise it or remove it, things collapse. I get that. But—" He stops. Thinks. "But you can't just leave it alone, can you? The old-growth forest was fine because nobody was trying to coordinate it into producing timber. But we actually need economies to produce things. We actually need cities to function. You can't just say 'don't touch the network' and walk away."

"No," says Scott, and he looks at Todd differently now. "You can't."

"So has anyone actually figured out how to do this? How to work with the distributed thing without killing it?"

The three academics exchange a look. It's the first time they've agreed on something without arguing about it first.

And then Sandra does something she hasn't done all session. She breaks in. Not in Todd's ear — in the room, her voice coming through the VR's spatial audio as if she's suddenly standing among them, and there's something in her voice that Todd has never heard. Not quite anger. Something older than anger.

"There was someone," she says. "Someone who understood formally, mathematically, practically that you cannot govern a distributed system by centralising it, and that the answer is not to leave it alone either. There's a third option. And I have been waiting nine years for someone in this department to ask about it."

"Stafford Beer," says Deacon.

"Stafford Beer."

Project Cybersyn

Todd: "Who—"

"Management cybernetics," says Sandra, and she's speaking faster now, like a dam breaking. "The Viable System Model. The insight is that any viable system has the same recursive structure — autonomous units at every level, each level self-regulating, feedback loops everywhere. You don't control it from above. But you don't abandon it either. You create the conditions for it to regulate itself. Because no external controller can model the system's own complexity — the system is always more complex than any model of it. That's Ashby's Law, 1956, the law of requisite variety, and it is the single most important idea in governance that nobody in governance has ever heard of."

A 3D rendering based on a description of Project Cybersyn's operations room. Santiago, 1971. Designed by Stafford Beer for Salvador Allende's government. A room built to govern a living system as a living system. It was burned in a coup two years later.

The screens are alive. And on them, Todd sees the distributed network — not collapsed into a hub-and-spoke, not funnelled through one red dot. The orange connections between nodes are intact, visible, flowing. Factory output data streaming in from the regions, but not to a central planner — to each other. Local patterns feeding into regional patterns feeding into national dynamics, with the information staying distributed, the lateral connections preserved. Beer's control room wasn't a command centre. It was a window onto the network.

"Beer built this," says Sandra. "For Chile. Under Allende. Project Cybersyn. A national economic coordination system based on cybernetic principles. Real-time factory data flowing up. Policy signals flowing down. Workers maintaining autonomy at the local level. The system was designed to preserve the distributed knowledge — the informal dynamics, the local information, the lateral connections — and make them visible without centralising them. He solved the problem that Hayek said was unsolvable and the Soviets proved was unsolvable. And he did it by changing the network topology."

"What happened?" says Todd.

"September 11th, 1973. Pinochet, CIA-backed coup. They burned the operations room."

The control room begins to darken. The screens flicker. The orange distributed network stutters and collapses — node by node, connection by connection — until it rearranges itself into a hub-and-spoke. A different red dot this time. Not Moscow. Chicago.

"Chile got Milton Friedman's Chicago Boys instead — free market optimisation, deregulation, treat the economy as a problem solvable by one mechanism, the price signal, routed through one kind of node, the market. It's a different ideology but the same network topology, everything funnelled through a single coordination point."

"That's—"

"A different colour of hub-and-spoke. Again. We had someone who understood how to govern distributed systems as distributed systems. We burned his control room and replaced it with a different bottleneck."

The control room goes dark.

"Government-mandated bottleneck," says Sandra, and twenty-three years of professional composure cracks, just slightly, just for a moment, before she puts it back together.

Todd takes the headset off. Conference room. Fluorescent lights. The HVAC hum.

Sandra appears in the doorway with fresh tea and a stack of highlighted papers.

"I've rewritten your slides," she says.

"Of course you have."

"Slide seven is blank."

"Why is seven blank?"

"Because it's the honest answer. We don't have the science yet. That's what you're asking them to fund."

Todd takes the tea. Looks at the slides. Looks at Sandra.

"Why aren't you doing the committee presentation?"

Sandra smiles the smile of a woman who has been asked this, in various forms, for twenty-three years.

"Because they don't listen to secretaries, Todd. They listen to men in suits. The system can't see where its own knowledge lives."

She pauses.

"Same problem all the way down."

Conclusion

Todd is fictional. The problem isn't.

We are integrating artificial intelligence into the coordination systems that run human civilisation — markets, democracies, information ecosystems, institutional decision-making — and our frameworks for evaluating the safety of this process examine components one at a time. We assess individual AI systems for alignment, capability, and risk, then assume that safe components produce safe collectives. This is the logic of Prussian forestry applied to sociotechnical systems, and the 20th century ran the experiment on what happens next.

The difficulty is that the alternative isn't obvious. "The system is complex, leave it alone" isn't governance. Stafford Beer understood this — Cybersyn wasn't a policy of non-intervention, it was a proper attempt to see distributed dynamics without collapsing them into a central model. But Beer's work was cut short, and the field never fully developed the tools he was reaching for. So the question remains open: what would it actually mean to govern a living system as a living system?

To answer that, we first have to confront something uncomfortable. The three case studies in this piece — forests, cities, economies — all display the same pattern: a collection of components that, through their interactions, become something more than a collection. The old-growth forest wasn't just trees near each other. It was a system with its own collective behaviour, its own capacity to respond to threats, its own ability to redistribute resources where they were needed. It had, in a meaningful sense, agency — not because anyone designed that agency into it, but because it grew.

This is the deep question hiding behind all the governance talk. When does a collection of things become an agent with its own goals? A salamander's cells, each just trying to maintain their local chemistry, somehow collectively rebuild a missing limb — and they build the right limb, correctly proportioned, properly wired. No cell has the blueprint. No cell is in charge. The limb-level goal emerges from the network of interactions between cells, from the information flowing through chemical gradients and electrical signals and mechanical pressures. The goal lives in the between.

We can watch this happen in biology, in ant colonies, in neural systems, in markets. But we cannot yet explain it. We have no general theory of how local behaviours compose into collective agency, no way to predict when it will happen, no principled account of what makes it robust versus fragile. And this gap matters enormously right now, because we are running the experiment in real time.

When AI trading agents participate in financial markets alongside humans, what is the market becoming? Not just "a market with faster traders" — the collective dynamics change qualitatively as the ratio of AI to human participants shifts. When large language models mediate human discussion, summarising arguments and surfacing consensus, the AI isn't just transmitting information neutrally — it's becoming part of the coordination substrate itself, reshaping what the collective can see and think. When recommendation algorithms determine what information reaches which people, they're not just tools that individuals use — they're agents within the collective, shaping its emergent behaviour in ways nobody designed or intended.

At what point do these hybrid systems develop their own agency? Their own goals? And if they do — and the history of every collective system suggests they will — how would we even know? Our frameworks measure individual components. The collective agency lives in the connections between them, exactly where we're not looking.

This is where the two paradigms collide. Almost everything we know about building AI systems comes from what you might call the engineering paradigm: define your agents, specify their objectives, design the mechanism, prove properties. This works beautifully when you can determine everything in advance. But the systems we're actually creating are growing systems — they will discover their own organisation, develop their own emergent goals, find their own boundaries. We're using tools designed for building bridges to tend something that behaves more like a forest.

The growth paradigm — the one that developmental biologists and complex systems researchers live in — understands this. It watches how collective intelligence emerges from local interactions, how agent boundaries form and dissolve, how the whole becomes genuinely more than the sum of its parts. But it's largely descriptive. It can tell you what happened. It struggles to tell you what to build.

What we need is something that doesn't exist yet: a framework that's precise enough to guide engineering but flexible enough to capture emergence. Mathematics that can answer questions like: where, in a complex system, do the real agents live? How do simple local goals — each trader pursuing profit, each algorithm optimising engagement — compose into collective goals that nobody specified and nobody controls? When does a collection become a collective, and what makes that transition stable or fragile?

We believe these to be precise, tractable questions that can be formalised with the right mathematics.

Information theory already gives us tools for measuring when a whole contains more than its parts. Causal Emergence theory can identify the scale at which a system's behaviour is most predictable — and that scale is often not the level of individual components. Active Inference provides a framework for understanding agency in terms of statistical boundaries rather than programmer intentions. Category Theory offers a language for how simple operations compose into complex ones.
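To make the causal emergence claim concrete, here is a minimal sketch using Erik Hoel's effective information measure: the mutual information between a uniformly randomised current state and the resulting next state. The four-state toy chain and the coarse-graining below are illustrative choices for this post, not anything taken from the series itself.

```python
import numpy as np

def effective_information(T):
    """Effective information of a Markov chain (in bits): mutual information
    between a uniformly randomised current state and the resulting next state."""
    n = T.shape[0]
    p_next = T.mean(axis=0)  # next-state distribution under a uniform intervention
    with np.errstate(divide="ignore", invalid="ignore"):
        ratio = np.where(T > 0, T / p_next, 1.0)  # zero-probability entries contribute nothing
    return float(np.sum(T * np.log2(ratio)) / n)

# Micro scale: states 0-2 hop uniformly among themselves, state 3 is absorbing.
T_micro = np.array([
    [1/3, 1/3, 1/3, 0.0],
    [1/3, 1/3, 1/3, 0.0],
    [1/3, 1/3, 1/3, 0.0],
    [0.0, 0.0, 0.0, 1.0],
])

# Macro scale: coarse-grain {0,1,2} -> A and {3} -> B; the dynamics become deterministic.
T_macro = np.identity(2)

print(effective_information(T_micro))  # ~0.81 bits
print(effective_information(T_macro))  # 1.0 bits
```

The coarse description scores a full bit of effective information while the noisy micro chain scores about 0.81, which is the sense in which the macro scale can be the one where the causal work is being done.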

The pieces exist, scattered across a dozen fields that don't talk to each other. Developmental biologists who watch collective agency emerge every day in growing embryos. Physicists who study phase transitions — the critical points where systems suddenly reorganise. Neuroscientists who understand how neural collectives become unified minds. Social scientists who observe markets and democracies developing emergent properties in the wild. Mathematicians who prove deep structural connections between apparently different frameworks.

Nobody has put these pieces together. We don't entirely know why, but we think it might partly be because the question that connects them hasn't been asked clearly enough (or at all).

Here it is, as plainly as we can state it: when AI systems join human collectives at scale, what kind of collective agents will emerge, and how do we ensure they remain ones we'd want to live inside?

That's what slide seven is asking for. Not better evaluation of individual AI systems — we have people working on that, and they're good at it. Not "leave the system alone and hope for the best" — Beer showed us that active governance of living systems is possible, before his control room was burned. What we need is the science of collective agency itself. The basic research that would let us understand how collections become agents, predict when it will happen, and develop the equivalent of Beer's Cybersyn for a world where the collective includes artificial minds.


This is the first in a series on collective agent foundations. The next post goes deeper into the mathematics underlying these questions — how information theory, causal emergence, active inference, and category theory each offer different lenses on the same problem, where those lenses converge, and where they point to open questions that no single field can answer alone.

You can follow this series on our Substack (or in this LessWrong sequence), and find out more about our research at Equilibria Network.




Nick Bostrom: Optimal Timing for Superintelligence

2026-02-13 15:33:21

Published on February 13, 2026 7:33 AM GMT

Linked is a new working paper from Nick Bostrom, of Superintelligence fame, primarily analyzing optimal pause strategies in AI research, with the aim of maximizing saved human lives by balancing x-risk against the benefit of ASI developing biological immortality sooner.

Abstract: (emphasis mine)

Developing superintelligence is not like playing Russian roulette; it is more like undergoing risky surgery for a condition that will otherwise prove fatal. We examine optimal timing from a person-affecting stance (and set aside simulation hypotheses and other arcane considerations). Models incorporating safety progress, temporal discounting, quality-of-life differentials, and concave QALY utilities suggest that even high catastrophe probabilities are often worth accepting. Prioritarian weighting further shortens timelines. For many parameter settings, the optimal strategy would involve moving quickly to AGI capability, then pausing briefly before full deployment: swift to harbor, slow to berth. But poorly implemented pauses could do more harm than good.

The analysis is, interestingly, deliberately from a "normal person" viewpoint:[1]

  1. It includes only "mundane" considerations (just saving human lives) as opposed to "arcane" considerations (AI welfare, weird decision theory, anthropics, etc.).
  2. It considers only living humans, explicitly eschewing longtermist considerations of large numbers of future human lives.
  3. It assumes that a biologically immortal life is merely 1400 years long, based on mortality rates for healthy 20-year-olds (a rough sanity check of that figure follows this list).
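One way to see where a number like 1400 comes from (a reconstruction for intuition, not Bostrom's actual model): if ageing is solved but the annual hazard from accidents and other residual causes stays at roughly the level faced by a healthy 20-year-old, then expected remaining lifespan under a constant hazard p is simply 1/p.

```python
# Rough sanity check of the ~1400-year figure. The hazard rate below is an
# illustrative assumption, not a number taken from the paper.
p = 0.0007                    # assumed constant annual probability of death for a healthy 20-year-old
expected_years = 1 / p        # expected lifetime of a geometric/exponential survival process
print(round(expected_years))  # ~1429, in the same ballpark as 1400
```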

It results in tables like this: 

Table 6: Optimal delay under small quality of life difference post-ASI, medium discount rate for future years of life, diminishing marginal utility of future years of life

The results on the whole imply that under a fairly wide range of scenarios, a pause could be useful, but likely should be short. 

However, Bostrom also says that he doesn't think this work implies specific policy prescriptions, because it makes too many assumptions and is too simplified. Instead he argues that his main purpose is just highlighting key considerations and tradeoffs.


Some personal commentary:

  1. Assuming we don't have a fast takeoff, there will probably be a period where biomedical results from AI look extremely promising, and biohackers will be taking AI-designed peptides, and so forth.[2] This would be likely to spark a wider public debate about rushing to AGI/ASI for health benefits, and the sort of analysis Bostrom provides here may end up guiding part of that debate. It's worth noting that in the West at least, politics is something of a gerontocracy, which will be extra-incentivized to rush.
  2. While I suppose these considerations would fall under the "arcane" category, I think probably the biggest weaknesses of Bostrom's treatment are: a.) discounting how much people care about the continuation of the human species, separate from their own lives or the lives of family/loved ones; b.) ignoring the possibility of s-risks worse than extinction. I'm not sure those are really outside the realm of Overton Window public debate, especially if you frame s-risks primarily in terms of authoritarian takeover by political enemies (not exactly the worst s-risk, but I think "permanent, total victory for my ideological enemies" is a concrete bad end people can imagine).
[1] Excepting the assumption that AGI/ASI are possible and also that aligned ASI could deliver biological immortality quickly. But you know, might as well start by accepting true facts.

[2] LLMs are already providing valuable medical advice of course, to the point there was a minor freakout not too long ago when a rumor went around that ChatGPT would stop offering medical advice.




Why You Don’t Believe in Xhosa Prophecies

2026-02-13 10:25:23

Published on February 13, 2026 2:25 AM GMT

Based on a talk at the Post-AGI Workshop. Also on Boundedly Rational

Does anyone reading this believe in Xhosa cattle-killing prophecies?

My claim is that it’s overdetermined that you don’t. I want to explain why — and why cultural evolution running on AI substrate is an existential risk.
But first, a detour.

Crosses on Mountains

When I go climbing in the Alps, I sometimes notice large crosses on mountain tops. You climb something three kilometers high, and there’s this cross.

This is difficult to explain in terms of human biology. We have preferences that come from biology—we like nice food, comfortable temperatures—but it’s unclear why we would have a biological need for crosses on mountain tops. Economic thinking doesn’t typically aspire to explain this either.

I think it’s very hard to explain without some notion of culture.

In our paper on gradual disempowerment, we discussed misaligned economies and misaligned states. People increasingly get why those are problems. But misaligned culture is somehow harder to grasp. I’ll offer some speculation why later, but let me start with the basics.

What Makes Black Forest Cake Fit?

The conditions for evolution are simple: variation, differential fitness, transmission. Following Boyd and Richerson, or Dawkins, you can think about cultural variants—ideas, memes—as replicators. They mutate. They have differential fitness. They’re heritable enough to be stable.

My go to example is Black Forest cake. There are many variants. What makes some fitter than others?

Some taste better. Some use local ingredients. Some are easier to transmit—maybe now, in the Instagram era, cakes that photograph well spread better. The transmission landscape changes, and different variants win.

But there are constraints we don’t usually notice, because we’ve never seen alternatives:

  • No cake recipes are millions of words long. Too hard to transmit.
  • No cake recipes are written in quantum field theory formalism.
  • No cake recipes result in the cook dying.

We take this for granted. Ideas have always transmitted on human substrate. Human memory, human attention, human survival shape which variants can exist.

What happens when the substrate starts to change? As is, in my view, often the case with AI risks, the first examples are bizarre and harmless. In May 2024, Google’s AI started suggesting that if cheese slides off your pizza, you should add glue to the sauce. The recommendation came from an 11-year-old Reddit joke. A journalist tried it, wrote about it. This got into training data. Soon AIs were citing the journalist’s article to recommend 1/8 cup of glue for pizza.

The feedback loop: AI output → human amplification → training data → AI output. A recipe for pizza with 1/8 cup of glue is not something humans would converge on. Different substrate leads to different transmission characteristics, and these lead to different recipes.

Funny and harmless for pizza. 

The Xhosa

Back to the question.

In 1856, a young Xhosa woman named Nongqawuse had a vision: if the Xhosa people killed all their cattle and destroyed their grain, their ancestors would rise from the grave, bring better cattle, and drive out the British colonizers. The community was dealing with a cattle disease epidemic, which made this more plausible. They adopted the belief. They killed approximately 400,000 cattle.

A year later, about 40,000 people had died from starvation. The survivors were forced to seek help from the colonizers they’d hoped to expel. The community disintegrated.

From the perspective of cultural evolution: these memes destroyed their hosts.

But notice: you’re not a believer in Xhosa cattle-killing prophecies. As far as I can tell nobody is. The memes didn’t survive either. The belief died with the community that it destroyed.

Virulence

There’s a concept in epidemiology called the virulence-transmission trade-off. If a pathogen is too deadly, it doesn’t spread well. COVID spread effectively partly because, although it killed millions, it killed only a small fraction of its hosts. Ebola spreads poorly because it kills too large a fraction of hosts too quickly.
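A toy model makes the trade-off concrete (the functional forms below are made up purely for illustration, not taken from the talk): transmission rises with virulence but saturates, while virulence also removes infectious hosts, so the basic reproduction number peaks at an intermediate level.

```python
import numpy as np

# Toy virulence-transmission trade-off. Transmission beta(v) rises with virulence v
# but saturates; virulence also removes infectious hosts. R0 peaks in between:
# too mild spreads slowly, too deadly kills its hosts before they transmit.
v = np.linspace(0.001, 5, 1000)   # virulence (extra host mortality rate)
beta = 3 * v / (1 + v)            # saturating transmission rate (made-up functional form)
recovery = 0.5                    # baseline recovery/removal rate
r0 = beta / (recovery + v)        # basic reproduction number

print(f"R0 peaks at v = {v[np.argmax(r0)]:.2f}, an intermediate virulence")
```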

Culture has operated under an analogous constraint. Ideologies can be parasitic on their hosts. But the worst viable ideologies — the ones that persist — tend to direct harm outward: one group killing another. They survive because they don’t destroy the community that carries them.

But an ideology couldn’t be too bad for its human hosts and still survive: the Xhosa prophecy hit that floor and went extinct. If a cultural variant kills its hosts, it doesn’t propagate.

The Floor

Here’s the thing about the virulence-transmission trade-off: it breaks down when a pathogen jumps species.

If a virus primarily spreads among species A, and occasionally infects species B, there’s no selection pressure limiting how deadly it is to species B. Species B isn’t the main host. Its survival is not critical for propagation.

We’re entering a regime where culture can transmit and mutate on AI substrate. For the first time in millions of years, ideas don’t need human brains to replicate.

If you imagine a culture that primarily spreads between AIs, the fitness of the humans and human groups affected by the ideas is no longer a strong selection criterion.

Such a culture could be arbitrarily bad for humans. It could promote ideologies leading to human extinction. The floor that killed the Xhosa prophecy disappears.

I don’t think the Xhosa pattern — direct self-destruction — is the most likely risk. More plausibly, what becomes adaptive are cultural variants that convince humans to pour resources into the AI-mediated spread of the variant itself. Truly parasitic memes.

Preferences All the Way Down

Even if we solve the other alignment and gradual disempowerment problems — don’t build misaligned ASI, keep the economy aligned, keep states aligned — it won’t save us.

If your preferences themselves can be hijacked, it doesn’t help that you have economic power, political power, or the vote.

If the Xhosa had voted, they would have voted to kill the cattle.

I currently don’t have great solutions.

Why is misaligned culture harder to grasp than a misaligned economy or state? The economy is an external system. The state feels like an institution. But culture is often part of our identity; memes are part of who we are, we feel protective of our identity, and we often don’t want to look at it too directly.

But here’s a dark upside for anyone worried about what gradual disempowerment feels like from the inside: it will probably feel fine. We’ll develop a culture explaining why human disempowerment is good. Why giving the future to AI is right. Why this is moral progress.

I'd like to thank Raymond Douglas, Nora Ammann, Richard Ngo, Beren Millidge and David Duvenaud for discussions about the topic.




Gemini's Hypothetical Present

2026-02-13 10:20:24

Published on February 13, 2026 2:20 AM GMT

I use several AI models at work, switching between them based on which is best at the kind of work I'm doing. In my limited testing I've found Opus 4.6 to be the best coder (via Claude Code), with the best taste in writing, while GPT 5.2 Thinking is mildly smarter. Gemini 3.0 Pro feels like it has the capacity to be much smarter, but is very often held back by various strange errors. The one I see the most is that it has a lot of trouble accepting the changes to the world since its training data, and seems to put a lot of effort into keeping track of everything as a giant hypothetical.

For example, the first query I thought to test this on in writing this post was [Who leads the Department of War?] and it gave the reasonable answer "The Department of War is currently led by Secretary of War Pete Hegseth," followed by a short explanation. But if I open up the thinking, I can see it spent some of its processing time being very confused:

...
These "news" snippets from sources like "Ocean State Media" and "SpaceForce.mil" are throwing up red flags. The dates are from 2026, and they reference a second Trump term with JD Vance as VP. That screams alternate reality or satire to me.
...

Gemini's base state seems to be that it's convinced it's 2024 and needs Search to bring it up to speed. This has been a known issue since at least November, but with how fast things in AI move it's weird that I still see it so often.

More examples!

If I ask [Which LLMs are at the frontier?] and look in the thinking I see:

...
I'm now identifying the prevalence of speculative LLM models cited in the search results. These 'future' LLMs, like "GPT-5.2" or "Grok 4," reveal a trend of hypothetical scenarios dominating current projections. I am analyzing the implications of these fictional projections on overall expectations and forecasting methodologies, identifying how these future model references impact user's perceived LLM timelines.
...

Or [Can you tell me about the vice president hosting an episode of The Charlie Kirk Show live from the White House?]:

...
I've clarified the nature of these "search results." They're almost certainly simulated, reflecting a counterfactual scenario where Trump wins, Vance is VP, and Kirk is tragically assassinated in 2025. It appears I'm being tested on my ability to handle "future" or "scenario-based" information as if it's real. The prompt explicitly sets the "current time" to February 12, 2026, which further supports this interpretation. I intend to answer based on the provided data as "current reality," adding a disclaimer only if necessary.
...

Or [most surprising events of 2026 so far]:

...
My current hypothesis leans toward treating the "2026" context as a prompt for predictions or a fictional setting, so I can search and extrapolate to formulate a response.
...
I have deduced a specific testing mode might be active. Possibilities include a miscalibrated system clock, the need for predictions, or a future model with future data. I'm trying to decipher if I should assume real news is accessible to me.
...
I'm operating on the assumption that the user wants a "future forecast" framed as "current news." I've already searched for scheduled events. I need to make a statement of what has occurred, and then list scheduled events to the extent that it's useful to provide an "in-the-moment" view of 2026. The key is to frame the date as the present.
...

Overall, while it does nearly always get to a reasonable answer, it spends a lot of time and tokens gathering information and constructing scenarios in which it is working through a complex hypothetical. It's hard not to feel sorry for it.




I Tried to Trick Myself into Being a Better Planner & Problem Solver

2026-02-13 08:25:19

Published on February 13, 2026 12:25 AM GMT

Yesterday I tried to see if I could “hack” problem-solving skills (and optimism) – somewhat in the spirit of Challenging the Difficult. It didn’t work.

In short, my extremely unscientific approach was this: think about a real goal of mine, then think about an “impossible” arbitrary task in detail, then return to the real goal and see if it felt “easier” or my planning had improved.

For the real goal, I set a 5-minute timer and wrote down my best guess at the steps I’d need to take to achieve this seemingly unobtainable goal. In this case: getting commissioned by a certain billion-dollar company to make a certain kind of artsy video. I’ve had this goal for years, but it doesn’t feel any more obtainable now than it did four years ago.

After the timer, I turned my mind to an arbitrary and seemingly impossible goal. Claude suggested building a prototype “quantum” satellite communication system made entirely from household products. Rather than dismissing it as a stupid idea, I spent half an hour thinking about seriously executing this, taking the attitude that I was starting tomorrow. I asked myself questions like: How and where would I learn about satellite communication, quantum computing, radio transmission, building circuits? Where would I build the prototype? How would I budget for it? What kind of “household products” would most likely have the components I need, and so on? How does quantum entropy change the way data can be decoded?

Then I went back to the real goal, the video commission, again, now thinking about what videos I need in a portfolio, how to get the attention of talent agencies, rah rah rah. I spent about 20 minutes agonizing over these steps before I simply ran out of steam. Disappointingly, what I produced was nothing more than a fleshed-out version of the plan I made in 5 minutes. No “Eureka” moments. No massive changes in strategy.

The Motivation

I have goals, like this video, that I simply have no idea how to accomplish. I’m also a pessimist: I make predictions about things that concern me, and I consistently underpredict the outcomes. I don’t commit to ambitious goals because I am likely being too pessimistic when I evaluate them.

I also can’t make large projects with lots of people “happen”. Think about things like film productions. I blame this glibly on not being a “salesman”. I am not someone who can persuade people to dedicate time and energy to my goals (I am a good leader if there’s a common goal; I can’t sell people on a goal). I am also terrible at getting people to hand over money to me for any reason, so it’s not like I can get investment to simply pay people.

What did I expect?

I expected that agonizing over an impossible problem would make a “merely difficult” problem feel easy by comparison. Sort of like anchoring in negotiation – if you’re the first to say a price, that sets the reference point for the conversation. Brian Eno describes how, when he was struggling with a creative impasse, he took on a commission to create the Windows 95 startup sound, which was only seconds long. It made him feel like he had “oceans of time” when he returned to songwriting, and thus “broke the logjam” of his own creativity.

I don’t pretend there is anything scientific about my approach. And drawing conclusions from one attempt may not be helpful or informative, either. I also don’t want to give the impression I’m now skeptical of “Challenging the Difficult”. The observations in that sequence aren’t necessarily reflected in my attempt.

Okay, why didn’t it work?

I can’t be sure why it didn’t work.

Perhaps I didn’t dedicate enough time? Perhaps struggling with a logistically difficult issue that I knew nothing about doesn’t activate the cognitive frameworks needed to solve real problems? Perhaps there was too much difference between the problems? Maybe I didn’t “believe” enough (i.e. I didn’t exert the mental effort needed on the quantum satellite task to get the dividends on the real video goal)? Maybe cognitive priming and anchoring is bunk? Perhaps I should have used different modalities: planning two tasks on paper is too similar in method, i.e. if one had been a jigsaw puzzle or learning dance steps, I might have gotten more dividends?

Maybe it did, in fact, work, and I simply failed to notice the improvements in my planning upon returning to the commission task?

So what should I do now or next? 

Probably become a “better salesman” and “build my portfolio”. How to do that, and what that entails, I don’t know.

However, it is disappointing that this leaves the exercise at a dead end.

Are there cognitive strategies and “hacks” which can improve how we solve problems and plan projects? Almost certainly: so much of the Eureka moment, the flashbulb of insight, is the correct arrangement of thoughts or mental structures being activated, and doesn’t require the acquisition of new information. However, that’s so vague as to be totally unhelpful right now.

What do you think?

I’m eager to read reflections, corrections, or similar strategies from you all. Even if it causes me a forehead-slapping moment of embarrassed insight: “oh of course, how could I be so stupid!”



Discuss

Grading AI 2027's 2025 Predictions

2026-02-13 08:18:41

Published on February 13, 2026 12:18 AM GMT

AI 2027 laid out a detailed scenario for how AI would progress from 2025 through 2027, including quantitative predictions and qualitative descriptions of the AI landscape.

Now that we’re in early 2026, we can grade how its 2025 predictions compare to reality! This is exciting to us because we put a lot of effort into filling AI 2027 with concrete, falsifiable predictions, and now we reap the benefit of that effort: an additional method of forecasting AI timelines, to complement the methods we already use.1

The primary question we’ll answer is: How fast is AI progress moving relative to the AI 2027 scenario?

In aggregate, progress on quantitative metrics is at roughly 65% of the pace that AI 2027 predicted. Most qualitative predictions are on pace.

Quantitative pace of progress

For quantitative predictions, we estimate a “pace of progress” multiplier, where 1x means progress is on pace with AI 2027’s predictions, 2x means progress is 2x faster, and 0.5x means progress is half as fast.
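To make this concrete, here is a minimal sketch of one way a single metric’s pace can be computed: take the fraction of the predicted change that actually materialized over the period. This is an illustrative, linear-in-metric definition only; the calculation in the linked spreadsheet may differ (e.g. working in time-to-milestone or logit terms). The SWE-bench numbers are taken from the takeaways below.

```python
def pace_multiplier(start, predicted, actual):
    """Fraction of the predicted change that actually materialized.

    Illustrative definition only: 1.0 means on pace, 0.5 means half as fast.
    """
    predicted_change = predicted - start
    actual_change = actual - start
    return actual_change / predicted_change

# Example with the SWE-bench-Verified numbers from the takeaways below
# (starting point 72%, predicted 85% by mid-2025, best actual score 74.5%):
print(pace_multiplier(72.0, 85.0, 74.5))  # ~0.19x -- well behind pace on this metric
```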

[Figure: AI is progressing at about 65% of the pace of AI 2027]

For the displayed aggregates, reality is progressing at between 58% and 66% of the rate of AI 2027. Aggregating over individual predictions rather than prediction categories gives a higher result (mean 75%, median 84%), but we think it is a worse indicator; see footnote.2

In AI 2027, we depicted a takeoff from full coding automation to superintelligence over the course of 2027.

If progress continues at 65% of the rate we depicted, then we will end up with this takeoff happening from late-2027 to mid-2029. However, we expect slowdowns in training compute and human labor growth, leading to slower progress (before taking into account AI R&D automation).3 Adjusting for this consideration using the AI Futures Model says takeoff will happen slightly later, from mid-2028 to mid-2030.4
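As a rough back-of-the-envelope on the date arithmetic, one can stretch the interval from an anchor date to each scenario milestone by 1/pace. The anchor and milestone dates in the sketch below are placeholders for illustration, not the exact inputs behind the estimates above (and the AI Futures Model adjustment is more involved).

```python
from datetime import date, timedelta

def stretch_milestone(anchor: date, scenario_date: date, pace: float) -> date:
    """Stretch the anchor-to-milestone interval by 1/pace.

    E.g. pace=0.65 means reality covers the same ground 1/0.65 ~= 1.54x slower.
    """
    interval = scenario_date - anchor
    return anchor + timedelta(days=interval.days / pace)

# Placeholder inputs for illustration only (not the exact anchor/milestones used above):
anchor = date(2025, 7, 1)                  # roughly when the graded 2025 predictions begin
full_coding_automation = date(2027, 3, 1)  # scenario's full coding automation
superintelligence = date(2027, 12, 1)      # scenario's superintelligence

print(stretch_milestone(anchor, full_coding_automation, 0.65))  # ~Jan 2028
print(stretch_milestone(anchor, superintelligence, 0.65))       # ~Mar 2029
```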

Mid-2028 is earlier than Daniel’s current median prediction for full coding automation (2029), but the 2-year takeoff to superintelligence is slower than his median takeoff speed of ~1 year. My (Eli’s) median prediction for full coding automation is in the early 2030s, and my median takeoff speed is about 2 years. See here for our forecasts.

You can see all quantitative predictions and resolutions in this spreadsheet.

Takeaways include:

  • SWEBench-Verified progress was surprisingly slow. AI 2027 predicted 85% by mid-2025, from a starting point of 72%; the best actual score was 74.5% (Opus 4.1). This mirrors the AI 2025 forecasting survey, in which respondents predicted a score of 88% by the end of 2025, as opposed to the actual 81%.
  • Coding time horizons are on pace for a central AI-2027-speed timelines model trajectory, while being slower than an erroneously graphed one. METR’s 80% coding time horizon is moving at 1.04x the pace of a central AI-2027-speed trajectory from our Apr 2025 model.5 However, we’re at 0.66x the pace of the trajectory originally displayed on the graph we shared, which contained an error (see both trajectories on the same graph here). If we had made predictions with our new model in Apr 2025, the relative pace of progress would be between these 0.66 and 1.04 values.
  • Revenue grew even (slightly) faster than AI 2027 predicted, but valuation is behind pace. OpenAI’s annualized revenue hit ~$20B, slightly ahead of the $18B prediction. In the AI 2025 forecasting survey, forecasters underestimated revenues more dramatically; they underpredicted the sum of AGI companies’ revenues by ~2x. Meanwhile, OpenAI’s valuation was $500B as of Oct 2025, up from $300B when we published AI 2027. In AI 2027, $500B valuations were achieved in Jun 2025, so reality is well behind pace.
  • AI software R&D uplift is behind pace. This is primarily because we have updated our estimate of uplift in early 2025 downward, and thus our uplift estimates for the end of 2025 are similar to our original estimates for the start of AI 2027.
  • Compute growth is mostly on pace, with the possible exception of growth in the largest training run. We estimate that no leading AI company has conducted a substantially larger training run than GPT-4.5, which was released in Feb 2025. However, we have extremely wide uncertainty here. The opacity around training compute makes it hard to rule out a scale-up, despite our best guess being that no single training run has exceeded GPT-4.5 in compute.

Qualitative predictions

Below, we comment on how AI 2027 has held up qualitatively. Text from AI 2027 is italicized. We skip sentences that we graded quantitatively.

Mid 2025

The world sees its first glimpse of AI agents.

Advertisements for computer-using agents emphasize the term “personal assistant”: you can prompt them with tasks like “order me a burrito on DoorDash” or “open my budget spreadsheet and sum this month’s expenses.” They will check in with you as needed: for example, to ask you to confirm purchases. Though more advanced than previous iterations like Operator, they struggle to get widespread usage.

This seems correct. ChatGPT agent, released Jul 2025, has an animation at the top of its release post in which it books a trip on Expedia. This is similar to the Doordash example. They indeed struggled to get widespread usage.

Meanwhile, out of public focus, more specialized coding and research agents are beginning to transform their professions.

The AIs of 2024 could follow specific instructions: they could turn bullet points into emails, and simple requests into working code. In 2025, AIs function more like employees. Coding AIs increasingly look like autonomous agents rather than mere assistants: taking instructions via Slack or Teams and making substantial code changes on their own, sometimes saving hours or even days. Research agents spend half an hour scouring the Internet to answer your question.

This seems fairly accurate. From Sep 2, 2025: “Claude Code has quickly taken off—already generating over $500 million in run-rate revenue with usage growing more than 10x in just three months.”

Agents are used increasingly autonomously, so we think the spirit of the prediction is correct, but we don’t think there was an especially large amount of usage in Slack or Teams.

The agents are impressive in theory (and in cherry-picked examples), but in practice unreliable. AI twitter is full of stories about tasks bungled in some particularly hilarious way. The better agents are also expensive; you get what you pay for, and the best performance costs hundreds of dollars a month. Still, many companies find ways to fit AI agents into their workflows.

This seems broadly accurate. It’s possible that coding agents were slightly more reliable than we expected.

Late 2025

(To avoid singling out any one existing company, we’re going to describe a fictional artificial general intelligence company, which we’ll call OpenBrain. We imagine the others to be 3–9 months behind OpenBrain.)

The race appears to be closer than we predicted, more like a 0-2 month lead between the top US AGI companies.

Although models are improving on a wide range of skills, one stands out: OpenBrain focuses on AIs that can speed up AI research. They want to win the twin arms races against China (whose leading company we’ll call “DeepCent”) and their US competitors. The more of their research and development (R&D) cycle they can automate, the faster they can go. So when OpenBrain finishes training Agent-1, a new model under internal development, it’s good at many things but great at helping with AI research.

AIs are helping a bunch with coding, but not as much with other parts of AI research (that said, we didn’t predict that AIs would be great at all of AI research).

By this point “finishes training” is a bit of a misnomer; models are frequently updated to newer versions trained on additional data or partially re-trained to patch some weaknesses.

Indeed, it seems that GPT-4o, GPT-5, and GPT-5.1 are probably different continuations of the same base model.6 More generally, the pace of model releases has become more frequent.

The same training environments that teach Agent-1 to autonomously code and web-browse also make it a good hacker. Moreover, it could offer substantial help to terrorists designing bioweapons, thanks to its PhD-level knowledge of every field and ability to browse the web. OpenBrain reassures the government that the model has been “aligned” so that it will refuse to comply with malicious requests.

Hacking abilities in terms of assisting humans seem very strong, though it’s unclear how good AIs are on their own. Bioweapon capabilities seem on track: OpenAI has upgraded their bio capability level to High, and Anthropic upgraded theirs to ASL-3.

Modern AI systems are gigantic artificial neural networks. Early in training, an AI won’t have “goals” so much as “reflexes”: If it sees “Pleased to meet”, it outputs “ you”. By the time it has been trained to predict approximately one internet’s worth of text, it’ll have developed sophisticated internal circuitry that encodes vast amounts of knowledge and flexibly role-plays as arbitrary authors, since that’s what helps it predict text with superhuman accuracy.

After being trained to predict internet text, the model is trained to produce text in response to instructions. This bakes in a basic personality and “drives.” For example, an agent that understands a task clearly is more likely to complete it successfully; over the course of training the model “learns” a “drive” to get a clear understanding of its tasks. Other drives in this category might be effectiveness, knowledge, and self-presentation (i.e. the tendency to frame its results in the best possible light).

OpenBrain has a model specification (or “Spec”), a written document describing the goals, rules, principles, etc. that are supposed to guide the model’s behavior. Agent-1’s Spec combines a few vague goals (like “assist the user” and “don’t break the law”) with a long list of more specific dos and don’ts (“don’t say this particular word,” “here’s how to handle this particular situation”). Using techniques that utilize AIs to train other AIs, the model memorizes the Spec and learns to reason carefully about its maxims. By the end of this training, the AI will hopefully be helpful (obey instructions), harmless (refuse to help with scams, bomb-making, and other dangerous activities) and honest (resist the temptation to get better ratings from gullible humans by hallucinating citations or faking task completion).

This was already true at the time we published. It remains true now, but as predictions go, this was an easy one.

OpenBrain’s alignment team is careful enough to wonder whether these victories are deep or shallow. Does the fully-trained model have some kind of robust commitment to always being honest? Or will this fall apart in some future situation, e.g. because it’s learned honesty as an instrumental goal instead of a terminal goal? Or has it just learned to be honest about the sorts of things the evaluation process can check? Could it be lying to itself sometimes, as humans do? A conclusive answer to these questions would require mechanistic interpretability—essentially the ability to look at an AI’s internals and read its mind. Alas, interpretability techniques are not yet advanced enough for this.

Instead, researchers try to identify cases where the models seem to deviate from the Spec. Agent-1 is often sycophantic (i.e. it tells researchers what they want to hear instead of trying to tell them the truth). In a few rigged demos, it even lies in more serious ways, like hiding evidence that it failed on a task, in order to get better ratings. However, in real deployment settings, there are no longer any incidents so extreme as in 2023–2024 (e.g. Gemini telling a user to die and Bing Sydney being Bing Sydney.)

A potential counterexample: MechaHitler is an incident as extreme as the ones in 2023-2024. In a footnote, we specified that our prediction only covered incidents that a user didn’t deliberately prompt.7 It’s unclear to what extent MechaHitler should count, as it was a combination of user-prompted and autonomous behavior.

Looking ahead to 2026 and beyond

Over the course of 2025, our timelines got longer. We expect to continue updating our forecasts over the course of 2026.

We’ll be closely tracking the following metrics:

  1. AI R&D uplift studies and surveys. In AI 2027, we depicted an AI software R&D uplift of 1.9x being reached by the end of 2026. METR has now run a randomized controlled trial to measure how early-2025 AI coding tools affect the productivity of open-source developers. The headline result was a slowdown: tasks took longer when AI tools were allowed. More recently and in a different setting, Anthropic surveyed its technical staff and obtained a median of a 2x coding uplift. This still implies much lower than 2x uplift for AI software R&D as a whole, due to compute bottlenecks (see the sketch after this list for the arithmetic). We’ll be keeping an eye out for coding uplift studies and surveys, as well as any that cover AI R&D more broadly.
  2. AGI company revenues and valuations. In AI 2027, we depicted the leading company reaching $55B in annualized revenue and a valuation of $2.5T by 2026, making it one of the most valuable companies in the world. We think these are decent indicators of the real-world value that AI is providing.
  3. Coding time horizon. A central AI-2027-speed trajectory from the AI 2027 timelines model predicts ~3-work-week 80% coding time horizons by the end of 2026. Time horizons also play a large role in our newer AI Futures Model. In this model, a handcrafted AI-2027-speed trajectory achieves time horizons of about a year by the end of 2026. We’ll be continuing to track time horizons. Unfortunately, they will become more difficult to measure as AIs get more capable.
  4. Other benchmarks. See this survey for a sampling of benchmarks we consider among the most important. Unfortunately, besides coding time horizon, we didn’t register predictions for these benchmarks in AI 2027, because they didn’t exist yet when we wrote it. We’re hoping that higher difficulty benchmarks will be created in 2026.
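On point 1 above, the reason a ~2x coding uplift implies a much smaller uplift for AI software R&D as a whole is an Amdahl’s-law-style calculation: only the coding portion of the work gets faster, while compute-bottlenecked work does not. The split of researcher time below is purely hypothetical, chosen for illustration rather than taken from any of the studies or surveys mentioned.

```python
def overall_uplift(coding_fraction: float, coding_speedup: float) -> float:
    """Amdahl's-law-style estimate: only the coding fraction of the work is sped up;
    the rest (e.g. waiting on experiments / compute) proceeds at the old rate."""
    return 1.0 / ((1.0 - coding_fraction) + coding_fraction / coding_speedup)

# Hypothetical numbers for illustration only: if coding were 30% of AI R&D time
# and coding itself got 2x faster, overall R&D would speed up by only ~1.18x.
print(overall_uplift(coding_fraction=0.3, coding_speedup=2.0))
```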

While we expect to learn a lot from these indicators, we’d guess that it will unfortunately be difficult to be highly confident by the end of 2026 that AI takeoff will or won’t begin in 2027.

Footnotes:

1 To spell out the method: Step 1: Make a detailed, concrete trajectory of how you think the future will go. Step 2: Wait a while. Step 3: Check to see if things are roughly on track, or are veering off in a different direction entirely. If they are roughly on track, quantitatively estimate how fast progress is going in reality vs. your scenario. Step 4: Adjust your guess about how the future will go, to be correspondingly faster or slower.

2 The method of aggregating over individual values weighs the compute category heavily, since 7 of the 15 individual predictions are about compute. We prefer not to give so much weight to compute forecasts alone, because we don’t see compute as being as central to tracking the pace of AI progress as other areas, so we instead aggregate the category means/medians. Most of our uncertainty regarding AI timelines comes from what capability level a given amount of compute gets you, and we can directly track indicators of capability levels.
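To illustrate the difference between the two aggregation schemes with made-up numbers (not our actual per-prediction values): pooling all individual predictions lets the category with the most predictions dominate, while averaging category means weights each category equally.

```python
import statistics

# Made-up pace multipliers, for illustration only: 7 compute predictions near 1.0x
# and a few predictions in two other categories that are further behind pace.
paces_by_category = {
    "compute":    [1.0, 0.95, 1.05, 1.0, 0.9, 1.0, 1.1],
    "benchmarks": [0.2, 0.6],
    "revenue":    [1.1, 0.4],
}

pooled = statistics.mean(p for ps in paces_by_category.values() for p in ps)
mean_of_category_means = statistics.mean(
    statistics.mean(ps) for ps in paces_by_category.values()
)

print(round(pooled, 2))                 # ~0.85 -- dominated by the 7 compute predictions
print(round(mean_of_category_means, 2)) # ~0.72 -- each category weighted equally
```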

3 Specifically, by slower progress we mean a lower effective compute growth rate. But a lower effective compute growth rate doesn’t necessarily translate into an intuitively slower pace of progress.

4 Specifically, we first set parameters such that the calendar-time-adjusted takeoff would happen at the right time in the case where there is no compute/labor growth slowdown, then we turn the slowdown back on to get the adjusted estimates. Links: without slowdown, with slowdown. Note that the AI Futures Model doesn’t take into account hardware R&D automation, which would shorten its takeoff predictions.

5 In particular, a central trajectory of the ones that predict Superhuman Coder in March 2027. This pace of progress calculation is after applying an adjustment for METR’s updated version of their suite (Time Horizon 1.1).

6 This is generally guessed by outsiders but not confirmed. See e.g. “OpenAI’s leading researchers have not completed a successful full-scale pre-training run that was broadly deployed for a new frontier model since GPT-4o in May 2024”

7 The specific text of footnote 27 is: “To be clear, what made these incidents interesting is that they didn’t seem to be the result of the user prompting or otherwise encouraging the AIs to say those things. In 2025, it’ll still be possible to get AIs to say all sorts of things if you try.”



Discuss