Published on February 13, 2026 3:03 PM GMT
This post is also available on my Substack. If you would like to try the test described in the post, head to onlinetests.me/test/compassion2, where you can get scored and contribute to research. Data is available at the end of the post. If you are interested in the topic of psychometrics, consider joining my Discord server to talk more.
This is a bit of a followup to my previous post, Which personality traits are real? Stress-testing the lexical hypothesis. I haven’t quite gotten rid of my psychometrics addiction yet, and one of my latest projects is to try to measure trait Compassion more objectively.
For personality tests, consider the distinction between asking respondents about abstract statements like “I am concerned about others” versus concrete statements like “I’m open to spending a lot of time listening to a friend who is feeling down”. The more concrete statement has multiple virtues:
On the other hand, the abstract statement has its own advantages:
Conventional personality tests mainly use statements of the abstract kind, yet given their advantages I think there may be value in using statements of the concrete kind too.
I needed a lot of statements related to Compassion. To ensure the realism of the items, I took people who scored high or low on abstract Compassion tests and asked them to explain the meaning of their responses.
Overall I ran three studies on Prolific, with a total of 421 respondents. The first study, with 101 respondents, was what I used to generate the items for Which personality traits are real? Stress-testing the lexical hypothesis. In the second study, I asked 102 people and 86 of their close friends to rate them mainly on Compassion (but also on some other traits, for variety). In the third study, I gave 53 personality statements to 132 people and asked them to pick the 5 statements that described them best.
This gave me texts such as:
I would not see someone go without something that I had in abundance, if I see a homeless person on the streets even when I have very little money I will stop and talk with them maybe offer them a cigarette and if I have money I offer food. I will go out of my way to help people out if I have something they need and I have no use of it then they can have it for free. I hate seeing people upset and will do everything in my power to fix that upset for them even at cost to myself.
I had to convert these texts to brief personality items for the survey. In the above case, the item I ended up with was “I give things to homeless people”. Obviously this is cutting out a lot of the context, but it’s hard to assess details like this in personality surveys.
In total I generated 28 different items assessing Compassion. The full set of items can be seen below:
Then I had to test them.
I recruited 200 people and their romantic partners on Prolific.
The obvious question is whether my concrete Compassion items measure the same trait as abstract Compassion items do. Therefore I asked people to rate themselves on a variety of traits, including Compassion, in both an abstract and a concrete form. The following were my abstract Compassion items:
The raw correlation between the scores for the two tests was a mere 0.66. However, it is to be expected that we don’t get a perfect correlation, because each item carries a certain amount of measurement error, and that measurement error is only partially washed away when taking the average.
One way to estimate the measurement error in the items is to base it on how strongly the items are correlated with each other, since e.g. if the items were not at all correlated with each other, then it’s hard to see how they could “tap into” some latent factor influencing them all.
The easiest way to do that is with a statistic called Cronbach’s alpha. If I divide the raw correlation by the square root of the product of the two scales’ alphas (the standard correction for attenuation), I adjust for the measurement error due to having only a finite number of imperfectly correlated items, yielding the hypothetical correlation between perfectly-measured versions of the traits in question. After doing so, the correlation jumped up to 0.82, which is pretty respectable. (Though less than the 0.92 or so that I got in the previous study.)
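For readers who want the mechanics, here is a minimal sketch of the correction for attenuation; the alpha values are illustrative placeholders, not the exact reliabilities from the survey:

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for a respondents-by-items matrix of scored responses."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars.sum() / total_var)

def disattenuated_r(r_observed: float, alpha_x: float, alpha_y: float) -> float:
    """Estimated correlation between the error-free ('true score') versions
    of two scales, given their observed correlation and their reliabilities."""
    return r_observed / np.sqrt(alpha_x * alpha_y)

# An observed r of 0.66 with reliabilities around 0.80 disattenuates to roughly 0.82.
print(round(disattenuated_r(0.66, 0.80, 0.81), 2))
```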
I also asked people’s romantic partners to rate them on the concrete Compassion items (rewritten to say “My partner …” instead of “I …”). This allowed me to get a second perspective on how compassionate the respondents were. Unfortunately the correlation between self-reported Compassion and partner-reported Compassion was a mere 0.42.
It would have been cool if the concrete Compassion items were more highly correlated with the partner-reports than the abstract ones were, because this would indicate my concrete approach reduces measurement error. Unfortunately this was not the case, and the concrete approach instead had a correlation of 0.34.
(Which is suspiciously close to 0.82*0.42, the product of the prior correlations. I think this must be a coincidence, since with 200 respondents I shouldn’t be able to place correlations more exactly than ±0.14 or so.)
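As a quick sanity check on that ±0.14 figure (a sketch of my own, not part of the original analysis), the Fisher z approximation gives a similar half-width at n = 200:

```python
import numpy as np
from scipy import stats

def r_ci_halfwidth(r: float, n: int, level: float = 0.95) -> float:
    """Approximate half-width of a confidence interval for a correlation,
    using the Fisher z transform."""
    z = np.arctanh(r)
    se = 1 / np.sqrt(n - 3)
    zcrit = stats.norm.ppf(0.5 + level / 2)
    lo, hi = np.tanh(z - zcrit * se), np.tanh(z + zcrit * se)
    return (hi - lo) / 2

print(round(r_ci_halfwidth(0.34, 200), 2))  # ~0.12, in the same ballpark as +/-0.14
```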
I’ve been curious what could account for the difference between the abstract and the concrete Compassion scores. One idea I had was that the abstract Compassion scores might also capture rare, extreme acts of compassion that don’t fit into my neat schema. For this reason I ran an extra survey, where I asked people to qualitatively describe the most compassionate thing they’ve done, and then rate how compassionate it was across a number of dimensions:
My expectation was that the aggregate score from this would correlate more with the abstract than with the concrete Compassion measurements. But when I actually tried it, I got r≈0.09 and r≈0.35 respectively, indicating that the compassion measures did in fact differ in how they relate to one’s most extreme act of compassion, just in the opposite direction from what I expected. Perhaps when asked abstractly, people try to adjust for environmental circumstances or something? I don’t know.
Finally, one major question in psychometrics is the stability of responses. I didn’t give it a lot of time, so I can’t measure long-term stability, plus Prolific respondents tend to disappear after a while so I probably wouldn’t be able to measure long-term stability if I tried. However, I did give people the test again after a week, so I could measure week-long retest reliability.
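For concreteness, the per-item and whole-scale retest reliabilities discussed below can be computed along these lines (the data layout and column handling here are assumptions, not the actual analysis pipeline):

```python
import pandas as pd

def item_retest_reliability(t1: pd.DataFrame, t2: pd.DataFrame) -> pd.Series:
    """Per-item test-retest correlation across two waves with the same respondents.
    t1 and t2 are respondents-by-items frames with identical columns and index."""
    return pd.Series({col: t1[col].corr(t2[col]) for col in t1.columns})

def scale_retest_reliability(t1: pd.DataFrame, t2: pd.DataFrame) -> float:
    """Retest correlation of the overall scale score (the mean across items)."""
    return t1.mean(axis=1).corr(t2.mean(axis=1))
```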
Compared to traditional abstract psychometric items, more of my concrete Compassion items had low test-retest reliability. Over such a short timespan, the low reliability is probably less due to people changing their underlying traits and more due to people being confused about the meaning of the items. That said, the overall difference in reliability was not huge, and I had some highly reliable Compassion items too:
One finding that may be interesting is that the variance of an item correlated with its reliability:
I can also plot the test-retest reliability of the overall test, which leads to this picture:
I was also interested in whether there was any significant factor structure in the concrete Compassion items. However, as far as I could tell, there was not. While there do seem to be hints of additional correlations (e.g. “I give things to homeless people” correlated especially strongly with “I purchase toys for families who are too poor to afford them”), the factor structure is dominated by a strong general factor, followed by a distinction between positive-loading and negative-loading items, perhaps because of acquiescence bias.
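One simple way to see the “strong general factor” pattern is to look at the eigenvalues of the item correlation matrix. The sketch below uses simulated single-factor data purely for illustration, not the survey responses:

```python
import numpy as np

rng = np.random.default_rng(0)
n_people, n_items = 400, 28

# Simulate responses driven by one general factor plus noise (illustrative only).
general = rng.standard_normal((n_people, 1))
loadings = rng.uniform(0.3, 0.7, size=(1, n_items))
responses = general @ loadings + rng.standard_normal((n_people, n_items))

corr = np.corrcoef(responses, rowvar=False)
eigvals = np.linalg.eigvalsh(corr)[::-1]
print(np.round(eigvals[:5], 2))  # one dominant eigenvalue => a strong general factor
```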
I would like to see this sort of study executed at an even larger scale, to eventually untangle narrower facets of Compassion. However, I am not willing to pay for it myself.
Statements that have higher test-retest reliability are probably superior to statements with lower test-retest reliability, as low reliability likely reflects confusion about the meaning of the statements. Furthermore, statements with higher correlation to overall Compassion levels are probably superior (as measures of Compassion) to statements with lower correlation. Based on that, I have made the table below:
Reliability: the test-retest reliability of the statement. Abstract λ: the correlation between the test item and abstractly-rated Compassion. Concrete λ: the correlation between the test item and concretely-rated Compassion.
Data is available on OSF.
Published on February 13, 2026 2:05 PM GMT
I reanalyzed the METR task data using a Bayesian item response theory model.
Let’s start with a plot that shouldn’t be too surprising. Four reasonable models fit the METR data equally well. They agree about the past but disagree strongly about the future.
The model selection scores known as ELPD-LOO differ by at most ~7 points.[1] Calibration is nearly identical, as measured by Brier scores.
These curves are fitted using a Bayesian item response theory model described below. Before describing it, let’s recall METR’s analysis of the time horizon. They proceed in two stages:
Per-model logistic regression. For each model, regress task success against log human task length, and read off the task length at which the model's predicted success rate is 50%: its time horizon.
An OLS trend. Regress the log time horizons against model release date and extrapolate the fitted line forward.
This is good modeling and gets the main story right, but there are some non-standard choices here, for instance in how the slope of the logistic regression is handled.
In this post I make a joint model, adjust some things to be more in line with standard practice, and ask what happens when you try different trajectory shapes. The post is somewhat technical, but not so god-awful that Claude won't be able to answer any question you have about the methodology. Models are fitted with Stan using 4 chains.
The first stage of METR’s model is almost a 2-parameter logistic model (2PL), the workhorse of educational testing since the 1960s.
So, what kind of problems was the 2PL model designed for? Say you give 200 students a math exam with 50 questions and record their answers as correct / incorrect. You want to estimate the students’ math ability, but raw percent correct scores aren’t necessarily very good, as they depend on which questions (easy or hard? relative to which students?) happened to be on the exam.
The 2PL model solves this by giving each student a single ability score (θ) and each question two parameters: a difficulty and a discrimination.
The model estimates all parameters simultaneously via a logistic regression.
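In the standard 2PL parameterization (the exact notation here may differ slightly from the fitted model), the probability that student $i$ answers question $j$ correctly is

$$\Pr(y_{ij}=1 \mid \theta_i, a_j, b_j) = \operatorname{logit}^{-1}\!\bigl(a_j(\theta_i - b_j)\bigr),$$

where $\theta_i$ is the student's ability, $b_j$ is the question's difficulty, and $a_j$ is its discrimination.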
This matters here because METR tasks are like exam questions. They vary in both difficulty and how well they separate strong from weak models, and we want to put all the models on a common ability scale.
Ability and difficulty parameters
Each task’s difficulty has a mean that depends on log human time, plus a random component to account for the fact that same-length tasks are not born equal. (METR treats all tasks of identical length as equally hard.)
Since difficulty increases with log human time at some estimated rate, a task's difficulty can be converted back into an "equivalent human time". I estimate this rate, along with the other parameters, from the data. Of course, this is a modeling choice that can be wrong. There's no guarantee that difficulty is linear in log human time.
A plotted dot at 5x means the task's equivalent difficulty time is 5x its actual human time. Even within tasks of similar human time, the spread in difficulty is wide. There's not too much curvature in the relationship between log human time and difficulty, so I think the log-linear form is decent, but it's much more spread out than we'd like. There is a cluster of easy outliers on the far left, which I think can be explained by very short tasks containing virtually no information about difficulty. Overall this looks reasonable for modeling purposes.
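A minimal sketch of this difficulty structure and the "equivalent time" conversion, with made-up parameter values rather than anything taken from the fitted Stan model:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative parameters (assumptions, not fitted values):
beta0, beta1, sigma_task = -1.0, 1.2, 1.5  # intercept, slope on log minutes, residual SD
log_t = np.log(np.array([2.0, 15.0, 60.0, 480.0]))  # human times in minutes

# Each task's difficulty: a length-based mean plus a task-specific random component.
b = beta0 + beta1 * log_t + sigma_task * rng.standard_normal(log_t.size)
a = np.exp(0.3 * rng.standard_normal(log_t.size))   # task discriminations

def p_success(theta: float) -> np.ndarray:
    """2PL success probability of a model with ability theta on each task."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

print(np.round(p_success(3.0), 2))

# "Equivalent difficulty time": invert the mean relationship, so a dot at 5x means
# the task is as hard as a typical task five times its length.
t_equiv = np.exp((b - beta0) / beta1)
print(np.round(t_equiv / np.exp(log_t), 2))
```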
By directly modeling ability over time, we can try out shapes like exponential, subexponential, superexponential, saturating, and singularity. Forecasts depend a lot on which shape you pick, and the data doesn’t really tell you much, so it’s not easy to choose between them. Your priors rule here.
The abilities are modeled as a parametric function of model release date, where the functional form is one of the trend shapes in the table below:
| Model | Params | Intuition |
|---|---|---|
| Linear | 2 | Linear growth in ability |
| Quadratic | 3 | Superexponential, accelerating growth |
| Power-law | 3 | Flexible: sub- or super-exponential |
| Saturating | 4 | S-curve ceiling on ability |
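To make the shapes concrete, here is one plausible set of parameterizations for ability as a function of time, chosen to match the parameter counts above; the actual functional forms in the fitted model may differ:

```python
import numpy as np

# t: years since some reference date. These forms are assumptions matching the
# parameter counts in the table, not necessarily the exact fitted parameterizations.
def linear(t, a, b):               # 2 params: linear ability growth
    return a + b * t

def quadratic(t, a, b, c):         # 3 params: superexponential, accelerating growth
    return a + b * t + c * t**2

def power_law(t, a, b, p):         # 3 params: sub- or superexponential depending on p
    return a + b * np.maximum(t, 0.0) ** p

def saturating(t, lo, hi, k, t0):  # 4 params: S-curve ceiling on ability
    return lo + (hi - lo) / (1.0 + np.exp(-k * (t - t0)))
```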
If METR’s GitHub repo contained all the historical data, I would also have tried a piecewise linear with a breakpoint around the time of o1, which visually fits the original METR graphs better than a plain linear fit. But since the available data doesn’t go that far back, I don’t need to, and the value of including those early points in a forecasting exercise is questionable anyway. Getting hold of the latest data points is more important.
All models share the same 2PL likelihood and task parameters (difficulties and discriminations).
Each model except the saturating model will cross any threshold given enough time. Here are posteriors for the 50% crossing across our models. The saturating model almost never crosses the 1-month and 125-year thresholds since it saturates too fast.
| Trend | 1mo Mean | 1mo 95% CrI | 125y Mean | 125y 95% CrI |
|---|---|---|---|---|
| Linear | 2028-07 | 2027-12 – 2029-05 | 2032-03 | 2031-01 – 2033-10 |
| Quadratic | 2027-08 | 2026-12 – 2028-07 | 2029-07 | 2028-02 – 2031-09 |
| Power-law | 2027-10 | 2027-02 – 2028-11 | 2030-02 | 2028-08 – 2032-11 |
Everything above uses 50% success, but METR also cares about 80% success and fits a separate model for that. We don’t need to do that here since the model estimation doesn’t really depend on success rates at all. We’ll just calculate the 80%-success horizon using posterior draws instead.
But there are actually two reasonable ways to define “80% success,” and they give different answers.
Typical: Pick a task of average difficulty for its length. Can the model solve it 80% of the time? This is roughly what METR computes.
Marginal: Pick a random task of that length. What’s the expected success rate? Because some tasks are much harder than average, the hard ones drag down the average more than easy ones push it up.
At 50%, the two definitions agree exactly. But at 80%, the gap is roughly an order of magnitude!
So the gap is driven by the variance of task difficulties around the length trend: harder-than-average tasks drag the expected success rate down more than easier ones push it up, so the marginal horizon ends up roughly an order of magnitude shorter than the typical one.
The marginal horizon is the one that matters for practical purposes. “Typical” is optimistic since it only considers tasks of average difficulty for their length. The marginal accounts for the full spread of tasks, so it’s what you actually care about when predicting success on a random task of some length. That said, from the plot we see frontier performance of roughly 5 minutes, which does sound sort of short to me. I’m used to LLMs roughly one-shotting longer tasks than that, but it usually takes some iterations to get it just right. Getting the context and subtle intentions right on the first try is hard, so I’m willing to believe this estimate is reasonable.
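A sketch of how the two definitions can be computed from the same parameters; all numbers are illustrative assumptions rather than posterior draws from the actual model:

```python
import numpy as np
from scipy.optimize import brentq
from scipy.stats import norm

beta0, beta1, sigma_task, a = -1.0, 1.2, 1.5, 1.0  # assumed values

def p_typical(theta, log_t):
    """Success on a task of exactly average difficulty for its length."""
    return 1 / (1 + np.exp(-a * (theta - (beta0 + beta1 * log_t))))

def p_marginal(theta, log_t, n=4000):
    """Expected success over the spread of task difficulties at that length."""
    b = beta0 + beta1 * log_t + sigma_task * norm.ppf((np.arange(n) + 0.5) / n)
    return np.mean(1 / (1 + np.exp(-a * (theta - b))))

def horizon(theta, target, p_fn):
    """Task length (minutes) at which the chosen success definition equals `target`."""
    return np.exp(brentq(lambda lt: p_fn(theta, lt) - target, -10, 20))

theta = 5.0
print(horizon(theta, 0.5, p_typical), horizon(theta, 0.5, p_marginal))  # identical at 50%
print(horizon(theta, 0.8, p_typical), horizon(theta, 0.8, p_marginal))  # marginal is shorter at 80%
```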
Anyway, the predicted crossing dates at 80% success are below. First, the 1-month threshold (saturating model omitted since it almost never crosses):
| Trend | Typical Mean | Typical 95% CrI | Marginal Mean | Marginal 95% CrI |
|---|---|---|---|---|
| Linear | 2028-12 | 2028-04 – 2029-10 | 2030-07 | 2029-08 – 2031-09 |
| Quadratic | 2027-10 | 2027-02 – 2028-11 | 2028-09 | 2027-08 – 2030-04 |
| Power-law | 2028-02 | 2027-05 – 2029-04 | 2029-02 | 2028-01 – 2031-01 |
And the 125-year threshold:
| Trend | Typical Mean | Typical 95% CrI | Marginal Mean | Marginal 95% CrI |
|---|---|---|---|---|
| Linear | 2032-08 | 2031-05 – 2034-03 | 2034-02 | 2032-09 – 2036-03 |
| Quadratic | 2029-09 | 2028-03 – 2032-01 | 2030-05 | 2028-09 – 2033-05 |
| Power-law | 2030-05 | 2028-09 – 2033-05 | 2031-04 | 2029-04 – 2035-02 |
Make of this what you will, but let's go through one scenario. Let's say I'm a believer in superexponential models with no preference between quadratic and power-law, so I have a 50-50 weighting on those. Suppose also that I believe 125 years is the magic number for the auto-coder of AI Futures, but I prefer the marginal 80% horizon over the typical one. Reading the quadratic and power-law rows off the table above, that mixture puts the crossing somewhere around 2030-2031.
Let’s also have a look at METR’s actual numbers. They report an 80% horizon of around 15 minutes for Claude 3.7 Sonnet (in the original paper). Our typical 80% horizon for that model under the linear model is about 22.0 min, and the marginal is about 1.0 min, roughly 15x shorter than METR’s.
The available METR data contains the geometric mean of (typically 2-3
for HCAST) successful human baselines per task, but not the individual
times. Both METR’s analysis and mine treat this reported mean as a known
quantity, discarding uncertainty. But we can model that uncertainty explicitly if we want. I'd expect somewhat smaller differences between the typical and marginal horizons in that case, since part of the apparent spread in task difficulty would be reattributed to noise in the measured human times.
A technical point: When modeling
The ELPD-LOO estimates are: linear
Define
The multiplier is
Quadratic is the simplest choice of superexponential function. You could spin a story in its favor, but using it is somewhat arbitrary. The power-law is the simplest function that can be both super- and subexponential (though in practice it turns out to be superexponential here), and I included the saturating model because, well, why not? ↩︎
Published on February 13, 2026 1:16 PM GMT
Note: This builds on my sketch from September 2025, "You Gotta Be Dumb to Live Forever." Candidly, that work had a lot of errors. I've done my best here to correct those and clarify the exact results, but it is possible this is still all messed up. With thanks to David Brown, and to Tatyana Dobreva for her great questions and feedback. All errors are mine.
Another thing that got forgotten was the fact that against all probability a sperm whale had suddenly been called into existence several miles above the surface of an alien planet…
[The whale experiences life as the ground rapidly approaches.]
I wonder if it will be friends with me?
And the rest, after a sudden wet thud, was silence.
— Douglas Adams, The Hitchhiker's Guide to the Galaxy
Why do we die?
And not just why do we humans die, but why does any complex thing die?
The standard answer from biology is that the Weismann Barrier,[1] which establishes a strict separation between the immortal germline (say DNA) and the mortal soma (for example your body), is a strategy that evolution discovered to faithfully preserve inheritance by requiring a disposable vessel.
In reality, I argue death is a computational necessity that is generalizable across all complex organisms, be they organic, artificial life, AI, or otherwise. These systems must die if they want to solve problems of a certain complexity class because doing so requires computational techniques that physically forbid self-replication.
This occurs because any system that must preserve its own description so it can reproduce ends up structurally confined to a lower-dimensional subspace of strategies. By “strategies,” I mean the computations it can perform, the problems it can solve, and the configurations it can exist as. The complement of this subspace is something I call the Forbidden Zone. In this zone lies a set of peculiar strategies that necessitate the destruction, or irreversible modification, of the system’s own blueprint. We have good examples of these from biology:
All of these strategies, whether they require a cell to discard parts of itself, destroy itself, or commit to an irreversible non-replicating state, exist in the Forbidden Zone. Dramatically, no integrated, self-replicating system can execute them. The body exists because the genome cannot perform these special strategies itself; it must build mortal systems to run computations that self-replication makes mathematically impossible.
This dual immortal/mortal strategy does not apply to all life; a bacterium, for example, does not need a body to survive. There is, however, a precise threshold where the level of complexity demands relinquishing wholly contained self-integration. I identify a Regime Dichotomy based on how the search space scales: a Polynomial Regime, where the cost of persistence is negligible, and an Exponential Regime, where it is not.
There is a sharp phase transition exactly at the exponential regime, and this is meaningful because it is not a sliding scale; it shows exactly why the Weismann barrier appears where it does in nature. When a self-replicating system enters the exponential regime, the only architecture that can retain its full computational capabilities is one composed of a simple immortal replicator that builds complex mortal workers. This is why humans need bodies, but bacteria do not.
Above the polynomial and exponential regimes, there exists a theoretical ceiling governed by the uncomputable Busy Beaver function[4][5]. Reasoning about this theoretical limit, we learn that no computable bound can uniformly contain the cost of persistence. At every level of this hierarchy, there exist description lengths where the costs are severe, and as computational power grows, the severity grows without limit.
By working in computational terms, I can show that these results are not just applicable to biological life but are strictly substrate-independent. They apply directly to self-replicating artificial life, Turing machines, Von Neumann probes, and Artificial Intelligence because all of these entities face the identical physical constraints.
Death is not an error. It is supreme computational technology, and we are only smart because we die.
This essay is somewhat long, but it builds the argument through the following sections:
This section defines some preliminaries: the minimum requirements for self-replication, the preservation constraint, and what it means to be non-trivial (why a computer virus is different from a crystal, which also self-replicates).
Von Neumann solved the problem of how self-replication is logically possible [6]. He did this by resolving the problem of infinite regress (a machine’s description must describe the description itself) by outlining a Universal Constructor that treats its own description as passive data to be copied, rather than as something that must in turn describe itself.
However, self-replication as a concept is too broad to distinguish something like a crystal[9] from an open-ended evolutionary system. Open-ended evolution requires three conditions:
Definition 1.1 (Von Neumann Threshold):
Satisfying
Definition 1.2 (The Preservation Constraint): An integrated self-replicating agent must preserve a valid, recoverable copy of its complete self-description throughout the time it is computing in order to replicate at the end of its generation. It cannot do anything that would irreversibly prevent this reconstruction, regardless of whether the destruction occurs in the
This restriction imposes a strict topological limit on the system’s potential configurations. Notably, somatic units do not face this constraint; they are free to use the entire strategy space, including the Forbidden Zone.
Definition 1.3 (Replication-Compatible State Space): Let
This means an integrated agent is confined to the replication-compatible subspace.
Definition 1.4 (Destructive Strategy): A strategy whose execution irreversibly destroys or modifies the agent’s own self-description.
For the restrictions of destructive strategies to be sensible it is important that we distinguish informational duality. Simple replicators like crystals[9] or prions[12] only propagate a physical state. I distinguish these trivial cases from meaningful ones:
Definition 1.5 (Non-Trivial Persistent Replicators - NTPRs): A system
I define a complexity floor below which a replicator is considered trivial.
Remark: NTPR is a universal distinction. Because conditions (C1) and (C2) rely on Kolmogorov complexity and mutual information, metrics that are invariant up to a constant term by the Invariance Theorem[13], the definition holds regardless of the underlying machinery. A computable bijection between systems (like mapping DNA to binary) only shifts description lengths by a constant, guaranteeing that the depth threshold does not depend on the choice of reference machine.
Some Examples:
| System | C1 | C2 | C3 | C4 | Status |
|---|---|---|---|---|---|
| Bacteria | ✓ | ✓ | ✓ | ✓ | NTPR (Integrated) |
| Von Neumann Probe | ✓ | ✓ | ✓ | ✓ | NTPR (Integrated) |
| Ciliate Protozoa | ✓ | ✓* | ✓ | ✓ | NTPR (Differentiated) |
| Crystal | ✗ | ✓ | ✗ | ✓ | Not NTPR (below the complexity floor) |
| Fire | ✗ | ✗ | ✗ | ✗ | Not NTPR (no encoded description) |
*C2 is satisfied by the ciliate's micronucleus; the macronucleus degrades amitotically and is rebuilt from the germline during conjugation. This is an interesting intracellular instance of the germline-soma separation.
Given that self-replication has a structural constraint, how much problem-solving power is relinquished just by virtue of a system keeping itself alive? I define a universal way to consider this by fixing an optimal prefix-free Universal Turing Machine as the reference machine.
The Generational Model: Each generation of a self-replicating system is a halting computation:
The agent must allocate a portion of its description to the specification of its own replication machinery.
Theorem 2.1 (The Productivity Bound). For a self-replicating system of total description length
Proof. Both the integrated replicator and a differentiated soma of the same total size
Please note that the superscript denotes that the time budget is
To characterize this tax we must constrain the conceptual Turing machine to a physically realistic model. I do this by modeling the agent as a Linear Bounded Automaton (LBA) with a bounded internal tape.
With this constraint, the preservation mechanism becomes a fixed-cost partition. Exactly
This yields the persistence ratio under the uniform environmental clock
The critical difference from a naive formulation is that both the numerator and denominator evaluate the time budget at the exact same argument
From the physical model above, I derive the main result: the severity of the persistence tax depends entirely on whether the environment's time budget exceeds the system's internal configuration space. This creates a sharp phase transition rather than a continuous decay.
Theorem 2.1 (The Memory-Bound Phase Transition). Let
Proof. Follows directly from evaluating the piecewise limits of the uniform clock
Note: the LBA model governs physically realizable results. The unbounded Turing machine model is used solely for the incomputable ceiling to establish the theoretical limit.
One might intuitively assume that giving an agent a computable super-exponential time budget (e.g.,
If
This reveals a deep property: no computable physical environment can yield a uniform persistent penalty worse than the
Even though I have established the limits of the persistence tax for realizable systems, I want to show the tax is an intrinsic property of self-reference. To do so I remove physical constraints and examine the system in the limit of infinite capacity by moving from the LBA to an unbounded Turing Machine. Here, the ratio is measured against the uncomputable Busy Beaver function.
Theorem 2.2 (Unbounded Collapse).
Proof. The Busy Beaver function
This establishes two fundamental truths:
The previous results treated the replication overhead
1. The Cost of Accuracy: We define the noise-dependent overhead as
While the mathematical algorithm for an optimal error-correcting code (e.g., a polar code[15]) might be
2. The
for a fidelity parameter
The previous section established that there is a productivity tax on persistence. This section demonstrates that this tax is not just a reduction in efficiency, but a fundamental restriction on reachability of strategies within the total space of possible behaviors.
The preservation constraint divides strategies into two regions: those compatible with self-replication, and those that are not. Reserving part of the description for the constructor and its blueprint means the agent can express only a strict subset of the behaviors available to an unconstrained system of the same size.
The size of the Forbidden Zone can be quantified by comparing the volume of the full strategy space with that of the replication-compatible subspace.
Theorem 3.1 (The Subspace Exclusion Principle).
The ratio of the full strategy space to the replication-compatible subspace is strictly and unconditionally:
Proof. The class of partial functions computable by prefix-free programs of length
Therefore, the full strategy space is strictly larger than the replication-compatible subspace.
The Forbidden Zone. The set of strategies that lie in the full strategy space but outside the replication-compatible subspace.
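A toy numerical illustration of the counting intuition behind this exclusion (my own sketch with made-up sizes, not the formal bound): reserving c bits of an L-bit description shrinks the number of expressible programs by a factor of roughly 2^c.

```python
def num_programs(total_bits: int) -> int:
    """Count all bitstrings of length <= total_bits. This is a loose stand-in for
    prefix-free programs; the true count is smaller but scales the same way."""
    return sum(2**k for k in range(total_bits + 1))

L, c = 40, 12  # hypothetical description length and replication overhead, in bits
full = num_programs(L)
compatible = num_programs(L - c)
print(f"ratio full / replication-compatible ~ {full / compatible:.0f}")  # ~2^c = 4096
print(f"excluded fraction ~ {1 - compatible / full:.6f}")                # ~1 - 2^-c
```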
In Harvard-like architectures where program memory (genome) is physically separate from working memory (proteome), the constraint operates on control program expressiveness. The Forbidden Zone persists because the control program itself must still be preserved intact for replication.
The relevance of the Forbidden Zone depends on the environment and particularly whether the fitness landscape peaks in the excluded region. I identify three sufficient conditions where it does.
Condition A: Destructive Strategies. If executing a strategy requires irreversibly destroying or modifying the agent’s self-description, no integrated agent can execute it at all.
Even without destruction strategies, some problems are too complex to be solved by the reduced description space of the integrated agent.
Condition B: Descriptively Dense Strategies. A strategy whose minimal description is longer than the space left over once the replication overhead has been reserved.
An example here from biology is the developmental program used in the vertebrate body plan. Morphogenetic computation, which involves coordinating billions of cell fate decisions, likely requires a control program longer than what an integrated replicator could spare.
I should note that even for standard, non-destructive problems (i.e. most biological traits like metabolism, color vision, etc. don’t destroy the genome), the integrated agent loses.
Condition C: Probabilistic Exclusion (The Mild Forbidden Zone). Even if a solution
Shrinking the search space by
When
In the last two sections I established that self-replication imposes a tax on productivity and the Forbidden Zone excludes agents from a vast region of strategy space. I now use these findings to operationalize and compare two fundamental architectures of life: those that are Integrated (every agent carries its constructor, like bacteria) and Differentiated (a germline retains replication and constructs mortal somatic units, like multicellular organisms).
One straightforward consequence of replication overhead is a throughput penalty. In a finite-resource environment, every bit allocated to the constructor is a bit that cannot be spent on search.
Definition 4.1 (Resource-Constrained Search). This is a persistent query system consisting of agents searching a fitness regime
Theorem 4.2 (Linear Rate Advantage). The asymptotic ratio of throughput between optimally differentiated (
Proof. For the Integrated system, each agent costs
If we assume the somatic units perform the full search task where
This result demonstrates that the architectural trade-off is a matter of resource efficiency. In the ideal case, where coordination costs are negligible (
There is a critical nuance I should mention regarding somatic division: although somatic cells (like the skin or liver) divide mitotically to fill the body, this represents an amplification step within a single generation rather than a persistence step across generations. Because somatic lineages do not need to maintain indefinite information integrity, they can tolerate mutation accumulation and telomere erosion; the lineage terminates with the organism's death anyway. Consequently, somatic replication avoids the high fidelity premium of the germline, which is why the fidelity overhead analyzed above falls on the germline rather than the soma.
Now having quantified the linear penalty of carrying the replication machinery, I examine the computational cost of preserving it.
Theorem 4.3 (Discovery Time by Regime). Let
The mathematical framework of discovery time is parametric in
Different subsystems within a single organism inhabit distinct computational regimes. The germline operates primarily in the Polynomial Regime: DNA replication is a mechanical construction task that scales polynomially. In this regime, the computational tax is negligible. The soma operates in the Exponential Regime: complex adaptation, immune search, and neural computation involve combinatorial search over high-dimensional spaces. The Weismann barrier[1] maps exactly onto this computational boundary: it sequesters the germline in the safe polynomial regime while freeing the soma to operate destructively in the risky exponential regime.
The Functional Density Constraint
I have established two distinct advantages for the differentiated architecture: a linear Rate Advantage (efficiency) and an infinite Reach Advantage (feasibility). I now synthesize these findings into a unified conjecture that predicts the transition between unicellular and multicellular life. The core insight is that these advantages are not fixed; instead, they scale differently with problem complexity.
Conjecture 5.1 (Architectural Dominance).
Consider a persistent replicator facing a search problem
In summary, the Weismann barrier is the architectural response to crossing this boundary. It is not just a biological optimization, but rather a computational phase transition required to access the high-complexity regime of the fitness landscape.
There are numerous open questions that this framework does not address, but that would be highly useful to answer with experimental data or additional theoretical work. I am very grateful to Tatyana Dobreva for suggesting a number of interesting questions along these lines, including:
These are just a few of the open questions and research ideas, along with some random thoughts I had about how to answer them. They are interesting and complex topics that deserve more work.
The unfortunate sperm whale from The Hitchhiker's Guide to the Galaxy joins the universe for a brief explosion of complex cognition ending in another sudden, and more unfortunate, explosion. In a way this is the exact same thing we have shown in the paper: according to the mathematics of self-replication it is the precise and necessary shape of any higher intelligence.
I have shown that the price of existence is a computational tax. In formalizing the preservation constraint, which is the absolute necessity that a replicator must perfectly protect its own description while acting, I found that self-replication is not merely a metabolic burden. Instead it is a structural prison. The Forbidden Zone is a mathematical fence defined by the limits of computations rather than a biological accident.
I think this result suggests an inversion of how we view multicellularity. If this paper is correct, then the Weismann barrier is not an evolutionary adaptation that evolved to prevent mutational load; rather, it is a necessary computational escape valve. The reason that life split into an immortal germline and a mortal soma is that this was the only physical way to solve the universe's hardest problems. To solve these problems it is necessary to build an architecture that is not burdened by the requirement of surviving them.
It is important to note that this logic is substrate-independent. It strictly bounds any complex, evolving system, whether that is a biological, digital, or synthetic entity. It also predicts that any entity facing the exponential regime of problem-solving must eventually separate into a protected, persisting germline (or germline-adjacent construct) and disposable, soma-like structures.
An interesting implication of this is that AI should hit this same preservation tax. (Note: I am not implying this necessarily has any relevance to safety arguments.) For an AGI to maximize its own intelligence without risking the corruption of its primary weights, or of its fundamental alignment (whether the alignment encoded into it or the one the AI has chosen for itself), the AGI must adopt this type of differentiated architecture. It will be forced to move its core algorithms into a frozen, immutable germline, while creating “mortal”, and highly complex, sub-agents to explore the deepest mysteries of the Forbidden Zone. An amusing conclusion is that if AGI doesn’t kill us, we might identify AGI when it starts killing parts of itself!
In one sense immortality is computationally trivial. Bacteria have pulled it off for billions of years. But anything complex that wants to do interesting and hard things in this universe must be able to address state spaces of such exceptional combinatorial complexity that the self must be sacrificed to explore them.
From this perspective, death is not an error in the system. In fact, it is the computational technology that lets intelligence exist. It’s a tough pill to swallow, but we are smart only because we have agreed to die.
Weismann, A. (1893). The Germ-Plasm. Scribner's.
Tonegawa, S. (1983). Somatic Generation of Antibody Diversity. Nature, 302, 575–581.
Schatz, D. G. & Swanson, P. C. (2011). V(D)J Recombination: Mechanisms of Initiation. Annu. Rev. Genet., 45, 167–202.
Chaitin, G. J. (1975). A Theory of Program Size Formally Identical to Information Theory. JACM, 22(3), 329–340.
Rado, T. (1962). On Non-Computable Functions. Bell System Technical Journal, 41(3), 877–884.
Von Neumann, J. (1966). Theory of Self-Reproducing Automata. (A. W. Burks, Ed.). Univ. Illinois Press.
Kleene, S. C. (1952). Introduction to Metamathematics. North-Holland. (Thm. XXVI, §66).
Rogers, H. (1967). Theory of Recursive Functions and Effective Computability. McGraw-Hill.
Penrose, L. S. (1959). Self-Reproducing Machines. Scientific American, 200(6), 105–114.
Langton, C. G. (1984). Self-Reproduction in Cellular Automata. Physica D, 10(1–2), 135–144.
Kabamba, P. T., Owens, P. D. & Ulsoy, A. G. (2011). Von Neumann Threshold of Self-Reproducing Systems. Robotica, 29(1), 123–135.
Prusiner, S. B. (1998). Prions. PNAS, 95(23), 13363–13383.
Li, M. & Vitányi, P. (2008). An Introduction to Kolmogorov Complexity and Its Applications (3rd ed.). Springer.
Bennett, C. H. (1988). Logical Depth and Physical Complexity. In The Universal Turing Machine (pp. 227–257). Oxford.
Arıkan, E. (2009). Channel Polarization. IEEE Trans. Inf. Theory, 55(7), 3051–3073.
Eigen, M. (1971). Selforganization of Matter. Naturwissenschaften, 58(10), 465–523.
Eigen, M. & Schuster, P. (1977). The Hypercycle. Naturwissenschaften, 64(11), 541–565.
Bertschinger, N., Olbrich, E., Ay, N. & Jost, J. (2006). Information and Closure in Systems Theory. In Explorations in the Complexity of Possible Life (pp. 9–19). IOS Press.
Krakauer, D. et al. (2020). The Information Theory of Individuality. Theory in Biosciences, 139, 209–223.
Grünwald, P. & Vitányi, P. (2004). Shannon Information and Kolmogorov Complexity. arXiv:cs/0410002; see also Grünwald, P. & Vitányi, P. (2008). Algorithmic Information Theory. In Handbook of the Philosophy of Information (pp. 281–320). Elsevier.
Ofria, C. & Wilke, C. O. (2004). Avida: A Software Platform for Research in Computational Evolutionary Biology. Artif. Life, 10(2), 191–229.
Goldsby, H. J., Dornhaus, A., Kerr, B. & Ofria, C. (2012). Task-switching costs promote the evolution of division of labor and shifts in individuality. PNAS, 109(34), 13686–13691.
Buss, L. W. (1987). The Evolution of Individuality. Princeton.
Michod, R. E. (2007). Evolution of Individuality During the Transition from Unicellular to Multicellular Life. PNAS, 104(suppl. 1), 8613–8618.
Dawkins, R. (1982). The Extended Phenotype. Oxford.
Published on February 13, 2026 1:02 PM GMT
Root cause analysis is a crap technique for learning from failure. To see why, we need to know some fundamentals first. These are good to know for anyone designing anything they want to be reliable.
In safety-critical systems, we distinguish between accidents (actual loss of, e.g., lives or equipment) and hazardous states (sometimes called just “hazards”). If we say that an accident happens only when the system is in a hazardous state and the environmental conditions are unfavourable, this means that an accident requires both unfavourable environmental conditions and a hazardous system state. As a consequence, a system that never enters a hazardous state cannot have an accident, no matter what the environment does.
Since we can only control the system and not its environment, we achieve safety by avoiding hazardous states.[1]
There was recently a commercial flight that made the news because it landed with less than 30 minutes of fuel in its tanks. Many people wondered why this was a big deal, because it sounds like the system was working as intended: there was a reserve, it was needed, and it was used. End of story?
The thing to realise is that landing with less than 30 minutes of fuel is a hazardous state for commercial jets. If a jet lands with less than 30 minutes of fuel, then it would only have taken bad environmental conditions to make it crash, rather than land. Thus we design commercial aviation so that jets always have 30 minutes of fuel remaining when landing. If they don’t, that’s a big deal. They’ve entered a hazardous state, and we never want to see that.
One of my children loves playing around cliffs and rocks. Initially he was very keen on promising me that he wouldn’t fall down. I explained the difference between accidents and hazardous states to him, and he realised slowly that he cannot control whether or not he has an accident, so it’s a bad idea to promise me that he won’t have an accident.
What he can control is whether or not bad environmental conditions lead to an accident, and he does that by keeping out of hazardous states. In this case, the hazardous state would be standing within a child-height of a ledge when there is nobody below ready to catch. He can promise me to avoid that, and that satisfies me a lot more than a promise not to fall.
Hazardous conditions, as we have seen, are defined by constraints. To stay out of hazardous conditions, we have the system maintain such safety constraints. In general, though, the environment often tries to tip the system into breaking these constraints, and it often does this in unpredictable ways. This means we cannot declare in advance a sequence of steps the system should follow that will always maintain constraints.
Instead, maintaining constraints is a dynamic control problem. There are multiple controllers interacting with the system to try to keep it out of hazardous conditions. They observe feedback, i.e. information on where the system is now; they execute mental models, i.e. run simulations of where the system is going in the future; and then they issue control actions, i.e. try to adjust the system to maintain constraints based on their predictions.
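As a toy illustration of that loop (a minimal sketch using the fuel example above; the numbers and the divert rule are assumptions, not from any real procedure), a controller repeatedly observes feedback, projects the state forward with its mental model, and issues a control action to keep the system out of the hazardous region:

```python
import random

FUEL_RESERVE_MIN = 30.0  # safety constraint: never land with less than 30 min of fuel

def predicted_fuel_at_landing(fuel_min, minutes_to_land, burn_rate):
    """Mental model: project the current state forward to landing."""
    return fuel_min - minutes_to_land * burn_rate

def control_loop(fuel_min, minutes_to_land):
    while minutes_to_land > 0:
        burn_rate = 1.0 + random.uniform(-0.1, 0.3)      # feedback: observed fuel burn
        projected = predicted_fuel_at_landing(fuel_min, minutes_to_land, burn_rate)
        if projected < FUEL_RESERVE_MIN:                 # control action: divert before
            print("diverting to alternate")              # the constraint can be broken
            return
        fuel_min -= burn_rate                            # one minute of flight elapses
        minutes_to_land -= 1
    print(f"landed with {fuel_min:.0f} min of fuel remaining")

control_loop(fuel_min=90.0, minutes_to_land=45)
```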
Whenever a system enters a hazardous condition, it is because there were problems with the control structure, specifically with one of the three components listed above: missing or misleading feedback, an inaccurate mental model, or inadequate control actions.
We can also see combinations of these problems. When all three of them are problematic, we might actually be looking at an entire controller missing that should be present.
Controllers exist on all levels. For aircraft maintaining fuel constraints, controllers include the FADECs inside the jet engines, the flight management computer, pilots, ground crew, dispatchers at the airline, training programmes for pilots, air traffic controllers, as well as national and international regulatory boards.[3]
Low-level controllers are often automated, in hardware or software. High-level controllers are often social, cultural, and legal in nature.
Accidents in safety-critical systems can look like one-off freak occurrences that would be impossible to predict.[4] This is because in order for an accident to occur, not only do we need bad environmental conditions, but multiple controllers must also have been unable to maintain safety constraints. The combination seems unlikely. However, by thinking in terms of hazardous states instead of accidents, we get the benefit that hazardous states are easier to predict.
Think of any common technology, like the car. We can probably rattle off several constraints we’d like it to maintain, some fairly mundane. Our car must not start an uncommanded turn, for example. One of the controllers maintaining this constraint is positive stability in the turning axis: if we let go of the steering wheel on flat ground it will return back to the centre position over time. This ensures small bumps only put us slightly off course, at which point another controller kicks in: the driver makes a small adjustment to change the course back to what it was.[5]
We don’t have to actually witness a car crash caused by an uncommanded turn to realise it would be a bad thing if a car started an uncommanded turn. Now we can continue to work on our controllers – why does the turning axis have positive stability? Can that fail? Sure it can, if tyre pressures are unequal. That’s another constraint we can design control structures around, and so on.
A further benefit of thinking about hazardous states rather than accidents is that we don’t have to wait for an accident to occur before we improve the safety of our system. Being unable to maintain constraints is already a safety problem and should be analysed whether or not environmental conditions were on our side that day, i.e. whether it turned into an accident or not.
This might seem obvious. If we had designed a car that started a sudden uncommanded turn, we wouldn’t wait for it to injure someone before we addressed the problem. But I often see people – especially in the software industry – paper over near misses as long as nobody got hurt. The aviation industry is not like that. You bet safety boards will issue reports on the flight landing with less than 30 minutes of fuel.
The ideas covered in this article mainly come from a systems theory perspective of safety. One of the central figures in promoting that perspective is Nancy Leveson. I’m a huge fan of her work, among others, the books Engineering a Safer World, the CAST Handbook, and the STPA Handbook. The issue with these is that they’re (a) not well known, and (b) quite dense and filled with decades of Leveson’s experience.
The linked article then goes on to list some more things related to this I eventually want to cover with my writing, but this is probably a good place to stop for an LW linkpost.
If we try to prevent accidents while not paying attention to hazardous states, we are effectively placing our trust in the environment being on our side. Many people do this, and it can be successful for quite some time, but it always fails at some point.
This could be because the effect is too weak – or too strong!
For my child among rocks, controllers include their balance, their strength, their extremely limited sense of self-preservation, my instruction, my supervision, the places I decide to take us, etc.
What are the chances that a flight encounters delay enroute, then has to make multiple landing attempts at the intended destination including delays there, diverts, is unable to land at the alternate, and has quite far to go to a tertiary airport?
In some cars, another automated layer takes over before the driver: software lane keeping assistance can perform that correction.
Published on February 13, 2026 10:51 AM GMT
Or: Todd Has a Presentation in London on Thursday and Three Academics (Some of Them Dead) Won't Stop Arguing About Root Fungi
(The story follows the one in Seeing Like A State but applies a systemic perspective on AI Safety)
Epistemic Status: Written with my Simulator Worlds framing. E.g. I ran this simulated scenario with Claude in order to generate good cognitive basins, then orchestrated it to play out a simulated scene with my instructions (with some changes for better comedic effect). This post is Internally Verified (i.e. I think most of the claims are correct with 70-85% certainty).
The headset smells like someone else's face.
"Just put it on, Todd."
"Sandra, it truly—"
"I know. Put it on. You're presenting to the Science and Technology Select Committee (UK) on Thursday about systemic risks from frontier AI and you currently think systemic risk means 'a risk that is big.'"
"That is absolutely not—"
"You said that. In the pre-brief. I wrote it down. I'm going to have it framed."
Sandra has worked at the Department for Science, Innovation and Technology for twenty-three years. She once corrected a visiting researcher from the Santa Fe Institute on his own citation and he sent her flowers. She has opinions about management cybernetics that she shares with nobody because nobody asks. She is paid less than the office coffee budget.
Todd was a postman in Swindon until eighteen months ago. His mate Dave got him the job.
"I've got forty-seven documents to fill in for the committee. Forty-seven. They've got boxes. I understand boxes. I'm good at boxes."
"The boxes are wrong."
"The boxes are government-mandated"
"Still wrong. Headset. Now."
He's in a forest.
It takes a moment. The conference room doesn't so much disappear as get gently shouldered aside by something much older. And then Todd is standing on soft ground, in cold air, surrounded by trees.
Except — and it takes him another moment to understand why it feels wrong — the trees are in rows. Perfect rows. Identical trees, identical spacing, stretching in every direction until the geometry gets bored and fades into mist. Norway spruce. He knows this because a small label is floating beside the nearest trunk like a museum placard: Picea abies. Planted 1820. Yield-optimised monoculture.
The ground is bare. Not the interesting kind of bare, with moss and leaf litter and the promise of hidden things — just dark, flat, dead soil. No undergrowth. No ferns. No birds. Nothing moving. The air tastes of resin and something chemical he can't place.
A yield-optimised spruce monoculture in Germany. Every tree individually excellent. The forest is dying.
"Hello?" says Todd.
Nothing.
He walks between the rows. His footsteps sound wrong — too clean, too isolated, as if the forest has nothing to absorb them. He touches a trunk. The bark feels thin. Papery. Like something that's been alive for a long time but has recently started to forget how.
"This is horrible," he says. "Why is this horrible? It's a forest. Forests are nice."
Sandra's voice in his earpiece: "It's not a forest. That's the point. Keep walking."
He walks. The rows repeat. The silence repeats. It's like being inside a spreadsheet that grew bark.
"Sandra, why am I here? I have documents. I have work to do, how the hell is this related to a bloody forest in the middle of nowhere?”
Todd starts muttering the mantra he has developed over the last few weeks.
“AI capability leads to risk factor, risk factor leads to potential harm, you evaluate the capability, assess the risk, mitigate the harm. A, B, C. It's clean. It makes sense. It fits in the boxes."
“Todd, you’re doing it again!”
“Sorrrryyyy…”
"Now, the obvious follow up question is whether your framework describes a forest?"
“Why would I need to answer that?”
“Todd, does it describe a forest?”
"It doesn't need to describe a forest, it needs to describe—"
"Does your A-B-C framework describe how this forest dies?"
Todd stops walking. He looks at the trees. At the bare soil. At the thin bark that's starting, now that he's paying attention, to peel at the edges. At the silence where birdsong should be.
"How does a forest die?"
"That's the right question. And that's why you're here."
Three people are standing in a clearing he could swear wasn't there thirty seconds ago.
Two of them are already arguing. The third is watching with the patient expression of a man who has seen this argument happen before and knows exactly when to intervene.
The one in tweed sees Todd first. "Ah! You're the governance chap. James Scott. Political science. Yale. Dead, technically, but they made me from my books. Try not to think about it."
"I will absolutely think about it."
"This is Michael—"
"Michael Levin, developmental biology, Tufts, not dead, I run the company that built this VR thing, Levin Enterprises, sorry about the headset smell—"
"And I'm Terrence Deacon, anthropology, Berkeley, unclear if dead, the simulation team had conflicting information and frankly I find the ambiguity productive—"
"Right," says Todd. "Great. I'm Todd. I work in AI governance. I was a postman. I have a presentation to the Science and Technology Select Committee on Thursday. I need to know what a systemic risk actually is, and I need to know it in words that don't require a PhD to understand, and I need to know it by Wednesday at the latest because I have to practice the slides on the train."
Scott gestures at the trees. "This is a systemic risk."
Todd looks around. "This? A forest?"
"This specific forest. What you're standing in is the result of a decision made by the Prussian government in 1765. They looked at Germany's forests — old growth, hundreds of species, tangled, messy, full of things doing things they couldn't name or measure — and they saw waste. They wanted timber. So they cleared the old forests and planted these. Single species. Optimal spacing. Every tree selected for maximum yield."
Todd waits. "And?"
"And it worked. For one generation, these were the most productive forests in Europe. The Prussians had cracked it. Scientific forestry. Rational management. Every tree individually perfect."
"So what went wrong?"
This is where it happens. Levin can't contain himself any longer. He's been rocking on his heels and he breaks in like a man whose entire career has been building toward this specific interruption.
"What went wrong is that they thought the forest was the trees. But the forest isn't the trees. The forest is the network. The mycorrhizal—"
"The what?"
Sandra, in Todd's ear: "Fungal internet. Roots connected underground by fungi. Trees share nutrients and chemical warning signals through it. Like a nervous system made of mushrooms."
"—the mycorrhizal networks connecting every root system to every other. The pest predators living in the undergrowth. The soil bacteria maintaining nutrient cycles. The entire living architecture that the Prussians classified as 'mess' and removed. Because their framework — their evaluation framework, Todd — measured individual trees. Height, girth, growth rate, timber yield. And every individual tree was excellent."
"But the system—"
"The system was dying. Because the things that made it a system — the connections, the information flows, the mutual support — weren't in any individual tree. They were in the between. And the between is exactly what the evaluation framework couldn't see."
As Levin speaks, the VR does something Todd isn't expecting. The plantation dissolves backward — rewinding — and for a moment he sees what was there before. The old-growth forest, not a grid but a tangle. Trees at odd angles, different species, different ages, connected below the surface by a dense web of orange lines — the mycorrhizal network rendered visible, a living architecture of staggering complexity where every tree is linked to every other through branching fungal pathways.
Then the VR plays it forward. The old growth is cleared. The network is severed. The grid is planted. And the orange connections simply stop.
Left: the old-growth forest. The orange web is the mycorrhizal network — the connections that made it a living system. Right: the yield-optimised plantation. Same trees. No network.
Todd stares at the two images hanging in the air. The left one dense with orange connections. The right one bare.
"The dashboard says everything's fine," he says, looking at the grid.
"The dashboard measures trees," says Sandra.
Deacon, who has been standing very still — which Todd is learning means he's about to make everything more complicated — steps forward.
"The reason this matters — and this is crucial, Jim, because you always tell this story as 'they removed biodiversity' and that's true but it's not deep enough—"
"Oh here we go," mutters Levin.
"—is that the forest's living architecture wasn't just useful. It was organisational. The mycorrhizal network was the forest's information processing system. Warning signals about pest attacks propagating through the root network. Resources redistributed from healthy trees to stressed ones. The forest was performing a kind of distributed computation, and it was organised around constraints that existed in the relationships between species, not in any individual species."
"What kind of constraints?" says Todd, because he is paid to ask questions even when he suspects the answers will make his headache worse.
"The kind that don't physically exist anywhere but shape the dynamics of everything. The forest had a collective goal — maintaining its own viability — that wasn't located in any tree, wasn't programmed into any root, wasn't specified by any forester. It emerged from the network. It was, if you'll permit me the term—"
"Don't say it," says Levin.
"—teleological."
"He said it."
"TELEOLOGICAL behaviour! Goal-directed! The forest-as-a-whole was navigating toward stable states that no individual tree was aiming for, and the navigation was happening through the very networks that the Prussians couldn't see and therefore destroyed. This is not a metaphor for what's about to happen with AI governance. It is a structural description of the same failure mode."
Sandra: "Todd. Translation: the forest wasn't just a collection of trees. It was a living system with its own collective behaviour that emerged from the connections between trees. The Prussians' framework measured trees. The system failed at the level of connections. Their dashboard said everything was fine right up until the forest died. That's a systemic risk. Not A causes B causes C. The topology fails."
"And my risk assessment framework—"
"Measures trees."
The forest dissolves. Todd's stomach makes a formal complaint. When the world reassembles, he's floating above a city that looks like someone solved an equation and poured concrete on the answer.
Brasília. He recognises it from — actually, he doesn't know where he recognises it from. Maybe Sandra sent him something. She does that.
The monumental axis stretches to the horizon. Everything is separated into zones. Residential. Commercial. Government. Traffic flow calculated. Sight lines optimised. From above, it's either an airplane or a cross, depending on how much architecture school you've survived.
It's beautiful. It's also, somehow, the same kind of horrible as the forest. The same too-clean silence. The same absence of mess.
"Where is everyone?" says Todd.
"In the bits nobody designed," says Scott.
The VR pulls Todd down toward street level, and the city splits in two. On the left, the planned core holds still — wide boulevards cutting a perfect grid, identical blocks separated by calculated distances, streets so straight they look ruled onto the earth. On the right, a different city altogether. Streets that curve because someone needed to get to the bakery. Roads that fork and rejoin for no reason except that two neighbours built walls at slightly different angles. Buildings pressed against each other like passengers on the Tube. Markets spilling out of doorways. Laundry on balconies.
The grid is silent. The sprawl is alive.
Left: the city someone designed. Right: the city people built. Two and a half million people live in Brasília's satellite cities — the parts nobody planned. The parts that work.
"Oscar Niemeyer and Lúcio Costa," says Scott. "Designed a whole capital city from scratch in 1956 where they separated every function and optimised every flow. It was supposed to be the most rational city ever conceived with two hundred thousand people in the planned core."
"And the other bit?"
"Two and a half million. In the settlements nobody drew. With the corner shops and the street life and the walkable neighbourhoods and the community structures — all the things that make a city a city, and that the design optimised away because they weren't in the model."
"Because they're the between again," says Levin. "The city that works is the one that grew in the connections between the designed elements. It's developmental, Jim, I keep saying this — Costa thought he could specify the mature form of a city from initial conditions, but a city is a developmental system, it discovers its own organisation through—"
"Michael, not everything is embryology—"
"This IS embryology! A developing embryo doesn't work from a blueprint! The cells navigate toward the target form through local interactions! The collective discovers its own organisation! You can't specify a city from above any more than you can specify an organism from a genome—"
"The genome analogy breaks down because a city has politics, Michael, there are power dynamics—"
"Power dynamics ARE developmental! Morphogenetic fields are—"
"STOP," says Deacon, and even the simulation of James Scott shuts up. "You're both right and you're both being annoying about it. The structural point is this: the designed substrate — the plan, the mechanism, the genome — specifies constraints. What grows within those constraints has its own logic. Its own organisational dynamics. Its own emergent goals. You can design Brasília. You cannot design what Brasília becomes. That gap — between what you design and what grows — is where Todd's systemic risks live."
Todd has been looking at the two panels. The grid and the sprawl. One designed. One discovered.
"So the risk framework," he says, slowly, not because he's understanding but because he's starting to see the shape of what he doesn't understand, "measures the plan. It measures the mechanism. A causes B causes C. But the risk isn't in the mechanism. It's in what grows on the mechanism."
"Now show him the Soviet Union," says Sandra. "Before he loses it."
"I've already lost it."
"You're doing fine. Soviet Union. Go."
The geometry misbehaves. Todd arrives in a planning office that was either designed by M.C. Escher or generated by an AI that was asked to visualise 'bureaucratic hubris.' Staircases go in directions that staircases should not go. Input-output matrices cover blackboards that curve back into themselves. A portrait of Leonid Kantorovich — Nobel laureate, inventor of linear programming — hangs at an angle that suggests even the wall is uncertain about its commitments.
The three academics are already there, already arguing, already standing on different impossible staircases.
"—the Gosplan case is the purest example because they literally tried to specify every input-output relationship in an entire economy—"
"Sixty thousand product categories," says Scott. "Centrally planned. Targets set. Resources allocated. The entire Soviet economy as an optimisation problem."
"And it produced numbers," says Deacon, who is standing on a staircase that appears to be going both up and down simultaneously. "Beautiful numbers. Targets met. Production quotas filled. The official economy was a masterwork of engineering."
"And the actual economy?" says Todd.
"The actual economy," says Scott, and he's suddenly serious, the tweed-and-wine performance dropping for a moment, "ran on blat. Favours. Informal networks. Factory managers lying about their production capacity to create slack in the system. Shadow supply chains. Personal relationships doing the work that the plan couldn't do because the plan couldn't process enough information to actually coordinate an economy."
Levin groans. "Oh no. Are we doing Hayek? Jim, please tell me we're not about to do Hayek."
"We are briefly doing Hayek."
"Every libertarian with a podcast has done Hayek. The comment section is going to—"
"The comment section can cope. Todd, bear with me. This is the single most over-rehearsed argument in the history of economics, and I'm going to do it in ninety seconds, and the reason I'm doing it is that both sides got the punchline wrong."
"I don't know who Hayek is," says Todd, and Levin mouths lucky you behind Scott's back.
"Friedrich Hayek. Austrian economist. 1945. His insight — and I'm saying this with full awareness that it's been turned into a bumper sticker by people who've never read him — is that knowledge in an economy is distributed. The factory manager in Omsk knows things about Omsk that no planner in Moscow can know. The baker knows what her street needs. The engineer knows which machine is about to break. This knowledge isn't just difficult to centralise. It's impossible to centralise. There's too much of it, it's too local, it changes too fast, and half of it is tacit — people know things they can't articulate."
"So a central plan—"
"A central plan takes all those local nodes — thousands, millions of them, each processing local information, each connected to the nodes around them — and replaces the whole network with a single point. One red dot in Moscow that every spoke has to feed into and every instruction has to flow out from."
As Scott speaks, the VR renders the diagram on the blackboard. On the left, a distributed network — blue nodes connected by dense orange edges, information flowing locally between neighbours, no centre, no hierarchy, the whole thing humming with lateral connections. On the right, the same nodes rearranged into a spoke pattern, every connection severed except the line running to a single swollen red node at the centre. The orange peer-to-peer links reduced to ghost traces. Everything funnelled through one point.
Left: how knowledge actually lives in an economy — distributed, local, lateral. Right: what central planning requires — everything routed through one node. The red dot is not evil. It is simply overloaded. This has been pointed out before. You may have heard.
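For readers who want the topology point in runnable form, here is a minimal sketch in Python (using networkx; the two graphs are generic stand-ins invented for illustration, not a model of any real economy). It measures how much coordination load falls on the busiest node in each topology, and what happens when that node fails.

```python
# A minimal sketch of the topology point: in a hub-and-spoke network almost
# every path runs through one node; in a distributed network no node does.
# Toy graphs only; stand-ins, not a model of any actual economy.
import networkx as nx

n = 100
distributed = nx.watts_strogatz_graph(n, k=6, p=0.1, seed=0)  # ring lattice with shortcuts
hub_spoke = nx.star_graph(n - 1)                              # node 0 is the hub

for name, g in [("distributed", distributed), ("hub-and-spoke", hub_spoke)]:
    load = nx.betweenness_centrality(g)   # roughly, the share of shortest paths routed through each node
    busiest, share = max(load.items(), key=lambda kv: kv[1])
    g2 = g.copy()
    g2.remove_node(busiest)               # what does a failure of the busiest node do?
    pieces = nx.number_connected_components(g2)
    print(f"{name:15s} busiest node carries {share:.2f} of paths; "
          f"removing it leaves {pieces} connected component(s)")

# Typical output: the distributed network's busiest node carries a few percent
# of paths and its removal leaves one connected piece; the star's hub carries
# essentially all of them and its removal shatters the network into n-1 isolated nodes.
```

The point of the numbers: in the distributed network no single node is load-bearing, while in the hub-and-spoke network the hub carries essentially everything, and losing it disconnects everyone from everyone.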
"And what happens," says Todd, "when there's too much information for one node?"
"It does what any cell does under metabolic stress," says Levin immediately. "It simplifies its—"
"Michael, it's an economy, not a cell—"
"It IS a cell! Or it's like a cell! The central planner is a cell trying to process the signalling environment of an entire tissue and it doesn't have the receptor bandwidth, so it defaults to—"
"What he's trying to say," says Scott, physically stepping between Levin and the blackboard, "is that the node makes things up. Not maliciously. It simplifies. It has to. It's one node trying to do the work of millions. So it uses proxies. Quotas. Targets. Tonnes of steel."
"Morphogenetic defaults," mutters Levin.
"If you say morphogenetic one more time I'm—"
"And the actual economy?" says Todd. "The one that needs, like, bread?"
"The one that needs bread in Omsk and ball bearings in Vladivostok routes around the bottleneck. Informally. Through blat. Through personal connections. Through the factory manager who calls his cousin instead of filing a requisition form. Through the orange connections that the plan says don't exist."
"So the shadow economy is—"
"—it's the lateral connections reasserting themselves," says Levin, who has apparently decided that if he can't say morphogenetic he'll find another way in. "This is what happens in regeneration too, when you sever a planarian and the remaining tissue has to re-establish communication pathways—"
"We are not," says Scott, "comparing the Soviet economy to a flatworm."
"I'm comparing the information architecture of—"
"He's actually not wrong," says Deacon, which makes both Scott and Levin turn toward him with matching expressions of suspicion. "The structural point holds. When you cut the lateral connections in any distributed system — biological, economic, social — the system either re-grows them informally or it dies. The Soviets got blat. A flatworm gets a new head. The mechanism is different. The topology is the same."
"Thank you, Terrence, that was very—"
"I'm not on your side, Michael. I'm saying you stumbled into the right structure using the wrong analogy. As usual."
Todd has been staring at the diagram on the blackboard. The dense orange network on the left. The hub-and-spoke on the right. Something is nagging at him.
"Hang on," he says. "The Hayek thing. The market thing. His answer was: replace the planner with price signals. Let the market do the coordination. But that's still just—" He points at the right side of the diagram. "That's still a hub, isn't it? The price signal is the hub. Everything gets routed through buy and sell instead of through plan and allocate, but it's still—"
Scott smiles. The first genuine one Todd has seen. "Keep going."
"It's still a single coordination mechanism. You've just changed the colour of the red dot."
"That," says Scott, "is the part that Hayek got right and his fans get catastrophically wrong. He diagnosed the problem — centralised knowledge processing fails — and then prescribed a different centralised knowledge processor. A more efficient one, sure. Better at some things, worse at others. But still one mechanism trying to do the work of a network."
"So the question isn't planning versus markets—"
"The question is: what happens to the distributed knowledge when you reorganise the network? And nobody in 1945 was asking that question because they were all too busy arguing about ideology instead of topology."
"I want it noted," says Levin, "that I have been saying this about cell signalling for—"
"NOTED, Michael."
Sandra, in Todd's ear: "He's saying the shape of the information network matters more than the ideology running it. File that. It comes back."
"And when someone tried to fix the official system by removing the unofficial one—"
"Gorbachev," says Scott. "Anti-corruption campaigns. Stricter enforcement. More rigorous adherence to the plan. He looked at the blat networks and saw corruption. Waste. Disorder. Mess."
"The same mess the Prussians saw in the old-growth forest," says Deacon.
"The same mess that Costa and Niemeyer zoned out of Brasília," says Levin.
"He cut the planarian in half," says Todd, and immediately looks surprised at himself.
Levin points at him with both hands. "YES. THANK you. He cut the—"
"I cannot believe we're doing the flatworm," says Scott.
"He severed the lateral connections! And unlike a planarian, the Soviet economy couldn't regenerate them fast enough! Because Gorbachev was also tightening enforcement, which is like — Jim, work with me here — it's like cutting the planarian and also suppressing the wound-healing signals—"
"The economy isn't a flatworm, Michael!"
"The TOPOLOGY is the SAME!"
"He's right," says Deacon, and Scott throws his hands up.
"Fine. Fine! He removed the informal networks. And everything collapsed. Because the mess was the distributed system doing the work the central node couldn't. Remove it, and all you're left with is an overloaded red dot trying to coordinate an entire economy through a straw. Is everyone happy now? Can we stop talking about flatworms?"
"Planaria," says Levin.
"I will end you."
Silence. Even the impossible staircases seem to hold still for a moment.
"He killed the mycorrhizal network," says Todd.
Everyone looks at him.
"I mean — the principle. He removed the distributed system because the centralised framework told him it was waste. Same as the Prussians. Same as the city planners. The Prussians killed the network to make rows. The planners killed the sprawl to make a grid. And the Soviets killed the lateral connections to make a hierarchy. Three different shapes, same operation: take a distributed system, force it through a single point, lose everything the single point can't see."
Sandra, in his ear, very quietly: "Yes. That's it."
Todd looks at the three academics. The Escher staircases have settled into something almost normal, as if the geometry is calming down along with the argument. Levin is still quietly triumphant about the planarian. Scott is pretending to be annoyed. Deacon is watching Todd with an expression that suggests he's been waiting for this question.
"Okay," says Todd. "So the networks matter. The distributed bit is load-bearing. Every time we centralise it or formalise it or remove it, things collapse. I get that. But—" He stops. Thinks. "But you can't just leave it alone, can you? The old-growth forest was fine because nobody was trying to coordinate it into producing timber. But we actually need economies to produce things. We actually need cities to function. You can't just say 'don't touch the network' and walk away."
"No," says Scott, and he looks at Todd differently now. "You can't."
"So has anyone actually figured out how to do this? How to work with the distributed thing without killing it?"
The three academics exchange a look. It's the first time they've agreed on something without arguing about it first.
And then Sandra does something she hasn't done all session. She breaks in. Not in Todd's ear — in the room, her voice coming through the VR's spatial audio as if she's suddenly standing among them, and there's something in her voice that Todd has never heard. Not quite anger. Something older than anger.
"There was someone," she says. "Someone who understood formally, mathematically, practically that you cannot govern a distributed system by centralising it, and that the answer is not to leave it alone either. There's a third option. And I have been waiting nine years for someone in this department to ask about it."
"Stafford Beer," says Deacon.
"Stafford Beer."
Todd: "Who—"
"Management cybernetics," says Sandra, and she's speaking faster now, like a dam breaking. "The Viable System Model. The insight is that any viable system has the same recursive structure — autonomous units at every level, each level self-regulating, feedback loops everywhere. You don't control it from above. But you don't abandon it either. You create the conditions for it to regulate itself. Because no external controller can model the system's own complexity — the system is always more complex than any model of it. That's Ashby's Law, 1956, the law of requisite variety, and it is the single most important idea in governance that nobody in governance has ever heard of."
A 3D rendering, based on descriptions, of Project Cybersyn's operations room. Santiago, 1971. Designed by Stafford Beer for Salvador Allende's government. A room built to govern a living system as a living system. It was burned in a coup two years later.
The screens are alive. And on them, Todd sees the distributed network — not collapsed into a hub-and-spoke, not funnelled through one red dot. The orange connections between nodes are intact, visible, flowing. Factory output data streaming in from the regions, but not to a central planner — to each other. Local patterns feeding into regional patterns feeding into national dynamics, with the information staying distributed, the lateral connections preserved. Beer's control room wasn't a command centre. It was a window onto the network.
"Beer built this," says Sandra. "For Chile. Under Allende. Project Cybersyn. A national economic coordination system based on cybernetic principles. Real-time factory data flowing up. Policy signals flowing down. Workers maintaining autonomy at the local level. The system was designed to preserve the distributed knowledge — the informal dynamics, the local information, the lateral connections — and make them visible without centralising them. He solved the problem that Hayek said was unsolvable and the Soviets proved was unsolvable. And he did it by changing the network topology."
"What happened?" says Todd.
"September 11th, 1973. Pinochet, CIA-backed coup. They burned the operations room."
The control room begins to darken. The screens flicker. The orange distributed network stutters and collapses — node by node, connection by connection — until it rearranges itself into a hub-and-spoke. A different red dot this time. Not Moscow. Chicago.
"Chile got Milton Friedman's Chicago Boys instead — free market optimisation, deregulation, treat the economy as a problem solvable by one mechanism, the price signal, routed through one kind of node, the market. It's a different ideology but the same network topology, everything funnelled through a single coordination point."
"That's—"
"A different colour of hub-and-spoke. Again. We had someone who understood how to govern distributed systems as distributed systems. We burned his control room and replaced it with a different bottleneck."
The control room goes dark.
"Government-mandated bottleneck," says Sandra, and twenty-three years of professional composure cracks, just slightly, just for a moment, before she puts it back together.
Todd takes the headset off. Conference room. Fluorescent lights. The HVAC hum.
Sandra appears in the doorway with fresh tea and a stack of highlighted papers.
"I've rewritten your slides," she says.
"Of course you have."
"Slide seven is blank."
"Why is seven blank?"
"Because it's the honest answer. We don't have the science yet. That's what you're asking them to fund."
Todd takes the tea. Looks at the slides. Looks at Sandra.
"Why aren't you doing the committee presentation?"
Sandra smiles the smile of a woman who has been asked this, in various forms, for twenty-three years.
"Because they don't listen to secretaries, Todd. They listen to men in suits. The system can't see where its own knowledge lives."
She pauses.
"Same problem all the way down."
Todd is fictional. The problem isn't.
We are integrating artificial intelligence into the coordination systems that run human civilisation — markets, democracies, information ecosystems, institutional decision-making — and our frameworks for evaluating the safety of this process examine components one at a time. We assess individual AI systems for alignment, capability, and risk, then assume that safe components produce safe collectives. This is the logic of Prussian forestry applied to sociotechnical systems, and the 20th century ran the experiment on what happens next.
The difficulty is that the alternative isn't obvious. "The system is complex, leave it alone" isn't governance. Stafford Beer understood this — Cybersyn wasn't a policy of non-intervention, it was a proper attempt to see distributed dynamics without collapsing them into a central model. But Beer's work was cut short, and the field never fully developed the tools he was reaching for. So the question remains open: what would it actually mean to govern a living system as a living system?
To answer that, we first have to confront something uncomfortable. The three case studies in this piece — forests, cities, economies — all display the same pattern: a collection of components that, through their interactions, become something more than a collection. The old-growth forest wasn't just trees near each other. It was a system with its own collective behaviour, its own capacity to respond to threats, its own ability to redistribute resources where they were needed. It had, in a meaningful sense, agency — not because anyone designed that agency into it, but because it grew.
This is the deep question hiding behind all the governance talk. When does a collection of things become an agent with its own goals? A salamander's cells, each just trying to maintain their local chemistry, somehow collectively rebuild a missing limb — and they build the right limb, correctly proportioned, properly wired. No cell has the blueprint. No cell is in charge. The limb-level goal emerges from the network of interactions between cells, from the information flowing through chemical gradients and electrical signals and mechanical pressures. The goal lives in the between.
We can watch this happen in biology, in ant colonies, in neural systems, in markets. But we cannot yet explain it. We have no general theory of how local behaviours compose into collective agency, no way to predict when it will happen, no principled account of what makes it robust versus fragile. And this gap matters enormously right now, because we are running the experiment in real time.
When AI trading agents participate in financial markets alongside humans, what is the market becoming? Not just "a market with faster traders" — the collective dynamics change qualitatively as the ratio of AI to human participants shifts. When large language models mediate human discussion, summarising arguments and surfacing consensus, the AI isn't just transmitting information neutrally — it's becoming part of the coordination substrate itself, reshaping what the collective can see and think. When recommendation algorithms determine what information reaches which people, they're not just tools that individuals use — they're agents within the collective, shaping its emergent behaviour in ways nobody designed or intended.
At what point do these hybrid systems develop their own agency? Their own goals? And if they do — and the history of every collective system suggests they will — how would we even know? Our frameworks measure individual components. The collective agency lives in the connections between them, exactly where we're not looking.
This is where the two paradigms collide. Almost everything we know about building AI systems comes from what you might call the engineering paradigm: define your agents, specify their objectives, design the mechanism, prove properties. This works beautifully when you can determine everything in advance. But the systems we're actually creating are growing systems — they will discover their own organisation, develop their own emergent goals, find their own boundaries. We're using tools designed for building bridges to tend something that behaves more like a forest.
The growth paradigm — the one that developmental biologists and complex systems researchers live in — understands this. It watches how collective intelligence emerges from local interactions, how agent boundaries form and dissolve, how the whole becomes genuinely more than the sum of its parts. But it's largely descriptive. It can tell you what happened. It struggles to tell you what to build.
What we need is something that doesn't exist yet: a framework that's precise enough to guide engineering but flexible enough to capture emergence. Mathematics that can answer questions like: where, in a complex system, do the real agents live? How do simple local goals — each trader pursuing profit, each algorithm optimising engagement — compose into collective goals that nobody specified and nobody controls? When does a collection become a collective, and what makes that transition stable or fragile?
We believe these are precise, tractable questions that can be formalised with the right mathematics.
Information theory already gives us tools for measuring when a whole contains more than its parts. Causal Emergence theory can identify the scale at which a system's behaviour is most predictable — and that scale is often not the level of individual components. Active Inference provides a framework for understanding agency in terms of statistical boundaries rather than programmer intentions. Category Theory offers a language for describing how simple operations compose into complex ones.
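To make one of these tools concrete, below is a minimal sketch in Python of the effective-information measure behind causal emergence (following Hoel and colleagues), applied to a toy two-level system invented purely for illustration. The macro description scores higher than the micro one, which is the formal sense in which the macro scale is the one at which the system's behaviour is most predictable.

```python
# A minimal sketch of the causal-emergence calculation on a toy transition
# matrix invented for illustration; not anyone's published model.
# Effective information (EI) is the average KL divergence between each state's
# transition distribution and the mean transition distribution, i.e. how much
# intervening on the current state tells you about the next state.
import numpy as np

def effective_information(tpm: np.ndarray) -> float:
    """EI in bits for a row-stochastic transition probability matrix."""
    avg = tpm.mean(axis=0)  # effect distribution under uniform interventions
    with np.errstate(divide="ignore", invalid="ignore"):
        kl_terms = np.where(tpm > 0, tpm * np.log2(tpm / avg), 0.0)
    return float(kl_terms.sum(axis=1).mean())  # mean KL(row || avg) over states

# Micro level: states 0-2 wander noisily among themselves; state 3 is absorbing.
micro = np.array([
    [1/3, 1/3, 1/3, 0.0],
    [1/3, 1/3, 1/3, 0.0],
    [1/3, 1/3, 1/3, 0.0],
    [0.0, 0.0, 0.0, 1.0],
])

# Macro level: group {0, 1, 2} into one coarse state A and keep {3} as B.
macro = np.array([
    [1.0, 0.0],
    [0.0, 1.0],
])

print(f"EI(micro) = {effective_information(micro):.3f} bits")  # ~0.81
print(f"EI(macro) = {effective_information(macro):.3f} bits")  # 1.00: emergence
```

The hard research questions begin where this toy ends: finding the right coarse-graining automatically, and doing it for systems whose parts are traders, institutions, and AI models rather than four labelled states.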
The pieces exist, scattered across a dozen fields that don't talk to each other. Developmental biologists who watch collective agency emerge every day in growing embryos. Physicists who study phase transitions — the critical points where systems suddenly reorganise. Neuroscientists who understand how neural collectives become unified minds. Social scientists who observe markets and democracies developing emergent properties in the wild. Mathematicians who prove deep structural connections between apparently different frameworks.
Nobody has put these pieces together. We don't entirely know why, but we suspect it is partly because the question that connects them hasn't been asked clearly enough, or at all.
Here it is, as plainly as we can state it: when AI systems join human collectives at scale, what kind of collective agents will emerge, and how do we ensure they remain ones we'd want to live inside?
That's what slide seven is asking for. Not better evaluation of individual AI systems — we have people working on that, and they're good at it. Not "leave the system alone and hope for the best" — Beer showed us that active governance of living systems is possible, before his control room was burned. What we need is the science of collective agency itself. The basic research that would let us understand how collections become agents, predict when it will happen, and develop the equivalent of Beer's Cybersyn for a world where the collective includes artificial minds.
This is the first in a series on collective agent foundations. The next post goes deeper into the mathematics underlying these questions — how information theory, causal emergence, active inference, and category theory each offer different lenses on the same problem, where those lenses converge, and where they point to open questions that no single field can answer alone.
You can follow this series on our Substack (or in this LessWrong sequence), and find out more about our research at Equilibria Network.
2026-02-13 15:33:21
Published on February 13, 2026 7:33 AM GMT
Linked is a new working paper from Nick Bostrom, of Superintelligence fame, primarily analyzing optimal pause strategies in AI research, with the aim of maximizing lives saved by balancing x-risk against the prospect of aligned ASI delivering biological immortality sooner.
Abstract: (emphasis mine)
Developing superintelligence is not like playing Russian roulette; it is more like undergoing risky surgery for a condition that will otherwise prove fatal. We examine optimal timing from a person-affecting stance (and set aside simulation hypotheses and other arcane considerations). Models incorporating safety progress, temporal discounting, quality-of-life differentials, and concave QALY utilities suggest that even high catastrophe probabilities are often worth accepting. Prioritarian weighting further shortens timelines. For many parameter settings, the optimal strategy would involve moving quickly to AGI capability, then pausing briefly before full deployment: swift to harbor, slow to berth. But poorly implemented pauses could do more harm than good.
The analysis is, interestingly, deliberately from a "normal person" viewpoint:[1]
It results in tables like this:
The results on the whole imply that, under a fairly wide range of scenarios, a pause could be useful but should likely be short.
However, Bostrom also says that he doesn't think this work implies specific policy prescriptions, because it makes too many assumptions and is too simplified. Instead he argues that his main purpose is just highlighting key considerations and tradeoffs.
Some personal commentary:
Excepting the assumption that AGI/ASI are possible and also that aligned ASI could deliver biological immortality quickly. But you know, might as well start by accepting true facts.
LLMs are already providing valuable medical advice, of course, to the point that there was a minor freakout not long ago when a rumor went around that ChatGPT would stop offering medical advice.