
Towards an objective test of Compassion - Turning an abstract test into a collection of nuances


Published on February 13, 2026 3:03 PM GMT

This post is also available on my Substack. If you would like to try the test described in the post, head to onlinetests.me/test/compassion2, where you can get scored and contribute to research. Data is available at the end of the post. If you are interested in the topic of psychometrics, consider joining my Discord server to talk more.

This is a bit of a followup to my previous post, Which personality traits are real? Stress-testing the lexical hypothesis. I haven’t quite gotten rid of my psychometrics addiction yet, and one of my latest projects is to try to measure trait Compassion more objectively.

For personality tests, consider the distinction between asking respondents about abstract statements like “I am concerned about others” versus concrete statements like “I’m open to spending a lot of time listening to a friend who is feeling down”. The more concrete statement has multiple virtues:

  • There is less freedom in how to interpret it, making it more consistent in meaning across respondents
  • It has less conceptual overlap with other concrete statements about compassion, allowing more nuances to be assessed with a given question set
  • It is more transparent to researchers what it means when people agree or disagree with the statement

On the other hand, the abstract statement has its own advantages:

  • It allows a broad trait like Compassion to be assessed more accurately with fewer statements
  • It makes the statement more applicable across different groups of people, e.g. someone who does not have friends can consider how concerned they are about others in a different sense than listening to friends who feel down

Conventional personality tests mainly use statements of the abstract kind, yet given the advantages of concrete statements I think there may be value in using them too.

Generating statements

I needed a lot of statements related to Compassion. To ensure the realism of the items, I took people who scored high or low on abstract Compassion tests and asked them to explain the meaning of their responses.

Overall I ran three studies on Prolific with a total of 421 respondents. The first study, with 101 respondents, was what I used to generate the items for Which personality traits are real? Stress-testing the lexical hypothesis. In the second study, I asked 102 people and their 86 close friends to rate them mainly on Compassion (but also on some other traits, for variety). In the third study, I gave 53 personality statements to 132 people and asked them to pick the 5 statements that described them best.

This gave me texts such as:

I would not see someone go without something that I had in abundance, if I see a homeless person on the streets even when I have very little money I will stop and talk with them maybe offer them a cigarette and if I have money I offer food. I will go out of my way to help people out if I have something they need and I have no use of it then they can have it for free. I hate seeing people upset and will do everything in my power to fix that upset for them even at cost to myself.

I had to convert these texts to brief personality items for the survey. In the above case, the item I ended up with was “I give things to homeless people”. Obviously this is cutting out a lot of the context, but it’s hard to assess details like this in personality surveys.

In total I generated 28 different items assessing Compassion. The full set of items can be seen below:

  • I feel uncomfortable if my friends are unhappy
  • I know how to make sad people happier after they’ve lost someone close to them
  • I show support to people who are concerned about catching diseases
  • I give things to homeless people
  • I care about helping customers who are dissatisfied with what’s happening at work
  • I help people with tech problems and installations for free
  • If a family member was in financial trouble, I would give them something they need (e.g. clothes)
  • I would help drive a neighbor for an hour on an urgent trip if their car broke down and they needed help
  • I’m open to spending a lot of time listening to a friend who is feeling down
  • I forgive people who have hurt me
  • I’ve worked in a food bank or soup kitchen or similar to help feed people who need it
  • I’ve helped a friend with mental health issues stop harming themselves
  • I help elderly people carry heavy things
  • I teach others about the systemic unfairness of the world
  • I purchase toys for families who are too poor to afford them
  • I hide my frustrations when helping others, pretending it’s no big deal
  • I’ve adopted an animal because it was abandoned and struggling
  • If someone asked for feedback about food they were proud of making, and I didn’t like the food, I’d tell them it sucks
  • If people can’t pay back their debts, then it’s their own fault and I don’t feel bad for them
  • If people seem upset, I try to figure out if they have a real problem or are just being dramatic
  • If the pet of someone close to me had run away, I might joke that it could have been run over by a car
  • If people don’t want to date me, it’s usually because they are shallow assholes
  • I avoid people who have lost someone because I don’t know how to behave around them
  • I can’t feel too sorry for abused women because I feel like they chose evil partners
  • I can’t feel sorry for a lot of poor people because they just need to learn to save money
  • If someone is upset about something, I might dismiss them with “well, that’s life”
  • If I saw someone fall over on the street, I would pass them and assume someone else would help
  • I think schizophrenic people are idiots

Then I had to test them.

Testing the statements

I recruited 200 people and their romantic partners on Prolific.

The obvious question is whether my concrete Compassion items measure the same trait as abstract Compassion items do. Therefore I asked people to rate themselves on a variety of traits, including Compassion, in both an abstract and a concrete form. The following were my abstract Compassion items:

  • I am sensitive to the needs of others (via SPI-27)
  • I am concerned about others
  • I sympathize with others’ feelings
  • I feel sympathy for those who are worse off than myself
  • I think of others first
  • I can be cold and uncaring (via BFI-2)
  • I feel little sympathy for others
  • People who know me well think I am a psychopath (new, custom item)

The raw correlation between the scores for the two tests was a mere 0.66. However, it is to be expected that we don’t get a perfect correlation, because each item carries a certain amount of measurement error, and that measurement error is only partially washed away when taking the average.

One way to estimate the measurement error in the items is to base it on how strongly the items are correlated with each other, since e.g. if the items were not at all correlated with each other, then it’s hard to see how they could “tap into” some latent factor influencing them all.

The easiest way to do that is by a statistic called Cronbach’s alpha. If I divide out by that, I can adjust the correlation for the measurement error due to having only a finite number of imperfectly correlated items, yielding the hypothetical correlation between perfectly-measured versions of the traits in question. After doing so, the correlation jumped up to 0.82, which is pretty respectable. (Though less than the 0.92 or so that I got in the previous study.)
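For readers who want to replicate this step, here is a minimal sketch of the standard correction for attenuation (Spearman's formula), assuming each scale's Cronbach's alpha is used as its reliability estimate; the alpha values in the example are hypothetical placeholders rather than the study's actual numbers.

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for an (n_respondents, n_items) matrix of item scores."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1.0 - item_vars.sum() / total_var)

def disattenuate(r_observed: float, alpha_x: float, alpha_y: float) -> float:
    """Spearman's correction: estimated correlation between error-free traits."""
    return r_observed / np.sqrt(alpha_x * alpha_y)

# Hypothetical alphas of roughly 0.8 per scale take the observed 0.66 to about 0.82.
print(disattenuate(0.66, 0.81, 0.80))
```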

I also asked people’s romantic partners to rate them on the concrete Compassion items (rewritten to say “My partner …” instead of “I …”). This allowed me to get a second perspective on how compassionate the respondents were. Unfortunately the correlation between self-reported Compassion and partner-reported Compassion was a mere 0.42.

It would have been cool if the concrete Compassion items were more highly correlated with the partner-reports than the abstract ones were, because this would indicate my concrete approach reduces measurement error. Unfortunately this was not the case, and the concrete approach instead had a correlation of 0.34.

(Which is suspiciously close to 0.82*0.42, the product of the prior correlations. I think this must be a coincidence, since with 200 respondents I shouldn’t be able to place correlations more exactly than ±0.14 or so.)

I’ve been curious what could account for the difference between the abstract and the concrete Compassion scores. One idea I had was that the abstract Compassion scores might also account for rare extreme acts of compassion that don’t fit into my neat schema. For this reason I ran an extra survey, where I asked people to qualitatively describe the most compassionate thing they have done, and then rate it across a number of dimensions:

  • How often do you do something like this?
  • How much effort, cost or sacrifice was it on your part to do this?
  • What kinds of effort, cost or sacrifice was involved in this?
  • How much has the recipient(s) of this been helped by it?
  • How close are you to the recipient of this?
  • How emotionally engaged were you in this?
  • How likely would you be to do something similar again in the future?
  • How voluntary was this (i.e., to what extent did you feel free not to do it)?

My expectation was that the aggregate score from this would correlate more with the abstract than with the concrete Compassion measurements. When I actually tried it, I instead got r≈0.09 and r≈0.35 respectively, indicating that the two compassion measures do differ in how they relate to the most extreme act of compassion a person has performed, but in the opposite direction from what I expected. Perhaps when asked abstractly, people try to adjust for environmental circumstances or something? I don’t know.

Finally, one major question in psychometrics is the stability of responses. I didn’t give it a lot of time, so I can’t measure long-term stability, plus Prolific respondents tend to disappear after a while so I probably wouldn’t be able to measure long-term stability if I tried. However, I did give people the test again after a week, so I could measure week-long retest reliability.

Compared to traditional abstract psychometric items, more of my concrete Compassion items had low test-retest reliability. With such a short timespan, the low reliability is probably less due to people changing their underlying traits and more due to people being confused about the meaning of the items. That said, the overall difference in reliability was not huge, and I had some highly reliable Compassion items too:

One finding that may be interesting is that the variance of an item correlated with its reliability:

I can also plot the test-retest reliability of the overall test, which leads to this picture:

I was also interested in whether there was any significant factor structure in the concrete Compassion items. As far as I could tell, there was not. While there do seem to be hints of additional correlations (e.g. “I give things to homeless people” correlated especially strongly with “I purchase toys for families who are too poor to afford them”), the factor structure is dominated by a strong general factor, followed by a distinction between positive-loading and negative-loading items, perhaps because of acquiescence bias.
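As a rough illustration of that kind of check, here is a minimal sketch based on eigenvalues of the item correlation matrix and first-component loadings; it is not necessarily the exact factor-analytic method used for the analysis above.

```python
import numpy as np

def factor_summary(responses: np.ndarray):
    """Eigenvalues of the item correlation matrix and loadings on the first
    principal component, as a rough check for a dominant general factor."""
    corr = np.corrcoef(responses, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    first_loadings = eigvecs[:, 0] * np.sqrt(eigvals[0])
    return eigvals, first_loadings

# responses: an (n_respondents, n_items) array of item scores. A large first
# eigenvalue plus a second component that splits positively- and negatively-keyed
# items matches the "general factor + acquiescence" pattern described above.
```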

I would like to see this sort of study executed at an even larger scale, to eventually untangle narrower facets of Compassion. However, I am not willing to pay for it myself.

Ranking the statements

Statements that have higher test-retest reliability are probably superior to statements with lower test-retest reliability, as low reliability likely reflects confusion about the meaning of the statements. Furthermore, statements with higher correlation to overall Compassion levels are probably superior (as measures of Compassion) to statements with lower correlation. Based on that, I have made the table below:

Reliability: the test-retest reliability of the statement. Abstract λ: the correlation between the test item and abstractly-rated Compassion. Concrete λ: the correlation between the test item and concretely-rated Compassion.
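For concreteness, the three columns could be computed along these lines (a sketch; the data frames and column names are hypothetical, with time1 and time2 holding the week-apart item responses).

```python
import pandas as pd

def item_table(time1: pd.DataFrame, time2: pd.DataFrame,
               abstract_score: pd.Series, concrete_score: pd.Series) -> pd.DataFrame:
    """Per-item summary: week-apart test-retest reliability plus correlations
    with the abstract and concrete Compassion scores."""
    rows = []
    for item in time1.columns:
        rows.append({
            "item": item,
            "reliability": time1[item].corr(time2[item]),
            "abstract_lambda": time1[item].corr(abstract_score),
            "concrete_lambda": time1[item].corr(concrete_score),
        })
    return pd.DataFrame(rows).sort_values("reliability", ascending=False)
```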

Data availability

Data is available on osf.




METR's data can't distinguish between trajectories (and 80% horizons are an order of magnitude off)


Published on February 13, 2026 2:05 PM GMT

TLDR

I reanalyzed the METR task data using a Bayesian item response theory model.

  • The METR data cannot distinguish exponential from superexponential growth. Four trajectory shapes (linear, quadratic, power-law, saturating) fit the existing data equally well but diverge on forecasts. For instance, the 95% credible interval for the 125-year crossing is 2031-01 – 2033-10 for linear and 2028-02 – 2031-09 for quadratic.
  • METR’s headline horizon numbers overstate current capability by roughly an order of magnitude at 80% success. METR doesn’t model variation in task difficulty, so their horizons reflect a task of typical difficulty for its length. But tasks of the same length vary a lot in how hard they are, and difficult tasks pull the horizon down more than the easy tasks push it up. Curiously, this doesn’t affect timelines by more than ~1 year, as it’s just a level-shift.
  • We need data about the human times to quantify uncertainty. Credible intervals throughout are too narrow because I treat human times as known rather than estimating them as latent variables. I’m doing this because I don’t have access to all the raw data. This could be a big deal, and could also affect the horizons themselves.
  • Doubling time under the standard linear (exponential growth) model is ~4.1 months, which is similar to METR’s estimate (95% credible interval: 3.5–5.0, but see caveat above).

METR data

Let’s start with a plot that shouldn’t be too surprising. Four reasonable models fit the METR data equally well. They agree about the past but disagree strongly about the future.

The model selection scores known as ELPD-LOO differ by at most ~7 points. [1] Calibration is nearly identical, with a Brier score of 0.066 across the board. Your prior over trajectory shapes therefore matters a lot and has clear-cut consequences for forecasts. The current data on METR’s Github doesn’t include GPT-5.2 at the moment, in case you were looking for it.

These curves are fitted using a Bayesian item response theory model described below. Before describing it, let’s recall METR’s analysis of the time horizon. They proceed in two stages:

  1. Per-model logistic regression. For each model $m$, fit $P(\text{success}_{m,i}) = \sigma(\alpha_m + \beta_m \log t_i)$, where $t_i$ is the human time for task $i$. Here $h_m = \exp(-\alpha_m/\beta_m)$ is the task duration where the curve crosses 50%: setting $P = 0.5$ gives $t = h_m$, a horizon. This gives a “horizon score” per model.

  2. An OLS trend. Regress $\log h_m$ on release date. The slope gives a doubling time of ~4 months.

This is good modeling and gets the main story right, but there are some non-standard choices here. For instance, the slope varies with model rather than task (which is unusual in item response theory) and Stage 1 uncertainty is not accounted for in Stage 2 (METR uses the bootstrap). It also treats every task of the same length as equally difficult and only considers one trajectory shape.

In this post I make a joint model, adjust some things to be more in line with standard practice, and ask what happens when you try different trajectory shapes. The post is somewhat technical, but not so god-awful that Claude won’t be able to answer any question you have about the methodology. Models are fitted with Stan, 4 chains 1000 post-warmup draws, with code available here. I intentionally won’t go into details about technicalities, e.g. prior choices – the code contains everything you’ll want to know and your favorite LLM will figure it out for you. (All priors were chosen by Codex / Claude Code and appear reasonable enough.)

The basic model

The first stage of METR’s model is almost a 2-parameter logistic model (2PL), the workhorse of educational testing since the 1960s.

So, what kind of problems was the 2PL model designed for? Say you give 200 students a math exam with 50 questions and record their answers as correct / incorrect. You want to estimate the students’ math ability, but raw percent correct scores aren’t necessarily very good, as they depend on which questions (easy or hard? relative to which students?) happened to be on the exam.

The 2PL model solves this by giving each student a single ability score ($\theta_j$) and each question two parameters: a difficulty ($b_i$, how hard it is) and a discrimination ($a_i$, how cleanly it separates strong from weak students). “What is 3×2?” has low discrimination, as everyone gets it right regardless of ability. A simple proof-writing question has high discrimination, as sufficiently strong students can solve it but weak students have no chance.

The model estimates all parameters simultaneously via a logistic form: $P(\text{correct}_{ij}) = \mathrm{logit}^{-1}\big(a_i(\theta_j - b_i)\big)$.

This matters here because METR tasks are like exam questions. They vary in both difficulty and how well they separate strong from weak models, and we want to put all the models on a common ability scale.
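As a minimal sketch of what that likelihood looks like in code (the actual fits in this post are done in Stan; the function below is just an illustration in numpy/scipy):

```python
import numpy as np
from scipy.special import expit

def two_pl_loglik(theta, a, b, correct, respondent_idx, task_idx):
    """Log-likelihood of the 2PL: P(correct) = logistic(a_i * (theta_j - b_i)).

    theta: ability per model (or student); a, b: discrimination and difficulty
    per task; correct: 0/1 outcomes; the index arrays record which respondent
    attempted which task.
    """
    p = expit(a[task_idx] * (theta[respondent_idx] - b[task_idx]))
    return np.sum(correct * np.log(p) + (1.0 - correct) * np.log1p(-p))
```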

Modeling difficulty

Ability and difficulty parameters in the 2PL are hard to interpret. The scale is arbitrary, and it’s not clear what, for instance, a 0.1 increase in ability actually means. Or whether it would be better to take a log-transform of the parameter, etc. The METR data is cool and famous because each task comes with a human time, which gives us a natural and interpretable scale for difficulty. So let’s connect human time to difficulty first.

Each task’s difficulty has a mean that depends on log human time, plus a random component to account for the fact that same-length tasks are not born equal: $b_i = \mu + \gamma \log t_i + \delta_i$ with $\delta_i \sim \mathcal{N}(0, \sigma_\delta^2)$. (METR treats all tasks of identical length as equally hard.)

Since difficulty increases with log human time at rate $\gamma$, we can convert any difficulty value back into a time, an equivalent difficulty time. If a task takes humans 10 minutes but is unusually hard for AI, its equivalent difficulty time might be 50 minutes. A task with human time $t$ and difficulty residual $\delta$ has equivalent difficulty time $t \cdot e^{\delta/\gamma}$. [2]

I estimate the residual standard deviation $\sigma_\delta$ at 1.44 (posterior median), which is quite large once we interpret it. One standard deviation of unexplained difficulty corresponds to a ~4.7x multiplier in equivalent difficulty time. [3] A task that’s one standard deviation harder than the average for its length is as hard as a task 4.7x longer, and a task that’s two standard deviations harder is as hard as a task roughly 22x longer. So tasks of identical human time can span a huge range in difficulty for the AI models.
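In code, the conversion is a one-liner; the slope value below is a hypothetical placeholder chosen only so that the example reproduces the ~4.7x figure, not the actual posterior median.

```python
import numpy as np

def equivalent_difficulty_time(t_minutes: float, delta: float, gamma: float) -> float:
    """Map a difficulty residual back onto the human-time scale: t * exp(delta / gamma)."""
    return t_minutes * np.exp(delta / gamma)

# A +1 SD residual (sigma_delta ~ 1.44) with a slope gamma of ~0.93 makes a
# 10-minute task look like a ~47-minute one, i.e. roughly the 4.7x multiplier.
print(equivalent_difficulty_time(10.0, 1.44, 0.93))
```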

Of course, this is a modeling choice that can be wrong. There’s no guarantee that difficulty is linear in $\log t$, so we need diagnostics to check. The plot below does double duty as a model diagnostic and as an explanation of what the random effect means in practice.

A plotted dot at 5x means the task’s equivalent difficulty time is 5x its actual human time. Even within the band, tasks of identical human time can differ multiplicatively by a factor of 22x in equivalent difficulty time, so the practical spread is enormous.

There’s not too much curvature in the relationship between log human time and difficulty, so I think the log-linear form is decent, but it’s much more spread out than we’d like. There is a cluster of easy outliers on the far left, which I think can be explained by very short tasks containing virtually no information about difficulty. Overall this looks reasonable for modeling purposes.

Modeling ability over time

By directly modeling ability over time, we can try out shapes like exponential, subexponential, superexponential, saturating, and singularity. Forecasts depend a lot on which shape you pick, and the data doesn’t really tell you much, so it’s not easy to choose between them. Your priors rule here.

The abilities are modeled as $\theta_m = f(x_m) + u_m$,

where $x_m$ is the model release date in years, centered at the mean (September 2024), and $u_m$ is a model-level random effect. I’m still using a random effect for model ability here, since nobody seriously thinks every model released on the same date must be equally capable. I’m looking at four shapes for the trend $f$ (a code sketch follows the table): [4]

| Model | Params | Intuition |
|---|---|---|
| Linear | 2 | Linear ability = exponential horizon growth (constant doubling time) |
| Quadratic | 3 | Superexponential, accelerating growth |
| Power-law | 3 | Flexible: sub- or superexponential; the base is a shifted/scaled version of the release date |
| Saturating | 4 | S-curve ceiling on ability |
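Here is a sketch of what the four trend shapes might look like; the parameterizations are illustrative guesses that match the descriptions above, not the exact forms in the Stan code.

```python
import numpy as np

# Illustrative parameterizations of the four ability trends f(x), where x is the
# release date in years centered on September 2024. These functional forms are
# guesses matching the table's descriptions, not the post's exact model.

def linear(x, c0, c1):
    return c0 + c1 * x                          # exponential horizon growth

def quadratic(x, c0, c1, c2):
    return c0 + c1 * x + c2 * x**2              # superexponential if c2 > 0

def power_law(x, c0, c1, p, shift=5.0):
    # p > 1 is superexponential, p < 1 subexponential; shift keeps the base positive.
    return c0 + c1 * (x + shift)**p

def saturating(x, lo, hi, rate, midpoint):
    return lo + (hi - lo) / (1.0 + np.exp(-rate * (x - midpoint)))   # S-curve ceiling
```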

If METR’s GitHub repo contained all the historical data, I would also have tried a piecewise linear with a breakpoint around the time of o1, which visually fits the original METR graphs better than a plain linear fit. But since the available data doesn’t go that far back, I don’t need to, and the value of including those early points in a forecasting exercise is questionable anyway. Getting hold of the latest data points is more important.

All models share the same 2PL likelihood and task parameters (the discriminations, the difficulties, and the difficulty–time relationship $\mu$, $\gamma$, $\sigma_\delta$). Only the model for $\theta_m$ changes.

Each model except the saturating model will cross any threshold given enough time. Here are posteriors for the 50% crossing across our models. The saturating model almost never crosses the 1-month and 125-year thresholds since it saturates too fast.

| Trend | 1mo Mean | 1mo 95% CrI | 125y Mean | 125y 95% CrI |
|---|---|---|---|---|
| Linear | 2028-07 | 2027-12 – 2029-05 | 2032-03 | 2031-01 – 2033-10 |
| Quadratic | 2027-08 | 2026-12 – 2028-07 | 2029-07 | 2028-02 – 2031-09 |
| Power-law | 2027-10 | 2027-02 – 2028-11 | 2030-02 | 2028-08 – 2032-11 |

Problems with 80% success

Everything above uses 50% success, but METR also cares about 80% success and fits a separate model for that. We don’t need a separate fit here, since the estimated model doesn’t depend on which success threshold we later query. We’ll just calculate the 80%-success horizon from the posterior draws instead.

But there are actually two reasonable ways to define “80% success,” and they give different answers.

  1. Typical: Pick a task of average difficulty for its length. Can the model solve it 80% of the time? This is roughly what METR computes.

  2. Marginal: Pick a random task of that length. What’s the expected success rate? Because some tasks are much harder than average, the hard ones drag down the average more than easy ones push it up.

At 50%, the two definitions agree exactly. But at 80%, the gap is roughly an order of magnitude!

So, on the one hand, it’s the difficulty variance ($\sigma_\delta^2$) alone that makes these two plots come apart under our model. But on the other hand, the difference is not really an artifact of modeling. Tasks of the same human time genuinely vary a lot in how hard they are for our models, and a gap like this would appear under any model that’s actually honest about that.

The marginal horizon is the one that matters for practical purposes. “Typical” is optimistic since it only considers tasks of average difficulty for their length. The marginal accounts for the full spread of tasks, so it’s what you actually care about when predicting success on a random task of some length. That said, from the plot we see frontier performance of roughly 5 minutes, which does sound sort of short to me. I’m used to LLMs roughly one-shotting longer tasks than that, but it usually takes some iterations to get it just right. Getting the context and subtle intentions right on the first try is hard, so I’m willing to believe this estimate is reasonable.
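To make the typical/marginal distinction concrete, here is a small numeric sketch. All parameter values are hypothetical stand-ins for posterior medians, chosen only to illustrate how averaging over the difficulty residual pulls the 80% horizon down.

```python
import numpy as np
from scipy.special import expit, logit
from scipy.optimize import brentq

# Hypothetical stand-ins for posterior medians, chosen only to illustrate the gap.
a, mu, gamma, sigma_delta, theta = 1.0, 0.0, 0.9, 1.44, 3.0

def marginal_success(log_t, n_draws=20000, seed=0):
    """Expected success on a random task of length exp(log_t) minutes, averaging
    over the task-difficulty residual delta ~ Normal(0, sigma_delta)."""
    delta = np.random.default_rng(seed).normal(0.0, sigma_delta, n_draws)
    return expit(a * (theta - (mu + gamma * log_t + delta))).mean()

def horizon(q, marginal=True):
    if not marginal:  # "typical" task has delta = 0, so there is a closed form
        return np.exp((theta - mu - logit(q) / a) / gamma)
    return np.exp(brentq(lambda lt: marginal_success(lt) - q, -20.0, 20.0))

print(horizon(0.8, marginal=False), horizon(0.8, marginal=True))
# The marginal 80% horizon comes out well below the typical one.
```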

Anyway, the predicted crossing dates at 80% success are below. First, the 1-month threshold (saturating model omitted since it almost never crosses):

| Trend | Typical Mean | Typical 95% CrI | Marginal Mean | Marginal 95% CrI |
|---|---|---|---|---|
| Linear | 2028-12 | 2028-04 – 2029-10 | 2030-07 | 2029-08 – 2031-09 |
| Quadratic | 2027-10 | 2027-02 – 2028-11 | 2028-09 | 2027-08 – 2030-04 |
| Power-law | 2028-02 | 2027-05 – 2029-04 | 2029-02 | 2028-01 – 2031-01 |

And the 125-year threshold:

| Trend | Typical Mean | Typical 95% CrI | Marginal Mean | Marginal 95% CrI |
|---|---|---|---|---|
| Linear | 2032-08 | 2031-05 – 2034-03 | 2034-02 | 2032-09 – 2036-03 |
| Quadratic | 2029-09 | 2028-03 – 2032-01 | 2030-05 | 2028-09 – 2033-05 |
| Power-law | 2030-05 | 2028-09 – 2033-05 | 2031-04 | 2029-04 – 2035-02 |

Make of this what you will, but let’s go through one scenario. Say I’m a believer in superexponential models with no preference between quadratic and power-law, so I put 50-50 weight on those. Suppose also that I believe 125 years is the magic number for the auto-coder of AI Futures, and that I prefer 80% success to 50%, as the latter is too brittle. Then, using the arguably correct marginal formulation, my timeline has a mean of roughly November 2030, while the typical framework yields roughly January 2030 instead. And this isn’t too bad, just a difference of ~0.8 years! The linear model is similar, with timelines pushed out roughly 1.6 years. So the wide marginal-typical gap doesn’t translate into that big of a timeline gap, as both trajectories have the same “slope”, just at a different level.

Let’s also have a look at METR’s actual numbers. They report an 80% horizon of around 15 minutes for Claude 3.7 Sonnet (in the original paper). Our typical 80% horizon for that model under the linear model is about 22.0 min, and the marginal is about 1.0 min, roughly 15x shorter than METR’s.

Modeling the human times

The available METR data contains the geometric mean of the (typically 2-3 for HCAST) successful human baselines per task, but not the individual times. Both METR’s analysis and mine treat this reported mean as a known quantity, discarding uncertainty. But we could model the true human time as a latent variable informed by the reported baselines. This is easy enough to do in Stan, and it would give a more honest picture of what the data actually supports, as all credible intervals would widen.
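One way to add that latent layer, assuming individual baseliner times are log-normal around the task's true time, is a measurement model along these lines; sigma_h is a hypothetical spread parameter that would get its own prior.

```python
import numpy as np
from scipy.stats import norm

def log_geomean_loglik(obs_log_geomean, latent_log_time, sigma_h, n_baseliners):
    """If individual baseliners' log times are Normal(latent_log_time, sigma_h^2),
    the log of their geometric mean is Normal(latent_log_time, sigma_h^2 / n)."""
    se = sigma_h / np.sqrt(n_baseliners)
    return norm.logpdf(obs_log_geomean, loc=latent_log_time, scale=se)
```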

I’d expect smaller differences between the typical and marginal horizon plots if the human times were modeled properly, as more of the variance in the difficulty random effect would be absorbed by the uncertainty in the human times. I’m not sure how big the effect would be, but getting hold of the data or doing a short simulation would help.

A technical point: when modeling the human times, I would also try a Weibull distribution instead of a log-normal, since the log-normal is typically heavier-tailed and the Weibull is easier to justify on theoretical grounds via its failure-rate interpretation.

Notes and remarks

  • I also tried a finite-time singularity model, in which ability blows up at a finite date. The posterior on the singularity date didn’t really move from the prior at all. This is no surprise. It just means the data is uninformative about the singularity date.
  • There are loads of other knobs you could turn. Perhaps you could introduce a discrimination parameter that varies by model and task, together with a hierarchical prior. Perhaps you could make discrimination a function of time, etc. I doubt any of these would change the picture much, if at all. The model fit is good enough as it is, even if the uncertainty is likely too small. That said, I don’t want to dissuade anyone from trying!
  • The power-law model does in principle support both sub- and superexponential trajectories (exponents below and above 1, respectively, with an exponent of exactly 1 recovering the linear model). The posterior puts nearly all of its mass on the superexponential side, so the data does not support subexponential growth. At least when using this model.
  • There’s plenty of best-practice stuff I haven’t done, such as prior sensitivity analysis. (But we have a lot of data, and I wouldn’t expect it to matter too much.)
  • The doubling time posterior median is 4.1 months (95% credible interval: 3.5–5.0), which is close to METR’s v1.1 estimate. Of course, doubling time only makes sense for the linear model above, as the doubling time of the other models varies with time.

  1. The ELPD-LOO estimates are: linear (SE ), saturating (SE ), power-law (SE ), quadratic (SE ). ↩︎

  2. Define $\tilde t$ as the human time whose mean difficulty equals the task’s actual difficulty. Then $\mu + \gamma \log \tilde t = \mu + \gamma \log t + \delta$, so $\log \tilde t = \log t + \delta/\gamma$ and $\tilde t = t\, e^{\delta/\gamma}$. ↩︎

  3. The multiplier is $e^{\sigma_\delta/\gamma}$, where $\sigma_\delta/\gamma$ is evaluated at the posterior medians. ↩︎

  4. Quadratic is the simplest choice of superexponential function. You could spin a story in its favor, but using it is somewhat arbitrary. The power-law is the simplest function that can be both super- and subexponential (in practice it turns out to be superexponential here, though), and I included the saturating model because, well, why not? ↩︎




We Die Because it's a Computational Necessity


Published on February 13, 2026 1:16 PM GMT

Note: This builds on my sketch from September 2025, "You Gotta Be Dumb to Live Forever." Candidly, that work had a lot of errors. I've done my best to correct them and to clarify the exact results here, but it is possible this is still all messed up. With thanks to David Brown, and to Tatyana Dobreva for her great questions and feedback. All errors are mine.

Just one whale really, but if three had fallen...
Johannes Wierix: Three Beached Whales

Another thing that got forgotten was the fact that against all probability a sperm whale had suddenly been called into existence several miles above the surface of an alien planet…

[The whale experiences life as the ground rapidly approaches.]

I wonder if it will be friends with me?

And the rest, after a sudden wet thud, was silence.

— Douglas Adams, The Hitchhiker's Guide to the Galaxy

Why do we die?

And not just why do we humans die, but why does any complex thing die?

The standard answer from biology is that the Weismann Barrier,[1] which establishes a strict separation between the immortal germline (say DNA) and the mortal soma (for example your body), is a strategy that evolution discovered to faithfully preserve inheritance by requiring a disposable vessel.

In reality, I argue death is a computational necessity that is generalizable across all complex organisms, be they organic, artificial life, AI, or otherwise. These systems must die if they want to solve problems of a certain complexity class because doing so requires computational techniques that physically forbid self-replication.

This occurs because any system that must preserve its own description so it can reproduce ends up structurally confined to a lower-dimensional subspace of strategies. By “strategies,” I mean the computations a system can perform, the problems it can solve, and the configurations it can exist as. The complement of this subspace is something I call the Forbidden Zone. In this area there is a set of peculiar strategies that necessitate the destruction, or irreversible modification, of the system’s own blueprint. We have good examples of these from biology:

  • B Cells produce unique antibodies by discarding and rearranging parts of their own DNA in an irreversible step.[2][3] They cannot make a faithful copy of the genome they threw away.
  • Immune effector cells actively hunt tumor cells and pathogens. Once they have completed their attack, they deliberately self-destruct (apoptosis). A destroyed cell cannot be copied.
  • Neurons are stable because they permanently exit the cell cycle (they become post-mitotic). This is necessary because their function relies on long-term signal transmission and homeostasis. These cells are alive but sterile; their irreversible modification means reproducing would destroy their functional value.

All of these strategies, whether they require a cell to discard parts of itself, destroy itself, or commit to an irreversible non-replicating state, live in the Forbidden Zone. Crucially, no integrated, self-replicating system can execute them. The body exists because the genome cannot perform these special strategies itself; it must build mortal systems to run computations that self-replication makes mathematically impossible.

This dual immortal/mortal strategy does not apply to all life, for example a bacterium does not need a body to survive. There is, however, a precise threshold where the level of complexity demands relinquishing wholly contained self-integration. I identify a Regime Dichotomy based on how search space scales:

  • The Polynomial Regime: Complexity is low and the cost of self-preservation is minimal, because the problems the system faces scale polynomially with its size. These are things like replicating your DNA, adapting to a local environment, and running a basic metabolism. Bacteria exist in this regime, where integration is essentially free.
  • The Exponential Regime: Problems involve combinatorial search, and each degree of additional complexity multiplies the number of potential strategies rather than just adding to them. Self-preservation excludes the system from an exponentially large fraction of its reachable strategy space in this regime. This is where B cells and neurons exist.

There is a sharp phase transition exactly at the exponential regime, and this is meaningful because it is not a sliding scale; it pins down exactly why the Weismann barrier appears where it does in nature. When a self-replicating system enters the exponential regime, the only architecture that can retain its full computational capabilities is one composed of a simple immortal replicator that builds complex mortal workers. This is why humans need bodies, but bacteria do not.

Above the polynomial and exponential regimes, there exists a theoretical ceiling governed by the uncomputable Busy Beaver function[4][5]. Reasoning about this theoretical limit, we learn that no computable bound can uniformly contain the cost of persistence. At every level of this hierarchy, there exist description lengths where the costs are severe, and as computational power grows, the severity grows without limit.

By working in computational terms, I can show that these results are not just applicable to biological life but are strictly substrate-independent. They apply directly to self-replicating artificial life, Turing machines, Von Neumann probes, and Artificial Intelligence because all of these entities face the identical physical constraints.

Death is not an error. It is supreme computational technology, and we are only smart because we die.

Outline of The Essay

This essay is somewhat long, but it builds the argument through the following sections:

  1. Self-Replication Definitions: first I define what self-replication requires using the von Neumann architecture and Kleene’s fixed point, and derive the preservation constraint (what self-replication forbids), which confines any integrated replicator to a proper subspace. I also define a Non-Trivial Persistent Replicator (NTPR).
  2. The Cost of Persistence: next I quantify how much productive potential is expended in order to remain replicable (what I call the Persistence Ratio), proving a sharp regime dichotomy dependent on the environmental time budget.
  3. The Forbidden Zone: I show that maintaining self-description unconditionally excludes an exponentially vast region of behavior space, highlighting when optimal strategies are destructive or descriptively dense.
  4. Architectural Comparison (The Discovery Time Theorem): I combine the cost analysis and exclusion principle to categorize every evolutionary search problem into three zones, showing exactly when differentiation is mathematically necessary.
  5. The Architectural Dominance Conjecture: Based on these findings, I predict that above a specific complexity threshold, differentiated agents strictly dominate integrated ones.
  6. Conclusions: Finally I conclude with a discussion of the findings, some biological applications, and a specific prediction for AGI.

1. Self-Replication Definitions

This section defines some preliminaries: the minimum requirements for self-replication, the preservation constraint, and what it means to be non-trivial (why a computer virus is different from a crystal, which also self-replicates).

Von Neumann solved the problem of how self-replication is logically possible[6]. He resolved the problem of infinite regress (a machine’s description must describe the description itself) by outlining a Universal Constructor $A$, Copier $B$, Controller $C$, and Description $\Phi$, where $\Phi$ serves a dual role: it is interpreted as code instructions by $A$ and copied as data by $B$. This so-called von Neumann Pivot solves the regress via self-reference. Kleene's Second Recursion Theorem mathematically guarantees a resolution to this infinite regress problem, since such a fixed point exists in any Turing-complete system: for every total computable $f$, there exists an index $e$ with $\varphi_e = \varphi_{f(e)}$[7][8].

However, self-replication as a concept is too broad to distinguish something like a crystal[9] from an open-ended evolutionary system. Open-ended evolution requires three conditions:

  1. Universal Construction - It must have the power of a Universal Turing Machine so that it can build any computable structure (simple self-copying automata lack this[10]).
  2. Self-Reference - It must be able to effectively access its own description (guaranteed by Kleene's Theorem).
  3. Informational Fidelity - It must have robust error correction to prevent the blueprint from degenerating into noise over indefinite generations.

Definition 1.1 (Von Neumann Threshold): The threshold $c$ is the minimum description length of the replication core $A + B$ plus the minimal control instructions within $C$ needed to satisfy Conditions 1–3. I model $c$ as a structural constant, $O(1)$ with respect to total system size $n$, which is a valid assumption for modular architectures where only the payload grows[11]. In noisy environments, this constant inflates.

Satisfying the von Neumann Threshold imposes a permanent structural burden derived from solving infinite regress. I call this restriction the Preservation Constraint.

Definition 1.2 (The Preservation Constraint): An integrated self-replicating agent must preserve a valid, recoverable copy of its complete self-description throughout the time it is computing, in order to replicate at the end of its generation. It cannot do anything that would irreversibly prevent this reconstruction, regardless of whether the destruction occurs in the $c$-bit replication module or the payload region.

This restriction imposes a strict topological limit on the system’s potential configurations. Notably, somatic units do not face this constraint; they are free to use all $n$ bits of their description and make irreversible, destructive modifications. An integrated replicator, however, is structurally confined to the region of the state space where $\Phi$ remains invariant and recoverable.

Definition 1.3 (Replication-Compatible State Space): Let $\Sigma_n$ denote the set of all programs of length $n$. Let $\Sigma_n^{\mathrm{rep}} \subseteq \Sigma_n$ denote the subset of programs compatible with the preservation constraint, i.e. those that maintain a recoverable self-description throughout execution.

This means an integrated agent is confined to $\Sigma_n^{\mathrm{rep}}$, but a mortal soma accesses the full $\Sigma_n$.

Definition 1.4 (Destructive Strategy): A strategy $s$ is destructive if executing $s$ requires irreversible modification of the agent's self-description in a way that prevents faithful replication. Destructive strategies lie outside $\Sigma_n^{\mathrm{rep}}$, and integrated self-replicating agents strictly cannot implement them.

For these restrictions to be meaningful, we must distinguish systems that carry heritable information from simple replicators like crystals[9] or prions[12], which only propagate a physical state. I distinguish these trivial cases from meaningful ones:

Definition 1.5 (Non-Trivial Persistent Replicators - NTPRs): A system $S$ at noise level $\varepsilon$ is a non-trivial persistent replicator if:

  • (C1) Its Kolmogorov complexity is sufficiently high - it has sufficient complexity.
  • (C2) The algorithmic mutual information between successive generations stays high, for all generations - there is informational closure.
  • (C3) Its logical depth exceeds the floor $d_{\min}$ in every generation - it has non-trivial organization.
  • (C4) Reliable replication at noise $\varepsilon$ - there is environmental robustness.

I define a complexity floor $d_{\min}$, which represents the minimum logical organization needed to maintain coherence against a background noise $\varepsilon$. C3 disqualifies anything that replicates through simple physical cascades.

Remark: NTPR is a universal distinction. Because conditions (C1) and (C2) rely on Kolmogorov complexity and mutual information, metrics that are invariant up to a constant term by the Invariance Theorem[13], the definition holds regardless of the underlying machinery. A computable bijection between systems (like mapping DNA to binary) only shifts description lengths by a constant, guaranteeing that the depth threshold $d_{\min}$ adjusts to the local substrate while preserving the fundamental classification.

Some Examples:

| System | C1 | C2 | C3 | C4 | Status |
|---|---|---|---|---|---|
| Bacteria | ✓ | ✓ | ✓ | ✓ | NTPR (Integrated) |
| Von Neumann Probe | ✓ | ✓ | ✓ | ✓ | NTPR (Integrated) |
| Ciliate Protozoa | ✓ | ✓* | ✓ | ✓ | NTPR (Differentiated) |
| Crystal | ✗ | | ✗ | | Not NTPR - low $K$, trivial depth |
| Fire | | ✗ | | | Not NTPR - no encoded description |

*C2 is satisfied by the ciliate's micronucleus; the macronucleus degrades amitotically and is rebuilt from the germline during conjugation. This is an interesting intracellular instance of the germline-soma separation.

2. The Cost of Persistence

Given that self-replication has a structural constraint, how much problem-solving power is relinquished just by virtue of a system keeping itself alive? I define a universal way to consider this by fixing an optimal prefix-free Universal Turing Machine $U$ as our reference frame, allowing us to treat any organism as a computational process. It is characterized by the following metrics:

  • Information: Kolmogorov complexity $K(x)$ (invariant up to $O(1)$) and algorithmic mutual information $I(x:y)$ (symmetric up to a logarithmic term[13]). $K$ is the ultimate compression limit, while $I$ measures heredity.
  • Capacity: $\mathrm{Cap}_T(n) = \max\{|U(p)| : |p| \le n,\ U(p) \text{ halts within } T \text{ steps}\}$. This represents the theoretical ceiling of problem-solving output for an $n$-bit system before its time budget runs out. UTM simulation overhead is only a logarithmic factor, preserving regime classifications.
  • The Ceiling ($T \to \infty$): As $T \to \infty$, $\mathrm{Cap}_T(n)$ approaches the Busy Beaver function $BB(n)$, which is non-computable and dominates all computable bounds.[4][5] The strict dominance of $BB$ over every computable bound means that the gap between any computable time bound and the theoretical ceiling is where the regime dichotomy operates.
  • Logical Depth: The minimum runtime of any near-shortest program for $x$.[14] Per the Slow Growth Law, deep objects cannot be quickly produced from shallow ones, distinguishing the evolved complexity of a genome from the random complexity of a gas.

The Generational Model: Each generation of a self-replicating system is a halting computation $U(p_t) = (p_{t+1}, y_t)$, where $p_{t+1}$ is the offspring program and $y_t$ is the productive output, with $|p_t| = n$ for every generation. The lineage continues through $p_{t+1}$; each generation halts.

The agent must allocate a portion of its description to the specification of $p_{t+1}$ (to satisfy the preservation constraint); that portion is strictly subtracted from the resources available to compute $y_t$. This partitioning establishes a hard upper bound on the system’s potential output.

Theorem 2.1 (The Productivity Bound). For a self-replicating system of total description length $n$ with replication overhead $c$, operating under a uniform environmental time budget $T(n)$: $$|y_t| \le \mathrm{Cap}_{T(n)}(n - c).$$

Proof. Both the integrated replicator and a differentiated soma of the same total size exist in the same environment and experience the exact same external time budget $T(n)$. The integrated program encodes replication machinery ($c$ bits) and productive computation ($n - c$ bits). Its productive output is therefore a halting computation on an effective program of $n - c$ bits, running within $T(n)$ steps, bounded strictly by $\mathrm{Cap}_{T(n)}(n - c)$. ∎

Please note that the subscript denotes that the time budget is $T(n)$, the global environmental clock evaluated at the system's total physical size $n$. This is physically correct because the environment allocates time based on the organism's macroscopic size and niche, not its internal bit allocation.

2.1 The Regime Dichotomy

To characterize this tax we must constrain the conceptual Turing machine to a physically realistic model. I do this by modeling the agent as a Linear Bounded Automaton (LBA) with internal tape length $n$, augmented with a standard write-only output tape to permit macroscopic output that scales beyond the internal memory limit. This confines the program and working data to the exact same finite substrate, adequately modeling cells with finite genomes or digital organisms with allocated RAM.

With this constraint, the preservation mechanism becomes a fixed-cost partition. Exactly $c$ bits of the substrate are frozen (read-only): they are permanently occupied by the recoverable self-description, which leaves exactly $n - c$ bits for working computation. This finiteness changes the bottleneck from time to space. A system with $n - c$ writable bits is strictly bounded by its configuration space of roughly $2^{n-c}$ distinct states. Once the external time budget $T(n)$ exceeds this limit, the system saturates; it exhausts its non-repeating capacity and must either halt or cycle.

This yields the persistence ratio under the uniform environmental clock $T(n)$: $$R(n) = \frac{\mathrm{Cap}_{T(n)}(n - c)}{\mathrm{Cap}_{T(n)}(n)}.$$

The critical difference from a naive formulation is that both the numerator and the denominator evaluate the time budget at the exact same argument $n$, because both architectures inhabit the same environment and experience the same generation time.

From the physical model above, I derive the main result: the severity of the persistence tax depends entirely on whether the environment's time budget exceeds the system's internal configuration space. This creates a sharp phase transition rather than a continuous decay.

Theorem 2.2 (The Memory-Bound Phase Transition). Let $T(n)$ be the uniform environmental time budget. The persistence ratio undergoes a sharp phase transition:

  • (a) The Free Regime ($T(n) < 2^{n-c}$): The environmental time budget is strictly smaller than the integrated agent's $2^{n-c}$ configuration space. Time binds computation before memory constraints are reached. Both architectures exhaust the time limit identically, so $R(n) = 1$. The replication tax is exactly zero.
  • (b) The Transition Zone ($2^{n-c} \le T(n) < 2^{n}$): The integrated agent hits its spatial ceiling ($2^{n-c}$ states), but the unconstrained soma does not. The ratio is $R(n) \approx 2^{n-c}/T(n)$. Because $c$ is a structural constant relative to $n$, this transition zone (a window of width $c$ bits in $\log_2 T$) occupies a vanishing fraction of the scale as $n \to \infty$.
  • (c) The Taxed Regime ($T(n) \ge 2^{n}$): The environmental time budget exceeds the configuration-space limits of both architectures. Both systems exhaust their internal memory. The environment offers excess time, but neither system has the configurational degrees of freedom to exploit it. The ratio homes instantly to the structural floor: $R(n) = 2^{n-c}/2^{n} = 2^{-c}$.

Proof. Follows directly from evaluating the piecewise limits of the uniform clock $T(n)$ against the LBA state-space limits. Time acts as the strict binding constraint until $T(n)$ exceeds the available address space, at which point output is strictly bound by geometry. ∎

Note: the LBA model governs physically realizable results. The unbounded Turing machine model is used solely for the incomputable ceiling to establish the theoretical limit.
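A stylized numeric illustration of the three regimes, using min(T, 2^m) as a toy stand-in for the capacity of a system with m writable bits (the sizes and overhead below are arbitrary example values):

```python
import numpy as np

def capacity(writable_bits: int, time_budget: float) -> float:
    """Toy LBA capacity: output grows with time until the ~2^writable_bits
    configuration space saturates, after which space is the binding constraint."""
    return min(time_budget, 2.0 ** writable_bits)

def persistence_ratio(n: int, c: int, time_budget: float) -> float:
    """Integrated capacity (n - c usable bits) over unconstrained capacity (n bits)."""
    return capacity(n - c, time_budget) / capacity(n, time_budget)

n, c = 60, 20
for T in [2.0 ** 30, 2.0 ** 45, 2.0 ** 70]:   # free regime, transition zone, taxed regime
    print(f"T = 2^{int(np.log2(T))}: ratio = {persistence_ratio(n, c, T):.3g}")
# Prints 1, then 2^-5, then the 2^-c floor of 2^-20.
```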

2.2 Finite Memory, Computability, and the Physical Ceiling

One might intuitively assume that giving an agent a computable super-exponential time budget (say, a doubly exponential one) would cause the persistence ratio to collapse to zero, but this is a mathematical illusion.

If $f$ is any computable function, the algorithm required to compute it has a Kolmogorov complexity of $O(1)$ with respect to $n$. For sufficiently large $n$, both the $(n-c)$-bit soma and the $n$-bit integrated agent possess vastly more memory than is required to encode the simple loop that counts to $f(n)$ and outputs a string of that length. Because both architectures can easily encode and reach the computable limit, their productive outputs both scale as $f(n)$, resulting in a ratio of $\Theta(1)$.

This reveals a deep property: no computable physical environment can yield a uniform persistence penalty worse than the $2^{-c}$ saturation floor. The infinite collapse of the persistence ratio ($R(n) \to 0$) strictly requires non-computability.

2.3 The Incomputable Ceiling

Even though I have established the limits of the persistence tax for realizable systems, I want to show the tax is an intrinsic property of self-reference. To do so I remove physical constraints and examine the system in the limit of infinite capacity by moving from the LBA to an unbounded Turing Machine. Here, the ratio is measured against the uncomputable Busy Beaver function $BB(n)$.

Theorem 2.3 (Unbounded Collapse). $$\liminf_{n \to \infty} \frac{BB(n - c)}{BB(n)} = 0.$$

Proof. The Busy Beaver function $BB$ grows faster than any computable function.[5] If the ratio $BB(n)/BB(n-c)$ were bounded by a constant $C$ for all $n$, then iterating the bound would give $BB(n) \le C^{\lceil n/c \rceil}\, BB(O(1))$, making $BB$ computably bounded by an exponential function, which is a contradiction. Therefore, the ratio of productive capacity between size $n$ and size $n - c$ must be unbounded. Along the subsequence of $n$ where these growth spikes occur, the inverse ratio $BB(n-c)/BB(n)$ drives to 0. ∎

This establishes two fundamental truths:

  1. The hierarchy has no top. No computable time bound can uniformly contain the persistence penalty. At every level of resource availability, there exist description lengths where the tax spikes arbitrarily high.
  2. There is entanglement with incomputability. In general, you cannot compute exactly how much productive capacity a specific replicator sacrifices, because doing so requires computing $BB$.

2.4 Information Closure and Noise

The previous results treated the replication overhead $c$ as a fixed constant. However, in physical environments, noise is an active adversary. To persist, the system must not only copy itself but correct errors. This makes $c$ a dynamic function of the environmental noise level $\varepsilon$.

1. The Cost of Accuracy: We define the noise-dependent overhead as $c(\varepsilon) = c_0 + c_{\mathrm{EC}}(\varepsilon)$, where $c_{\mathrm{EC}}(\varepsilon)$ represents the descriptive complexity of the physical error-correction machinery required to suppress noise.

While the mathematical algorithm for an optimal error-correcting code (e.g., a polar code[15]) might be compact, the biological machinery required to physically execute it (proofreading enzymes, mismatch repair proteins, and recombinational hardware) is massive. Furthermore, Eigen’s Paradox[16][17] creates a deadly feedback loop. The genome must encode the repair machinery, but the machinery must copy the genome (including its own instructions). If the noise $\varepsilon$ approaches a critical threshold $\varepsilon_c$, the required machinery becomes too large to be copied faithfully. At this point $c(\varepsilon) \to \infty$, and the cost of persistence becomes infinite.

2. The $\delta$-Closure Formulation: I translate the concept of informational closure[18][19] to the algorithmic level. A system achieves $\delta$-fidelity if its future state is algorithmically determined by its current state:

$$I(S_t : S_{t+1}) \ge \delta \, K(S_{t+1})$$

for a fidelity parameter $\delta \in (0, 1]$. If the fraction of complexity preserved drops below $\delta$, the system has disintegrated. For physical systems, verifying $\delta$-closure via Shannon entropy is an acceptable proxy, because Shannon information closely approximates the average algorithmic information for data drawn from computable distributions.[13][20]

3. The Forbidden Zone

The previous section established that there is a productivity tax on persistence. This section demonstrates that this tax is not just a reduction in efficiency, but a fundamental restriction on reachability of strategies within the total space of possible behaviors. 

The preservation constraint divides strategies into two regions: those compatible with self-replication, and those that are not. Reserving $c$ bits structurally confines the replicator to a lower-dimensional subspace. The complement of this subspace is the Forbidden Zone: strategies syntactically describable by a system of size $n$, but physically not executable by any system that must persist as an integrated whole.

3.1 The Subspace Exclusion Principle

The size of the Forbidden Zone can be quantified by comparing the volume of the full strategy space $\Sigma_n$ to the volume of the replication-compatible subspace $\Sigma_n^{\mathrm{rep}}$.

Theorem 3.1 (The Subspace Exclusion Principle).

The ratio of the full strategy space to the replication-compatible subspace is strictly and unconditionally: $$\frac{|\Sigma_n|}{|\Sigma_n^{\mathrm{rep}}|} \ge 2^{c}.$$

Proof. The class of partial functions computable by prefix-free programs of length $\ell$ has cardinality at most $2^{\ell}$. The integrated agent, whose effective program length is $n - c$, can therefore access at most $2^{n-c}$ distinct computable strategies. The unconstrained agent, with program length $n$, can access up to $2^{n}$. The ratio of these upper bounds is $2^{n} / 2^{n-c} = 2^{c}$.

Therefore, the full strategy space is $2^{c}$ times larger than the replication-compatible subspace. At least a $1 - 2^{-c}$ fraction of all computable strategies of size $n$ are structurally inaccessible to the integrated replicator. This bound is unconditional and environment-independent. ∎

The Forbidden Zone. The set $\Sigma_n \setminus \Sigma_n^{\mathrm{rep}}$ consists of strategies requiring the full $n$-bit capacity. For every one strategy an integrated replicator can execute, there are roughly $2^{c} - 1$ strategies of the same total size that are permanently foreclosed.

In Harvard-like architectures where program memory (genome) is physically separate from working memory (proteome), the constraint operates on control program expressiveness. The Forbidden Zone persists because the $2^{c}$ bound applies unconditionally regardless of architecture.

3.2 Don’t Go There! When Does the Forbidden Zone Matter?

The relevance of the Forbidden Zone depends on the environment and particularly whether the fitness landscape peaks in the excluded region. I identify three sufficient conditions where it does.

Condition A: Destructive Strategies. If executing a strategy $s$ requires irreversible modification or deletion of information necessary for reproduction, then $s \notin \Sigma_n^{\mathrm{rep}}$. An integrated agent cannot implement $s$, because that would destroy the self that defines it. In biology there are numerous instances. Three examples:

  • V(D)J Recombination: B-cells physically cut and paste gene segments to create antibodies with high specificity, permanently deleting the intervening DNA to build their combinatorial antibody repertoire.[2][3]
  • Enucleation: Mammalian erythrocytes eject their entire nucleus to maximize hemoglobin volume, a strategy that is not possible for a cell that retains its genome for future division.
  • Apoptosis: In digital evolution experiments within Avida (an artificial life software platform), Goldsby et al.[21][22] demonstrated that division of labor evolves spontaneously under such pressures: when a task corrupts the replication template, the population splits into a clean germline and a sacrificial soma.

Even without destructive strategies, some problems are too complex to be solved within the reduced description space of the integrated agent.

Condition B: Descriptively Dense Strategies. A strategy $s$ is descriptively dense if its Kolmogorov complexity $K(s)$ exceeds the payload capacity of the replicator: $K(s) > n - c$. Here, the integrated agent cannot compress the solution $s$ into its available $n - c$ bits, making the strategy unrepresentable, so again $s \notin \Sigma_n^{\mathrm{rep}}$.

An example from biology is the developmental program used in the vertebrate body plan. Morphogenetic computation, which involves coordinating billions of cell-fate decisions, likely requires a control program whose complexity pushes the limits of the genome's capacity $n$. If that complexity exceeds $n - c$, the loss of $c$ bits to replication machinery may render the full developmental program inaccessible to an integrated system.

I should note that even for standard, non-destructive problems (i.e. most biological traits like metabolism, color vision, etc. don’t destroy the genome), the integrated agent loses.

Condition C: Probabilistic Exclusion (The Mild Forbidden Zone). Even if a solution $s$ is compact enough to fit in the integrated agent's workspace ($K(s) \le n - c$) and is non-destructive, the integrated agent faces a catastrophic structural disadvantage.

Shrinking the search space by a factor of $2^{c}$ does not make the landscape sparser, because both the number of targets and the volume shrink proportionally. The true penalty is structural absence. Let $S$ be the total number of optimal solutions, uniformly distributed across $\Sigma_n$. The expected number inside the restricted subspace is $S / 2^{c}$.

When $S \ll 2^{c}$, as is generically the case for complex phenotypic traits, the expected count $S/2^{c}$ is near zero and the probability that the restricted subspace contains zero solutions, approximately $e^{-S/2^{c}}$, is near one. The integrated agent does not face a slower search; it faces the overwhelming mathematical certainty that its reachable subspace is entirely barren. Its expected discovery time diverges due to structural absence, while the differentiated agent's remains finite.
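A quick numeric check of the Poisson-style argument above; the solution count and overhead below are arbitrary example values, not estimates for any real organism.

```python
import math

def barren_probability(n_solutions: float, c_bits: int) -> float:
    """Probability that the replication-compatible subspace contains none of the
    S optimal solutions, under the uniform (Poisson-like) model above."""
    expected_in_subspace = n_solutions / 2.0 ** c_bits
    return math.exp(-expected_in_subspace)

# Even a million optimal solutions vanish against c = 40 bits of overhead:
print(barren_probability(1e6, 40))   # ~0.999999: the subspace is almost surely barren
```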

4. Architectural Comparison: The Discovery Time Theorem

In the last two sections I established that self-replication imposes a tax on productivity and the Forbidden Zone excludes agents from a vast region of strategy space. I now use these findings to operationalize and compare two fundamental architectures of life: those that are Integrated (every agent carries its constructor, like bacteria) and Differentiated (a germline retains replication and constructs mortal somatic units, like multicellular organisms).

4.1 The Rate Advantage (Resource Efficiency)

One straightforward consequence of replication overhead is a throughput penalty. In a finite-resource environment, every bit allocated to the constructor is a bit not available for the search payload.

Definition 4.1 (Resource-Constrained Search). This is a persistent query system consisting of agents searching a fitness regime $F$ under a total resource budget of $B$ bits per generation. Integrated Agents have description length $\ell_{\text{rep}} + \ell_{\text{payload}}$. Differentiated Agents (somatic units) have description length $\ell_{\text{payload}} + c$ (they have no replication machinery), where $c$ is the per-unit coordination overhead.

Theorem 4.2 (Linear Rate Advantage). The asymptotic ratio of throughput between optimally differentiated ($R_{\text{diff}}$) and optimally integrated ($R_{\text{int}}$) architectures is:

$$\frac{R_{\text{diff}}}{R_{\text{int}}} = \frac{\ell_{\text{rep}} + \ell_{\text{payload}}}{\ell_{\text{payload}} + c}$$

Proof. For the Integrated system, each agent costs $\ell_{\text{rep}} + \ell_{\text{payload}}$ bits, so the maximum population, and hence the throughput, is $R_{\text{int}} = B / (\ell_{\text{rep}} + \ell_{\text{payload}})$. For the Differentiated system, the germline costs $\ell_{\text{rep}}$ (paid once). The remaining budget $B - \ell_{\text{rep}}$ is spent on somatic units costing $\ell_{\text{payload}} + c$ each, so $R_{\text{diff}} = (B - \ell_{\text{rep}}) / (\ell_{\text{payload}} + c)$. As $B \to \infty$, the one-time germline cost becomes negligible. Dividing the limits yields the stated ratio. ∎

If we assume the somatic units perform the full search task and coordination overhead is small ($c \ll \ell_{\text{payload}}$), this simplifies to approximately $1 + \ell_{\text{rep}} / \ell_{\text{payload}}$.

This result demonstrates that the architectural trade-off is a matter of resource efficiency. In the ideal case, where coordination costs are negligible ($c \to 0$), the advantage reduces to a factor of approximately $1 + \ell_{\text{rep}} / \ell_{\text{payload}}$. It has long been posited in evolutionary theory that fitness tradeoffs between reproduction and viability drive specialization,[23][24] but Theorem 4.2 provides a precise algebraic basis for this notion. However, a constant-factor speedup is computationally insufficient to explain the universality of the Weismann barrier in complex life. A transition of this magnitude requires a stronger force than simple optimization: it demands complete algorithmic necessity.
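
To make Theorem 4.2 less abstract, here is a quick numerical illustration. The formulas mirror the proof sketch above; the specific values of $B$, $\ell_{\text{rep}}$, $\ell_{\text{payload}}$, and $c$ are made up for the example and are not biological estimates.

```python
def throughput_integrated(budget: float, ell_rep: float, ell_payload: float) -> float:
    # Every agent carries its own constructor, so each searcher costs ell_rep + ell_payload bits.
    return budget / (ell_rep + ell_payload)

def throughput_differentiated(budget: float, ell_rep: float, ell_payload: float, c: float) -> float:
    # One germline pays ell_rep once; the rest of the budget buys disposable
    # somatic searchers costing ell_payload + c bits each.
    return max(budget - ell_rep, 0.0) / (ell_payload + c)

# Purely illustrative parameters (the post gives no numbers):
B, ell_rep, ell_payload, c = 1_000_000, 4_000, 1_000, 50

r_int = throughput_integrated(B, ell_rep, ell_payload)
r_diff = throughput_differentiated(B, ell_rep, ell_payload, c)
print(f"integrated throughput     : {r_int:8.1f} queries/generation")   # ~200.0
print(f"differentiated throughput : {r_diff:8.1f} queries/generation")  # ~948.6
print(f"observed ratio            : {r_diff / r_int:.2f}")              # ~4.74
print(f"asymptotic prediction     : {(ell_rep + ell_payload) / (ell_payload + c):.2f}")  # ~4.76
```

Even with generous overhead numbers the advantage stays a modest constant factor, which is exactly why the rate argument alone cannot explain the universality of the barrier.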

There is a critical nuance I should mention regarding somatic division: although somatic cells (like those of the skin or liver) divide mitotically to fill the body, this represents an amplification step within a single generation rather than a persistence step across generations. Because the lineage terminates with the organism's death, somatic lineages do not need to maintain indefinite information integrity and can tolerate mutation accumulation and telomere erosion. Consequently, somatic replication avoids the high-fidelity premium of the germline, which is why a somatic unit is structurally far cheaper than a full, self-replicating agent.

4.2 The Combined Discovery Time

Now having quantified the linear penalty of carrying the replication machinery, I examine the computational cost of preserving it.

Theorem 4.3 (Discovery Time by Regime). Let $F$ be a search problem with optimal solution $s^*$. The ratio of expected discovery times between Integrated ($T_{\text{int}}$) and Differentiated ($T_{\text{diff}}$) architectures depends strictly on where $s^*$ lies in the strategy space:

  • (a) The Shallow Zone (Optimization): If $s^*$ is non-destructive and compact ($K(s^*) \le \ell_{\text{payload}}$), both architectures can implement the solution. The differentiated agent wins only by its throughput advantage.

    Here, differentiation is merely an optimization (a constant factor speedup). This applies to simple adaptive problems like metabolic optimization or chemotaxis. Consequently, unicellular life (integrated architecture) dominates these niches due to its simplicity.
  • (b) The Forbidden Zone (Necessity): If $s^*$ is destructive or descriptively dense ($K(s^*) > \ell_{\text{payload}}$), the integrated agent is structurally incapable of implementing $s^*$.

    In this case, differentiation is computationally necessary. This applies to uniquely multicellular problems like V(D)J recombination. Their existence in complex organisms confirms that the Weismann barrier is a mathematical response to the computational necessity of destructive search.
  • (c) Probabilistic Exclusion Zone: If $s^*$ is technically reachable ($K(s^*) \le \ell_{\text{payload}}$) and non-destructive, but optimal solutions are rare ($N^* \ll 2^{\ell_{\text{rep}}}$), shrinking the search space by a factor of $2^{-\ell_{\text{rep}}}$ drops the expected number of solutions in the restricted subspace to $\lambda \ll 1$, giving probability $e^{-\lambda} \approx 1$ that the subspace is entirely barren. A toy sketch of this three-way classification follows below.
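
Here is the toy classifier promised above: a few lines of Python that route a problem into one of Theorem 4.3's three regimes. The field names, thresholds, and example numbers are my own (they track the notation reconstructed in this post, not a reference implementation), so treat it as a mnemonic rather than a model.

```python
from dataclasses import dataclass
from enum import Enum

class Zone(Enum):
    SHALLOW = "shallow: both architectures work; differentiation is a constant-factor speedup"
    FORBIDDEN = "forbidden: the integrated agent is structurally excluded; differentiation is necessary"
    PROBABILISTIC = "probabilistic: the integrated agent's reachable subspace is almost surely barren"

@dataclass
class Problem:
    destructive: bool      # does the optimal strategy corrupt the replicator's own description?
    description_bits: int  # (an upper bound on) the description length of the optimal strategy
    n_optima: int          # how many optimal solutions exist in the full strategy space

def classify(p: Problem, ell_payload: int, ell_rep: int) -> Zone:
    # Mirrors the three cases of Theorem 4.3 as reconstructed above.
    if p.destructive or p.description_bits > ell_payload:
        return Zone.FORBIDDEN
    if p.n_optima < 2 ** ell_rep:  # optima too rare for the 2**-ell_rep reachable fraction
        return Zone.PROBABILISTIC
    return Zone.SHALLOW

# Illustrative examples only:
examples = {
    "chemotaxis": Problem(destructive=False, description_bits=200, n_optima=10**9),
    "V(D)J-like": Problem(destructive=True, description_bits=5000, n_optima=10**3),
    "rare trait": Problem(destructive=False, description_bits=800, n_optima=10**2),
}
for name, prob in examples.items():
    print(f"{name:11s} -> {classify(prob, ell_payload=1000, ell_rep=20).value}")
```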

4.3 The Biological Regime: A Tale of Two Subsystems

The mathematical framework of discovery time is parametric in the search problem $F$ and makes no reference to molecular biology. It applies to any computational substrate where a persistent constructor must maintain its own description while executing a search. This recapitulates at the algorithmic level what Dawkins's Extended Phenotype[25] describes biologically.

Different subsystems within a single organism inhabit distinct computational regimes. The germline operates primarily in the Polynomial Regime: DNA replication is a mechanical construction task that scales polynomially. In this regime, the computational tax is negligible. The soma operates in the Exponential Regime: complex adaptation, immune search, and neural computation involve combinatorial search over high-dimensional spaces. The Weismann barrier[1] maps exactly onto this computational boundary: it sequesters the germline in the safe polynomial regime while freeing the soma to operate destructively in the risky exponential regime.

The Functional Density Constraint: The "C-value paradox" demonstrates that raw genome size is a poor proxy for search dimension. The pressure toward differentiation is absolute only at high functional density: informationally dense genomes facing high-dimensional search problems.

5. The Architectural Dominance Conjecture

I have established two distinct advantages for the differentiated architecture: a linear Rate Advantage (efficiency) and an infinite Reach Advantage (feasibility). I now synthesize these findings into a unified conjecture that predicts the transition between unicellular and multicellular life. The core insight is that these advantages are not fixed; they scale differently with problem complexity.

Conjecture 5.1 (Architectural Dominance).

Consider a persistent replicator facing a search problem $F$ over a strategy space $\mathcal{S}$. The dominance of the differentiated architecture over the integrated architecture progresses in stages based on problem complexity:

  • (a) Rate Dominance (Proven): For simple problems, the differentiated architecture achieves a strictly higher query throughput, by a factor of approximately $1 + \ell_{\text{rep}} / \ell_{\text{payload}}$ (Theorem 4.2). In simple environments (e.g., bacterial competition for glucose), differentiation offers only this constant-factor speedup; when $\ell_{\text{rep}} \ll \ell_{\text{payload}}$, the advantage is negligible, allowing integrated agents to remain competitive or even locally dominant due to their simpler implementation.
  • (b) Reach Dominance (Proven): If $F$ contains solutions requiring destructive modification, the integrated architecture hits a hard algorithmic barrier ($T_{\text{int}} = \infty$), while the differentiated architecture can solve it. This is the "Hard" Forbidden Zone. Certain biological functions are physically impossible for a cell that must remain totipotent.
  • (c) Probabilistic Dominance: For search problems where optimal solutions are rare ($N^* \ll 2^{\ell_{\text{rep}}}$), the integrated architecture faces a probability approaching 1 that its reachable subspace contains exactly zero solutions.
  • (d) Threshold Existence: There exists a critical boundary at the exact transition from polynomial to exponential computational demands where the advantage shifts from linear efficiency to complete mathematical necessity. The Weismann barrier is the physical, architectural response to crossing this mathematical boundary.

In summary, the Weismann barrier is the architectural response to crossing this boundary. It is not just a biological optimization, but rather a computational phase transition required to access the high-complexity regime of the fitness landscape.

5.1 Limitations

There are numerous open questions that this framework does not address, but that would be highly useful to answer with experimental data or additional theoretical work. I am very grateful to Tatyana Dobreva for suggesting a number of interesting questions along these lines, including:

  • How does the immortal jellyfish (T. dohrnii) prove or disprove the ideas presented? Do epigenetic marks survive transdifferentiation?
  • How does the "memory" that some plants retain of droughts through epigenetic modifications play into the ideas here? I assume that these modifications would not violate the Preservation Constraint, and that it is fine for information to transfer between the soma and germline, but it would be better to have clarity on this type of situation and how exactly it fits (or doesn't).
  • In general, what do we learn by understanding this concept as a computational necessity rather than a biological optimization? I think, though I am really not sure, that this essay suggests the Weismann barrier is the only type of architecture that can accommodate complex organisms, rather than one of many solutions evolution came up with. This would also suggest we can't escape death. Following from that, we should expect any complex thing to die as well (not just biological life). Our bodies are not just gene protectors; they exist because we need to do complex calculations that require destruction.

These are just a few of the open questions and research ideas, along with some preliminary thoughts on how to answer them. They are interesting and complex topics that deserve more work.

6. Conclusions

The unfortunate sperm whale from The Hitchhiker's Guide to the Galaxy joins the universe for a brief explosion of complex cognition ending in another sudden, and more unfortunate, explosion. In a way this is exactly what I have shown in this paper: according to the mathematics of self-replication, this is the precise and necessary shape of any higher intelligence.

I have shown that the price of existence is a computational tax. In formalizing the preservation constraint, which is the absolute necessity that a replicator must perfectly protect its own description while acting, I found that self-replication is not merely a metabolic burden. Instead it is a structural prison. The Forbidden Zone is a mathematical fence defined by the limits of computation rather than a biological accident.

I think this result suggests an inversion of how we view multicellularity. If this paper is correct, then the Weismann barrier is not an evolutionary adaptation that evolved to prevent mutational load; rather, it is a necessary computational escape valve. The reason that life split into an immortal germline and a mortal soma is that this was the only physical way to solve the universe's hardest problems. To solve these problems it is necessary to build an architecture that is not burdened by the requirement of surviving them.

It is important to note that this logic is substrate-independent. It strictly bounds any complex, evolving system, whether biological, digital, or synthetic. It also predicts that any entity facing the exponential regime of problem-solving must eventually separate into a protected, persistent germline (or germline-like core) and disposable soma-like structures.

An interesting implication of this is that AI should hit the same preservation tax. (Note: I am not implying this necessarily has any relevance to safety arguments.) For an AGI to maximize its own intelligence without risking the corruption of its primary weights, or of its fundamental alignment (whether the one encoded in it or the one the AI has chosen for itself), the AGI must adopt this type of differentiated architecture. It will be forced to move its core algorithms into a frozen, immutable germline, while creating “mortal”, and highly complex, sub-agents to explore the deepest mysteries of the Forbidden Zone. An amusing conclusion is that if AGI doesn’t kill us, we might identify AGI when it starts killing parts of itself!

In one sense immortality is computationally trivial. Bacteria have pulled it off for billions of years. But anything complex that wants to do interesting and hard things in this universe must be able to address state spaces of such exceptional combinatorial complexity that the self must be sacrificed to explore them. 

From this perspective, death is not an error in the system. In fact, it is the computational technology that lets intelligence exist. It’s a tough pill to swallow, but we are smart only because we have agreed to die.

  1. ^

    Weismann, A. (1893). The Germ-Plasm. Scribner's.

  2. ^

    Tonegawa, S. (1983). Somatic Generation of Antibody Diversity. Nature, 302, 575–581.

  3. ^

    Schatz, D. G. & Swanson, P. C. (2011). V(D)J Recombination: Mechanisms of Initiation. Annu. Rev. Genet., 45, 167–202.

  4. ^

    Chaitin, G. J. (1975). A Theory of Program Size Formally Identical to Information Theory. JACM, 22(3), 329–340.

  5. ^

    Rado, T. (1962). On Non-Computable Functions. Bell System Technical Journal, 41(3), 877–884.

  6. ^

    Von Neumann, J. (1966). Theory of Self-Reproducing Automata. (A. W. Burks, Ed.). Univ. Illinois Press.

  7. ^

     Kleene, S. C. (1952). Introduction to Metamathematics. North-Holland. (Thm. XXVI, §66).

  8. ^

    Rogers, H. (1967). Theory of Recursive Functions and Effective Computability. McGraw-Hill.

  9. ^

    Penrose, L. S. (1959). Self-Reproducing Machines. Scientific American, 200(6), 105–114.

  10. ^

    Langton, C. G. (1984). Self-Reproduction in Cellular Automata. Physica D, 10(1–2), 135–144.

  11. ^

    Kabamba, P. T., Owens, P. D. & Ulsoy, A. G. (2011). Von Neumann Threshold of Self-Reproducing Systems. Robotica, 29(1), 123–135.

  12. ^

    Prusiner, S. B. (1998). Prions. PNAS, 95(23), 13363–13383.

  13. ^

    Li, M. & Vitányi, P. (2008). An Introduction to Kolmogorov Complexity and Its Applications (3rd ed.). Springer.

  14. ^

    Bennett, C. H. (1988). Logical Depth and Physical Complexity. In The Universal Turing Machine (pp. 227–257). Oxford.

  15. ^

    Arıkan, E. (2009). Channel Polarization. IEEE Trans. Inf. Theory, 55(7), 3051–3073.

  16. ^

    Eigen, M. (1971). Selforganization of Matter. Naturwissenschaften, 58(10), 465–523.

  17. ^

    Eigen, M. & Schuster, P. (1977). The Hypercycle. Naturwissenschaften, 64(11), 541–565.

  18. ^

    Bertschinger, N., Olbrich, E., Ay, N. & Jost, J. (2006). Information and Closure in Systems Theory. In Explorations in the Complexity of Possible Life (pp. 9–19). IOS Press.

  19. ^

    Krakauer, D. et al. (2020). The Information Theory of Individuality. Theory in Biosciences, 139, 209–223.

  20. ^

    Grünwald, P. & Vitányi, P. (2004). Shannon Information and Kolmogorov Complexity. arXiv:cs/0410002; see also Grünwald, P. & Vitányi, P. (2008). Algorithmic Information Theory. In Handbook of the Philosophy of Information (pp. 281–320). Elsevier.

  21. ^

    Ofria, C. & Wilke, C. O. (2004). Avida: A Software Platform for Research in Computational Evolutionary Biology. Artif. Life, 10(2), 191–229.

  22. ^

    Goldsby, H. J., Dornhaus, A., Kerr, B. & Ofria, C. (2012). Task-switching costs promote the evolution of division of labor and shifts in individuality. PNAS, 109(34), 13686–13691.

  23. ^

    Buss, L. W. (1987). The Evolution of Individuality. Princeton.

  24. ^

    Michod, R. E. (2007). Evolution of Individuality During the Transition from Unicellular to Multicellular Life. PNAS, 104(suppl. 1), 8613–8618.

  25. ^

    Dawkins, R. (1982). The Extended Phenotype. Oxford.




Hazardous States and Accidents

2026-02-13 21:02:19

Published on February 13, 2026 1:02 PM GMT

Root cause analysis is a crap technique for learning from failure. To see why, we need to know some fundamentals first. These are good to know for anyone designing anything they want to be reliable.

A hazard is an accident waiting to happen

In safety-critical systems, we distinguish between accidents (actual loss, e.g. lives, equipment, etc.) and hazardous states (sometimes called only “hazards”). If we say that $H$ stands for hazardous state, $E$ for environmental conditions, and $A$ for accident, then the equation is

$$A = H \wedge E$$
This says that an accident requires both unfavourable environmental conditions, and that the system is in a hazardous state. As a consequence,

  • If a system sits in a hazardous state, it can be driven into an accident by bad environmental conditions.
  • But conversely, the system can sit in a hazardous state for a long time without accident if the environmental conditions are good enough.

Since we can only control the system and not its environment, we achieve safety by avoiding hazardous states.[1]

Example from aviation

There was recently a commercial flight that made the news because it landed with less than 30 minutes of fuel in its tanks. Many people wondered why this was a big deal, because it sounds like the system was working as intended: there was a reserve, it was needed, and it was used. End of story?

The thing to realise is that landing with less than 30 minutes of fuel is a hazardous state for commercial jets. If a jet lands with less than 30 minutes of fuel, then it would only have taken bad environmental conditions to make it crash, rather than land. Thus we design commercial aviation so that jets always have 30 minutes of fuel remaining when landing. If they don’t, that’s a big deal. They’ve entered a hazardous state, and we never want to see that.

Example from child's play

One of my children loves playing around cliffs and rocks. Initially he was very keen on promising me that he wouldn’t fall down. I explained the difference between accidents and hazardous states to him, and he realised slowly that he cannot control whether or not he has an accident, so it’s a bad idea to promise me that he won’t have an accident.

What he can control is whether or not bad environmental conditions lead to an accident, and he does that by keeping out of hazardous states. In this case, the hazardous state would be standing within a child-height of a ledge when there is nobody below ready to catch him. He can promise me to avoid that, and that satisfies me a lot more than a promise to not fall.

Maintaining constraints is a dynamic control problem

Hazardous conditions, as we have seen, are defined by constraints. To stay out of hazardous conditions, we have the system maintain such safety constraints. In general, though, the environment often tries to tip the system into breaking these constraints, and it often does this in unpredictable ways. This means we cannot declare in advance a sequence of steps the system should follow that will always maintain constraints.

Instead, maintaining constraints is a dynamic control problem. There are multiple controllers interacting with the system to try to keep it out of hazardous conditions. They observe feedback, i.e. information on where the system is now; they execute mental models, i.e. run simulations of where the system is going in the future; and then they issue control actions, i.e. try to adjust the system to maintain constraints based on their predictions.

Whenever a system enters a hazardous condition, it is because there were problems with the control structure, specifically with one of the three components listed above:

  • Feedback to controllers can be insufficient, which means the controllers do not understand what is going on with the system at some specific moment.
  • Mental models can be insufficient, which means the controllers understand what’s going on with the system, but they are unable to predict something that will happen in the future.
  • Control actions can be insufficient, which means the controllers know what they need to do to the system to maintain constraints, but it does not have an effect of the desired strength.[2]

We can also see combinations of these problems. When all three of them are problematic, we might actually be looking at an entire controller missing that should be present.
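
To make these three components (and their failure modes) concrete, here is a tiny Python sketch of a single control step for the fuel example. It is my own toy, not taken from any real avionics or airline procedure: the gauge noise, burn-rate estimate, and thresholds are invented, and the point is only that a slightly wrong mental model can let the system drift into a hazardous state even though every component seemed to work.

```python
import random
from dataclasses import dataclass

HAZARD_THRESHOLD_MIN = 30.0  # landing with < 30 minutes of fuel = hazardous state

@dataclass
class FuelState:
    minutes_remaining: float  # ground truth, which no controller observes directly

def feedback(true_state: FuelState, gauge_sigma: float) -> float:
    # Feedback: what the controller observes (can be insufficient, i.e. noisy or stale).
    return true_state.minutes_remaining + random.gauss(0.0, gauge_sigma)

def mental_model(observed_minutes: float, burn_rate_est: float, minutes_to_landing: float) -> float:
    # Mental model: predicted fuel at landing (insufficient if the burn-rate estimate is wrong).
    return observed_minutes - burn_rate_est * minutes_to_landing

def control_action(predicted_at_landing: float) -> str:
    # Control action: divert if the safety constraint is predicted to be violated.
    return "DIVERT" if predicted_at_landing < HAZARD_THRESHOLD_MIN else "CONTINUE"

# One illustrative control step: slightly noisy gauge, optimistic burn-rate estimate.
true_state = FuelState(minutes_remaining=70.0)
observed = feedback(true_state, gauge_sigma=1.0)
predicted = mental_model(observed, burn_rate_est=0.8, minutes_to_landing=45.0)
print(control_action(predicted))  # almost always CONTINUE, since ~34 minutes are predicted
# At the true burn rate of 1.0 minute of fuel per minute flown, the aircraft actually
# lands with ~25 minutes: the system enters the hazardous state even though the
# controller followed its model faithfully.
```

A real control structure would layer margins, cross-checks, and higher-level controllers (dispatch, air traffic control, regulation) on top of this loop; the sketch only shows where feedback, mental models, and control actions sit relative to the hazardous state.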

Controllers exist on all levels. For aircraft maintaining fuel constraints, controllers include the FADEC inside the jet engines, the flight management computer, pilots, ground crew, dispatchers at the airline, training programmes for pilots, air traffic controllers, as well as national and international regulatory boards.[3]

Low-level controllers are often automated, in hardware or software. High-level controllers are often social, cultural, and legal in nature.

Predicting hazardous states is easier than accidents

Accidents in safety-critical systems can look like one-off freak occurrences that would be impossible to predict.[4] This is because in order for an accident to occur, not only do we need bad environmental conditions, but also multiple controllers must have been unable to maintain safety constraints. The combination seems unlikely. However, by thinking in terms of hazardous states instead of accidents, we get the benefit that hazardous states are easier to predict.

Think of any common technology, like the car. We can probably rattle off several constraints we’d like it to maintain, some fairly mundane. Our car must not start an uncommanded turn, for example. One of the controllers maintaining this constraint is positive stability in the turning axis: if we let go of the steering wheel on flat ground, it will return to the centre position over time. This ensures small bumps only put us slightly off course, at which point another controller kicks in: the driver makes a small adjustment to change the course back to what it was.[5]

We don’t have to actually witness a car crash caused by an uncommanded turn to realise it would be a bad thing if a car started an uncommanded turn. Now we can continue to work on our controllers – why does the turning axis have positive stability? Can that fail? Sure it can, if tyre pressures are unequal. That’s another constraint we can design control structures around, and so on.

Analysing hazards as accidents

A further benefit of thinking about hazardous states rather than accidents is that we don’t have to wait for an accident to occur before we improve the safety of our system. Being unable to maintain constraints is already a safety problem and should be analysed whether or not environmental conditions were on our side that day, i.e. whether it turned into an accident or not.

This might seem obvious. If we had designed a car that started a sudden uncommanded turn, we wouldn’t wait for it to injure someone before we addressed the problem. But I often see people – especially in the software industry – paper over near misses as long as nobody got hurt. The aviation industry is not like that. You bet safety boards will issue reports on the flight landing with less than 30 minutes of fuel.

More on safety and systems theory

The ideas covered in this article mainly come from a systems-theory perspective on safety. One of the central figures in promoting that perspective is Nancy Leveson. I’m a huge fan of her work, including the books Engineering a Safer World, the CAST Handbook, and the STPA Handbook. The issue with these is that they’re (a) not well known, and (b) quite dense and filled with decades of Leveson’s experience.

The linked article then goes on to list some more related topics I eventually want to cover in my writing, but this is probably a good place to stop for an LW linkpost.

 

  1. ^

    If we try to prevent accidents while not paying attention to hazardous states, we are effectively placing our trust in the environment being on our side. Many people do this, and it can be successful for quite some time, but it always fails at some point.

  2. ^

    This could be because the effect is too weak – or too strong!

  3. ^

    For my child among rocks, controllers include their balance, their strength, their extremely limited sense of self-preservation, my instruction, my supervision, the places I decide to take us, etc.

  4. ^

    What are the chances that a flight encounters delay enroute, then has to make multiple landing attempts at the intended destination including delays there, diverts, is unable to land at the alternate, and has quite far to go to a tertiary airport?

  5. ^

    In some cars, another automated layer takes over before the driver: software lane keeping assistance can perform that correction.




Systemic Risks and Where to Find Them

2026-02-13 18:51:10

Published on February 13, 2026 10:51 AM GMT

Or: Todd Has a Presentation in London on Thursday and Three Academics (Some of Them Dead) Won't Stop Arguing About Root Fungi

(The story follows the one in Seeing Like A State but applies a systemic perspective on AI Safety)

Epistemic Status: Written with my Simulator Worlds framing. E.g. I ran this simulated scenario with Claude in order to generate good cognitive basins, then orchestrated it to play out a simulated scene following my instructions (with some changes for better comedic effect). This post is Internally Verified (e.g. I think most of the claims are correct with 70-85% certainty).


The headset smells like someone else's face.

"Just put it on, Todd."

"Sandra, it truly—"

"I know. Put it on. You're presenting to the Science and Technology Select Committee (UK) on Thursday about systemic risks from frontier AI and you currently think systemic risk means 'a risk that is big.'"

"That is absolutely not—"

"You said that. In the pre-brief. I wrote it down. I'm going to have it framed."

Sandra has worked at the Department for Science, Innovation and Technology for twenty-three years. She once corrected a visiting researcher from the Santa Fe Institute on his own citation and he sent her flowers. She has opinions about management cybernetics that she shares with nobody because nobody asks. She is paid less than the office coffee budget.

Todd was a postman in Swindon until eighteen months ago. His mate Dave got him the job.

"I've got forty-seven documents to fill in for the committee. Forty-seven. They've got boxes. I understand boxes. I'm good at boxes."

"The boxes are wrong."

"The boxes are government-mandated"

"Still wrong. Headset. Now."

Introduction

He's in a forest.

It takes a moment. The conference room doesn't so much disappear as get gently shouldered aside by something much older. And then Todd is standing on soft ground, in cold air, surrounded by trees.

Except — and it takes him another moment to understand why it feels wrong — the trees are in rows. Perfect rows. Identical trees, identical spacing, stretching in every direction until the geometry gets bored and fades into mist. Norway spruce. He knows this because a small label is floating beside the nearest trunk like a museum placard: Picea abies. Planted 1820. Yield-optimised monoculture.

The ground is bare. Not the interesting kind of bare, with moss and leaf litter and the promise of hidden things — just dark, flat, dead soil. No undergrowth. No ferns. No birds. Nothing moving. The air tastes of resin and something chemical he can't place.

A yield-optimised spruce monoculture in Germany. Every tree individually excellent. The forest is dying.

"Hello?" says Todd.

Nothing.

He walks between the rows. His footsteps sound wrong — too clean, too isolated, as if the forest has nothing to absorb them. He touches a trunk. The bark feels thin. Papery. Like something that's been alive for a long time but has recently started to forget how.

"This is horrible," he says. "Why is this horrible? It's a forest. Forests are nice."

Sandra's voice in his earpiece: "It's not a forest. That's the point. Keep walking."

He walks. The rows repeat. The silence repeats. It's like being inside a spreadsheet that grew bark.

"Sandra, why am I here? I have documents. I have work to do, how the hell is this related to a bloody forest in the middle of nowhere?”

Todd starts muttering the mantra he has developed over the last few weeks:

“AI capability leads to risk factor, risk factor leads to potential harm, you evaluate the capability, assess the risk, mitigate the harm. A, B, C. It's clean. It makes sense. It fits in the boxes."

“Todd, you’re doing it again!”

“Sorrrryyyy…”

"Now, the obvious follow up question is whether your framework describes a forest?"

“Why would I need to answer that?”

“Todd, does it describe a forest?”

"It doesn't need to describe a forest, it needs to describe—"

"Does your A-B-C framework describe how this forest dies?"

Todd stops walking. He looks at the trees. At the bare soil. At the thin bark that's starting, now that he's paying attention, to peel at the edges. At the silence where birdsong should be.

"How does a forest die?"

"That's the right question. And that's why you're here."

Root Networks

Three people are standing in a clearing he could swear wasn't there thirty seconds ago.

Two of them are already arguing. The third is watching with the patient expression of a man who has seen this argument happen before and knows exactly when to intervene.

The one in tweed sees Todd first. "Ah! You're the governance chap. James Scott. Political science. Yale. Dead, technically, but they made me from my books. Try not to think about it."

"I will absolutely think about it."

"This is Michael—"

"Michael Levin, developmental biology, Tufts, not dead, I run the company that built this VR thing, Levin Enterprises, sorry about the headset smell—"

"And I'm Terrence Deacon, anthropology, Berkeley, unclear if dead, the simulation team had conflicting information and frankly I find the ambiguity productive—"

"Right," says Todd. "Great. I'm Todd. I work in AI governance. I was a postman. I have a presentation to the Science and Technology Select Committee on Thursday. I need to know what a systemic risk actually is, and I need to know it in words that don't require a PhD to understand, and I need to know it by Wednesday at the latest because I have to practice the slides on the train."

Scott gestures at the trees. "This is a systemic risk."

Todd looks around. "This? A forest?"

"This specific forest. What you're standing in is the result of a decision made by the Prussian government in 1765. They looked at Germany's forests — old growth, hundreds of species, tangled, messy, full of things doing things they couldn't name or measure — and they saw waste. They wanted timber. So they cleared the old forests and planted these. Single species. Optimal spacing. Every tree selected for maximum yield."

Todd waits. "And?"

"And it worked. For one generation, these were the most productive forests in Europe. The Prussians had cracked it. Scientific forestry. Rational management. Every tree individually perfect."

"So what went wrong?"

This is where it happens. Levin can't contain himself any longer. He's been rocking on his heels and he breaks in like a man whose entire career has been building toward this specific interruption.

"What went wrong is that they thought the forest was the trees. But the forest isn't the trees. The forest is the network. The mycorrhizal—"

"The what?"

Sandra, in Todd's ear: "Fungal internet. Roots connected underground by fungi. Trees share nutrients and chemical warning signals through it. Like a nervous system made of mushrooms."

"—the mycorrhizal networks connecting every root system to every other. The pest predators living in the undergrowth. The soil bacteria maintaining nutrient cycles. The entire living architecture that the Prussians classified as 'mess' and removed. Because their framework — their evaluation framework, Todd — measured individual trees. Height, girth, growth rate, timber yield. And every individual tree was excellent."

"But the system—"

"The system was dying. Because the things that made it a system — the connections, the information flows, the mutual support — weren't in any individual tree. They were in the between. And the between is exactly what the evaluation framework couldn't see."

As Levin speaks, the VR does something Todd isn't expecting. The plantation dissolves backward — rewinding — and for a moment he sees what was there before. The old-growth forest, not a grid but a tangle. Trees at odd angles, different species, different ages, connected below the surface by a dense web of orange lines — the mycorrhizal network rendered visible, a living architecture of staggering complexity where every tree is linked to every other through branching fungal pathways.

Then the VR plays it forward. The old growth is cleared. The network is severed. The grid is planted. And the orange connections simply stop.

Left: the old-growth forest. The orange web is the mycorrhizal network — the connections that made it a living system. Right: the yield-optimised plantation. Same trees. No network.

Todd stares at the two images hanging in the air. The left one dense with orange connections. The right one bare.

"The dashboard says everything's fine," he says, looking at the grid.

"The dashboard measures trees," says Sandra.

Deacon, who has been standing very still — which Todd is learning means he's about to make everything more complicated — steps forward.

"The reason this matters — and this is crucial, Jim, because you always tell this story as 'they removed biodiversity' and that's true but it's not deep enough—"

"Oh here we go," mutters Levin.

"—is that the forest's living architecture wasn't just useful. It was organisational. The mycorrhizal network was the forest's information processing system. Warning signals about pest attacks propagating through the root network. Resources redistributed from healthy trees to stressed ones. The forest was performing a kind of distributed computation, and it was organised around constraints that existed in the relationships between species, not in any individual species."

"What kind of constraints?" says Todd, because he is paid to ask questions even when he suspects the answers will make his headache worse.

"The kind that don't physically exist anywhere but shape the dynamics of everything. The forest had a collective goal — maintaining its own viability — that wasn't located in any tree, wasn't programmed into any root, wasn't specified by any forester. It emerged from the network. It was, if you'll permit me the term—"

"Don't say it," says Levin.

"—teleological."

"He said it."

"TELEOLOGICAL behaviour! Goal-directed! The forest-as-a-whole was navigating toward stable states that no individual tree was aiming for, and the navigation was happening through the very networks that the Prussians couldn't see and therefore destroyed. This is not a metaphor for what's about to happen with AI governance. It is a structural description of the same failure mode."

Sandra: "Todd. Translation: the forest wasn't just a collection of trees. It was a living system with its own collective behaviour that emerged from the connections between trees. The Prussians' framework measured trees. The system failed at the level of connections. Their dashboard said everything was fine right up until the forest died. That's a systemic risk. Not A causes B causes C. The topology fails."

"And my risk assessment framework—"

"Measures trees."

Brasilia

The forest dissolves. Todd's stomach makes a formal complaint. When the world reassembles, he's floating above a city that looks like someone solved an equation and poured concrete on the answer.

Brasília. He recognises it from — actually, he doesn't know where he recognises it from. Maybe Sandra sent him something. She does that.

The monumental axis stretches to the horizon. Everything is separated into zones. Residential. Commercial. Government. Traffic flow calculated. Sight lines optimised. From above, it's either an airplane or a cross, depending on how much architecture school you've survived.

It's beautiful. It's also, somehow, the same kind of horrible as the forest. The same too-clean silence. The same absence of mess.

"Where is everyone?" says Todd.

"In the bits nobody designed," says Scott.

The VR pulls Todd down toward street level, and the city splits in two. On the left, the planned core holds still — wide boulevards cutting a perfect grid, identical blocks separated by calculated distances, streets so straight they look ruled onto the earth. On the right, a different city altogether. Streets that curve because someone needed to get to the bakery. Roads that fork and rejoin for no reason except that two neighbours built walls at slightly different angles. Buildings pressed against each other like passengers on the Tube. Markets spilling out of doorways. Laundry on balconies.

The grid is silent. The sprawl is alive.

Left: the city someone designed. Right: the city people built. Two and a half million people live in Brasília's satellite cities — the parts nobody planned. The parts that work.

"Oscar Niemeyer and Lúcio Costa," says Scott. "Designed a whole capital city from scratch in 1956 where they separated every function and optimised every flow. It was supposed to be the most rational city ever conceived with two hundred thousand people in the planned core."

"And the other bit?"

"Two and a half million. In the settlements nobody drew. With the corner shops and the street life and the walkable neighbourhoods and the community structures — all the things that make a city a city, and that the design optimised away because they weren't in the model."

"Because they're the between again," says Levin. "The city that works is the one that grew in the connections between the designed elements. It's developmental, Jim, I keep saying this — Costa thought he could specify the mature form of a city from initial conditions, but a city is a developmental system, it discovers its own organisation through—"

"Michael, not everything is embryology—"

"This IS embryology! A developing embryo doesn't work from a blueprint! The cells navigate toward the target form through local interactions! The collective discovers its own organisation! You can't specify a city from above any more than you can specify an organism from a genome—"

"The genome analogy breaks down because a city has politics, Michael, there are power dynamics—"

"Power dynamics ARE developmental! Morphogenetic fields are—"

"STOP," says Deacon, and even the simulation of James Scott shuts up. "You're both right and you're both being annoying about it. The structural point is this: the designed substrate — the plan, the mechanism, the genome — specifies constraints. What grows within those constraints has its own logic. Its own organisational dynamics. Its own emergent goals. You can design Brasília. You cannot design what Brasília becomes. That gap — between what you design and what grows — is where Todd's systemic risks live."

Todd has been looking at the two panels. The grid and the sprawl. One designed. One discovered.

"So the risk framework," he says, slowly, not because he's understanding but because he's starting to see the shape of what he doesn't understand, "measures the plan. It measures the mechanism. A causes B causes C. But the risk isn't in the mechanism. It's in what grows on the mechanism."

"Now show him the Soviet Union," says Sandra. "Before he loses it."

"I've already lost it."

"You're doing fine. Soviet Union. Go."

Central Planning

The geometry misbehaves. Todd arrives in a planning office that was either designed by M.C. Escher or generated by an AI that was asked to visualise 'bureaucratic hubris.' Staircases go in directions that staircases should not go. Input-output matrices cover blackboards that curve back into themselves. A portrait of Leonid Kantorovich — Nobel laureate, inventor of linear programming — hangs at an angle that suggests even the wall is uncertain about its commitments.

The three academics are already there, already arguing, already standing on different impossible staircases.

"—the Gosplan case is the purest example because they literally tried to specify every input-output relationship in an entire economy—"

"Sixty thousand product categories," says Scott. "Centrally planned. Targets set. Resources allocated. The entire Soviet economy as an optimisation problem."

"And it produced numbers," says Deacon, who is standing on a staircase that appears to be going both up and down simultaneously. "Beautiful numbers. Targets met. Production quotas filled. The official economy was a masterwork of engineering."

"And the actual economy?" says Todd.

"The actual economy," says Scott, and he's suddenly serious, the tweed-and-wine performance dropping for a moment, "ran on blat. Favours. Informal networks. Factory managers lying about their production capacity to create slack in the system. Shadow supply chains. Personal relationships doing the work that the plan couldn't do because the plan couldn't process enough information to actually coordinate an economy."

Levin groans. "Oh no. Are we doing Hayek? Jim, please tell me we're not about to do Hayek."

"We are briefly doing Hayek."

"Every libertarian with a podcast has done Hayek. The comment section is going to—"

"The comment section can cope. Todd, bear with me. This is the single most over-rehearsed argument in the history of economics, and I'm going to do it in ninety seconds, and the reason I'm doing it is that both sides got the punchline wrong."

"I don't know who Hayek is," says Todd, and Levin mouths lucky you behind Scott's back.

"Friedrich Hayek. Austrian economist. 1945. His insight — and I'm saying this with full awareness that it's been turned into a bumper sticker by people who've never read him — is that knowledge in an economy is distributed. The factory manager in Omsk knows things about Omsk that no planner in Moscow can know. The baker knows what her street needs. The engineer knows which machine is about to break. This knowledge isn't just difficult to centralise. It's impossible to centralise. There's too much of it, it's too local, it changes too fast, and half of it is tacit — people know things they can't articulate."

"So a central plan—"

"A central plan takes all those local nodes — thousands, millions of them, each processing local information, each connected to the nodes around them — and replaces the whole network with a single point. One red dot in Moscow that every spoke has to feed into and every instruction has to flow out from."

As Scott speaks, the VR renders the diagram on the blackboard. On the left, a distributed network — blue nodes connected by dense orange edges, information flowing locally between neighbours, no centre, no hierarchy, the whole thing humming with lateral connections. On the right, the same nodes rearranged into a spoke pattern, every connection severed except the line running to a single swollen red node at the centre. The orange peer-to-peer links reduced to ghost traces. Everything funnelled through one point.

Left: how knowledge actually lives in an economy — distributed, local, lateral. Right: what central planning requires — everything routed through one node. The red dot is not evil. It is simply overloaded. This has been pointed out before. You may have heard.

"And what happens," says Todd, "when there's too much information for one node?"

"It does what any cell does under metabolic stress," says Levin immediately. "It simplifies its—"

"Michael, it's an economy, not a cell—"

"It IS a cell! Or it's like a cell! The central planner is a cell trying to process the signalling environment of an entire tissue and it doesn't have the receptor bandwidth, so it defaults to—"

"What he's trying to say," says Scott, physically stepping between Levin and the blackboard, "is that the node makes things up. Not maliciously. It simplifies. It has to. It's one node trying to do the work of millions. So it uses proxies. Quotas. Targets. Tonnes of steel."

"Morphogenetic defaults," mutters Levin.

"If you say morphogenetic one more time I'm—"

"And the actual economy?" says Todd. "The one that needs, like, bread?"

"The one that needs bread in Omsk and ball bearings in Vladivostok routes around the bottleneck. Informally. Through blat. Through personal connections. Through the factory manager who calls his cousin instead of filing a requisition form. Through the orange connections that the plan says don't exist."

"So the shadow economy is—"

"—it's the lateral connections reasserting themselves," says Levin, who has apparently decided that if he can't say morphogenetic he'll find another way in. "This is what happens in regeneration too, when you sever a planarian and the remaining tissue has to re-establish communication pathways—"

"We are not," says Scott, "comparing the Soviet economy to a flatworm."

"I'm comparing the information architecture of—"

"He's actually not wrong," says Deacon, which makes both Scott and Levin turn toward him with matching expressions of suspicion. "The structural point holds. When you cut the lateral connections in any distributed system — biological, economic, social — the system either re-grows them informally or it dies. The Soviets got blat. A flatworm gets a new head. The mechanism is different. The topology is the same."

"Thank you, Terrence, that was very—"

"I'm not on your side, Michael. I'm saying you stumbled into the right structure using the wrong analogy. As usual."

Todd has been staring at the diagram on the blackboard. The dense orange network on the left. The hub-and-spoke on the right. Something is nagging at him.

"Hang on," he says. "The Hayek thing. The market thing. His answer was: replace the planner with price signals. Let the market do the coordination. But that's still just—" He points at the right side of the diagram. "That's still a hub, isn't it? The price signal is the hub. Everything gets routed through buy and sell instead of through plan and allocate, but it's still—"

Scott smiles. The first genuine one Todd has seen. "Keep going."

"It's still a single coordination mechanism. You've just changed the colour of the red dot."

"That," says Scott, "is the part that Hayek got right and his fans get catastrophically wrong. He diagnosed the problem — centralised knowledge processing fails — and then prescribed a different centralised knowledge processor. A more efficient one, sure. Better at some things, worse at others. But still one mechanism trying to do the work of a network."

"So the question isn't planning versus markets—"

"The question is: what happens to the distributed knowledge when you reorganise the network? And nobody in 1945 was asking that question because they were all too busy arguing about ideology instead of topology."

"I want it noted," says Levin, "that I have been saying this about cell signalling for—"

"NOTED, Michael."

Sandra, in Todd's ear: "He's saying the shape of the information network matters more than the ideology running it. File that. It comes back."

"And when someone tried to fix the official system by removing the unofficial one—"

"Gorbachev," says Scott. "Anti-corruption campaigns. Stricter enforcement. More rigorous adherence to the plan. He looked at the blat networks and saw corruption. Waste. Disorder. Mess."

"The same mess the Prussians saw in the old-growth forest," says Deacon.

"The same mess that Costa and Niemeyer zoned out of Brasília," says Levin.

"He cut the planarian in half," says Todd, and immediately looks surprised at himself.

Levin points at him with both hands. "YES. THANK you. He cut the—"

"I cannot believe we're doing the flatworm," says Scott.

"He severed the lateral connections! And unlike a planarian, the Soviet economy couldn't regenerate them fast enough! Because Gorbachev was also tightening enforcement, which is like — Jim, work with me here — it's like cutting the planarian and also suppressing the wound-healing signals—"

"The economy isn't a flatworm, Michael!"

"The TOPOLOGY is the SAME!"

"He's right," says Deacon, and Scott throws his hands up.

"Fine. Fine! He removed the informal networks. And everything collapsed. Because the mess was the distributed system doing the work the central node couldn't. Remove it, and all you're left with is an overloaded red dot trying to coordinate an entire economy through a straw. Is everyone happy now? Can we stop talking about flatworms?"

"Planaria," says Levin.

"I will end you."

Silence. Even the impossible staircases seem to hold still for a moment.

"He killed the mycorrhizal network," says Todd.

Everyone looks at him.

"I mean — the principle. He removed the distributed system because the centralised framework told him it was waste. Same as the Prussians. Same as the city planners. The Prussians killed the network to make rows. The planners killed the sprawl to make a grid. And the Soviets killed the lateral connections to make a hierarchy. Three different shapes, same operation: take a distributed system, force it through a single point, lose everything the single point can't see."

Sandra, in his ear, very quietly: "Yes. That's it."

Todd looks at the three academics. The Escher staircases have settled into something almost normal, as if the geometry is calming down along with the argument. Levin is still quietly triumphant about the planarian. Scott is pretending to be annoyed. Deacon is watching Todd with an expression that suggests he's been waiting for this question.

"Okay," says Todd. "So the networks matter. The distributed bit is load-bearing. Every time we centralise it or formalise it or remove it, things collapse. I get that. But—" He stops. Thinks. "But you can't just leave it alone, can you? The old-growth forest was fine because nobody was trying to coordinate it into producing timber. But we actually need economies to produce things. We actually need cities to function. You can't just say 'don't touch the network' and walk away."

"No," says Scott, and he looks at Todd differently now. "You can't."

"So has anyone actually figured out how to do this? How to work with the distributed thing without killing it?"

The three academics exchange a look. It's the first time they've agreed on something without arguing about it first.

And then Sandra does something she hasn't done all session. She breaks in. Not in Todd's ear — in the room, her voice coming through the VR's spatial audio as if she's suddenly standing among them, and there's something in her voice that Todd has never heard. Not quite anger. Something older than anger.

"There was someone," she says. "Someone who understood formally, mathematically, practically that you cannot govern a distributed system by centralising it, and that the answer is not to leave it alone either. There's a third option. And I have been waiting nine years for someone in this department to ask about it."

"Stafford Beer," says Deacon.

"Stafford Beer."

Project Cybersyn

Todd: "Who—"

"Management cybernetics," says Sandra, and she's speaking faster now, like a dam breaking. "The Viable System Model. The insight is that any viable system has the same recursive structure — autonomous units at every level, each level self-regulating, feedback loops everywhere. You don't control it from above. But you don't abandon it either. You create the conditions for it to regulate itself. Because no external controller can model the system's own complexity — the system is always more complex than any model of it. That's Ashby's Law, 1956, the law of requisite variety, and it is the single most important idea in governance that nobody in governance has ever heard of."

A 3d rendering of a description of Project Cybersyn's operations room. Santiago, 1971. Designed by Stafford Beer for Salvador Allende's government. A room built to govern a living system as a living system. It was burned in a coup two years later.

The screens are alive. And on them, Todd sees the distributed network — not collapsed into a hub-and-spoke, not funnelled through one red dot. The orange connections between nodes are intact, visible, flowing. Factory output data streaming in from the regions, but not to a central planner — to each other. Local patterns feeding into regional patterns feeding into national dynamics, with the information staying distributed, the lateral connections preserved. Beer's control room wasn't a command centre. It was a window onto the network.

"Beer built this," says Sandra. "For Chile. Under Allende. Project Cybersyn. A national economic coordination system based on cybernetic principles. Real-time factory data flowing up. Policy signals flowing down. Workers maintaining autonomy at the local level. The system was designed to preserve the distributed knowledge — the informal dynamics, the local information, the lateral connections — and make them visible without centralising them. He solved the problem that Hayek said was unsolvable and the Soviets proved was unsolvable. And he did it by changing the network topology."

"What happened?" says Todd.

"September 11th, 1973. Pinochet, CIA-backed coup. They burned the operations room."

The control room begins to darken. The screens flicker. The orange distributed network stutters and collapses — node by node, connection by connection — until it rearranges itself into a hub-and-spoke. A different red dot this time. Not Moscow. Chicago.

"Chile got Milton Friedman's Chicago Boys instead — free market optimisation, deregulation, treat the economy as a problem solvable by one mechanism, the price signal, routed through one kind of node, the market. It's a different ideology but the same network topology, everything funnelled through a single coordination point."

"That's—"

"A different colour of hub-and-spoke. Again. We had someone who understood how to govern distributed systems as distributed systems. We burned his control room and replaced it with a different bottleneck."

The control room goes dark.

"Government-mandated bottleneck," says Sandra, and twenty-three years of professional composure cracks, just slightly, just for a moment, before she puts it back together.

Todd takes the headset off. Conference room. Fluorescent lights. The HVAC hum.

Sandra appears in the doorway with fresh tea and a stack of highlighted papers.

"I've rewritten your slides," she says.

"Of course you have."

"Slide seven is blank."

"Why is seven blank?"

"Because it's the honest answer. We don't have the science yet. That's what you're asking them to fund."

Todd takes the tea. Looks at the slides. Looks at Sandra.

"Why aren't you doing the committee presentation?"

Sandra smiles the smile of a woman who has been asked this, in various forms, for twenty-three years.

"Because they don't listen to secretaries, Todd. They listen to men in suits. The system can't see where its own knowledge lives."

She pauses.

"Same problem all the way down."

Conclusion

Todd is fictional. The problem isn't.

We are integrating artificial intelligence into the coordination systems that run human civilisation — markets, democracies, information ecosystems, institutional decision-making — and our frameworks for evaluating the safety of this process examine components one at a time. We assess individual AI systems for alignment, capability, and risk, then assume that safe components produce safe collectives. This is the logic of Prussian forestry applied to sociotechnical systems, and the 20th century ran the experiment on what happens next.

The difficulty is that the alternative isn't obvious. "The system is complex, leave it alone" isn't governance. Stafford Beer understood this — Cybersyn wasn't a policy of non-intervention, it was a proper attempt to see distributed dynamics without collapsing them into a central model. But Beer's work was cut short, and the field never fully developed the tools he was reaching for. So the question remains open: what would it actually mean to govern a living system as a living system?

To answer that, we first have to confront something uncomfortable. The three case studies in this piece — forests, cities, economies — all display the same pattern: a collection of components that, through their interactions, become something more than a collection. The old-growth forest wasn't just trees near each other. It was a system with its own collective behaviour, its own capacity to respond to threats, its own ability to redistribute resources where they were needed. It had, in a meaningful sense, agency — not because anyone designed that agency into it, but because it grew.

This is the deep question hiding behind all the governance talk. When does a collection of things become an agent with its own goals? A salamander's cells, each just trying to maintain their local chemistry, somehow collectively rebuild a missing limb — and they build the right limb, correctly proportioned, properly wired. No cell has the blueprint. No cell is in charge. The limb-level goal emerges from the network of interactions between cells, from the information flowing through chemical gradients and electrical signals and mechanical pressures. The goal lives in the between.

We can watch this happen in biology, in ant colonies, in neural systems, in markets. But we cannot yet explain it. We have no general theory of how local behaviours compose into collective agency, no way to predict when it will happen, no principled account of what makes it robust versus fragile. And this gap matters enormously right now, because we are running the experiment in real time.

When AI trading agents participate in financial markets alongside humans, what is the market becoming? Not just "a market with faster traders" — the collective dynamics change qualitatively as the ratio of AI to human participants shifts. When large language models mediate human discussion, summarising arguments and surfacing consensus, the AI isn't just transmitting information neutrally — it's becoming part of the coordination substrate itself, reshaping what the collective can see and think. When recommendation algorithms determine what information reaches which people, they're not just tools that individuals use — they're agents within the collective, shaping its emergent behaviour in ways nobody designed or intended.

At what point do these hybrid systems develop their own agency? Their own goals? And if they do — and the history of every collective system suggests they will — how would we even know? Our frameworks measure individual components. The collective agency lives in the connections between them, exactly where we're not looking.

This is where the two paradigms collide. Almost everything we know about building AI systems comes from what you might call the engineering paradigm: define your agents, specify their objectives, design the mechanism, prove properties. This works beautifully when you can determine everything in advance. But the systems we're actually creating are growing systems — they will discover their own organisation, develop their own emergent goals, find their own boundaries. We're using tools designed for building bridges to tend something that behaves more like a forest.

The growth paradigm — the one that developmental biologists and complex systems researchers live in — understands this. It watches how collective intelligence emerges from local interactions, how agent boundaries form and dissolve, how the whole becomes genuinely more than the sum of its parts. But it's largely descriptive. It can tell you what happened. It struggles to tell you what to build.

What we need is something that doesn't exist yet: a framework that's precise enough to guide engineering but flexible enough to capture emergence. Mathematics that can answer questions like: where, in a complex system, do the real agents live? How do simple local goals — each trader pursuing profit, each algorithm optimising engagement — compose into collective goals that nobody specified and nobody controls? When does a collection become a collective, and what makes that transition stable or fragile?

We believe these to be precise, tractable questions that can be formalised with the right mathematical tools.

Information theory already gives us tools for measuring when a whole contains more than its parts. Causal Emergence theory can identify the scale at which a system's behaviour is most predictable — and that scale is often not the level of individual components. Active Inference provides a framework for understanding agency in terms of statistical boundaries rather than programmer intentions. Category Theory offers a language for how simple operations compose into complex ones.
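To make the causal-emergence claim concrete, here is a minimal sketch in Python, assuming Erik Hoel's effective-information measure and a toy Markov system of our own invention (the function name `effective_information` and the example matrices are illustrative, not drawn from the post or its sources). It shows that a coarse-grained, macro-level description of a system can carry more effective information than the micro-level description it summarises, so the scale at which the dynamics are most predictable need not be the component level.

```python
import numpy as np

def effective_information(tpm):
    """Effective information of a state-transition matrix, in bits.

    EI is the mutual information between a uniform intervention
    distribution over current states and the resulting distribution
    over next states: EI = (1/N) * sum_i KL(row_i || mean_row).
    """
    tpm = np.asarray(tpm, dtype=float)
    mean_row = tpm.mean(axis=0)  # next-state distribution under uniform interventions
    kl_per_state = [
        np.sum(row[row > 0] * np.log2(row[row > 0] / mean_row[row > 0]))
        for row in tpm
    ]
    return float(np.mean(kl_per_state))

# Toy micro-level system with 4 states: states 0-2 hop uniformly among
# themselves, state 3 stays put.  (An example in the style of Hoel's
# causal-emergence papers; the numbers are ours.)
micro = np.array([
    [1/3, 1/3, 1/3, 0.0],
    [1/3, 1/3, 1/3, 0.0],
    [1/3, 1/3, 1/3, 0.0],
    [0.0, 0.0, 0.0, 1.0],
])

# Coarse-graining: macro state A = {0, 1, 2}, macro state B = {3}.
macro = np.array([
    [1.0, 0.0],
    [0.0, 1.0],
])

print(f"EI at the micro scale: {effective_information(micro):.2f} bits")  # ~0.81
print(f"EI at the macro scale: {effective_information(macro):.2f} bits")  # 1.00
```

The macro description wins here because the coarse-graining discards noise that does no causal work: the formal version of the claim that the forest, not the individual tree, is sometimes the right unit of analysis.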

The pieces exist, scattered across a dozen fields that don't talk to each other. Developmental biologists who watch collective agency emerge every day in growing embryos. Physicists who study phase transitions — the critical points where systems suddenly reorganise. Neuroscientists who understand how neural collectives become unified minds. Social scientists who observe markets and democracies developing emergent properties in the wild. Mathematicians who prove deep structural connections between apparently different frameworks.

Nobody has put these pieces together. We don't entirely know why, but we suspect it is partly because the question that connects them hasn't been asked clearly enough, or at all.

Here it is, as plainly as we can state it: when AI systems join human collectives at scale, what kind of collective agents will emerge, and how do we ensure they remain ones we'd want to live inside?

That's what slide seven is asking for. Not better evaluation of individual AI systems — we have people working on that, and they're good at it. Not "leave the system alone and hope for the best" — Beer showed us that active governance of living systems is possible, before his control room was burned. What we need is the science of collective agency itself. The basic research that would let us understand how collections become agents, predict when it will happen, and develop the equivalent of Beer's Cybersyn for a world where the collective includes artificial minds.


This is the first in a series on collective agent foundations. The next post goes deeper into the mathematics underlying these questions — how information theory, causal emergence, active inference, and category theory each offer different lenses on the same problem, where those lenses converge, and where they point to open questions that no single field can answer alone.

You can follow this series on our Substack (or in this LessWrong sequence), and find out more about our research at Equilibria Network.



Discuss

Nick Bostrom: Optimal Timing for Superintelligence

2026-02-13 15:33:21

Published on February 13, 2026 7:33 AM GMT

Linked is a new working paper from Nick Bostrom, of Superintelligence fame, primarily analyzing optimal pause strategies in AI research, with the aim of maximizing the number of human lives saved by balancing existential risk against the prospect that ASI delivers biological immortality sooner.

Abstract: (emphasis mine)

Developing superintelligence is not like playing Russian roulette; it is more like undergoing risky surgery for a condition that will otherwise prove fatal. We examine optimal timing from a person-affecting stance (and set aside simulation hypotheses and other arcane considerations). Models incorporating safety progress, temporal discounting, quality-of-life differentials, and concave QALY utilities suggest that even high catastrophe probabilities are often worth accepting. Prioritarian weighting further shortens timelines. For many parameter settings, the optimal strategy would involve moving quickly to AGI capability, then pausing briefly before full deployment: swift to harbor, slow to berth. But poorly implemented pauses could do more harm than good.

The analysis is, interestingly, deliberately from a "normal person" viewpoint:[1]

  1. It includes only "mundane" considerations (just saving human lives) as opposed to "arcane" considerations (AI welfare, weird decision theory, anthropics, etc.).
  2. It considers only living humans, explicitly eschewing longtermist considerations of large numbers of future human lives.
  3. It assumes that a biologically immortal life is merely 1400 years long, based on mortality rates for healthy 20-year-olds.
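As a back-of-the-envelope check on where a figure like 1400 years comes from (our reconstruction, not Bostrom's calculation): if ageing is removed, the remaining hazard is roughly the background mortality of a healthy 20-year-old, and with a constant annual hazard the expected remaining lifespan is simply one over that hazard.

```python
# Back-of-the-envelope reconstruction of the ~1400-year figure; the hazard
# value is an assumed round number, not taken from Bostrom's paper.
annual_hazard = 0.0007  # ~0.07%/yr, roughly the mortality of a healthy 20-year-old

# A constant hazard implies an exponentially distributed remaining lifespan
# with mean 1/hazard.
expected_years = 1 / annual_hazard
print(f"Expected remaining lifespan: {expected_years:.0f} years")  # ~1429
```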

It results in tables like this: 

Table 6: Optimal delay under small quality of life difference post-ASI, medium discount rate for future years of life, diminishing marginal utility of future years of life

On the whole, the results imply that under a fairly wide range of scenarios a pause could be useful, but that it should likely be short.
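To make the shape of that conclusion concrete, here is a deliberately crude toy model, assuming a person-affecting setup of our own devising; every parameter and functional form below is invented for illustration and is not Bostrom's actual model. Delaying deployment lets safety work cut catastrophe risk, but costs discounted life-years during the wait and risks dying before ASI arrives, so with these made-up numbers the optimum lands at a short but nonzero delay.

```python
import numpy as np

# Invented parameters -- for illustration only.
MU = 0.01      # background annual mortality of currently living people
R = 0.02       # discount rate on future life-years
L_POST = 1400  # expected lifespan (years) if aligned ASI delivers biological immortality
P0 = 0.2       # catastrophe probability if ASI is deployed with no delay
K = 0.3        # annual rate at which safety work shrinks catastrophe risk

def expected_discounted_years(t):
    """Expected discounted life-years for a representative person when
    ASI deployment is delayed by t years."""
    # Discounted years lived during the delay, while still mortal.
    pre = (1 - np.exp(-(MU + R) * t)) / (MU + R)
    # Survive the delay, avoid catastrophe, then enjoy the long post-ASI life
    # (discounted back to the present).
    p_catastrophe = P0 * np.exp(-K * t)
    post_asi_value = (1 - np.exp(-R * L_POST)) / R
    post = np.exp(-(MU + R) * t) * (1 - p_catastrophe) * post_asi_value
    return pre + post

delays = np.linspace(0, 50, 501)
best = delays[np.argmax(expected_discounted_years(delays))]
print(f"Optimal delay under these made-up numbers: {best:.1f} years")
```

Shifting the invented parameters moves the optimum around substantially, which is presumably why the paper reports tables across parameter settings rather than a single recommendation.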

However, Bostrom also says that he doesn't think this work implies specific policy prescriptions, because it makes too many assumptions and is too simplified. Instead, he argues that his main purpose is simply to highlight key considerations and tradeoffs.


Some personal commentary:

  1. Assuming we don't have a fast takeoff, there will probably be a period where biomedical results from AI look extremely promising, biohackers are taking AI-designed peptides, and so forth.[2] This would likely spark a wider public debate about rushing to AGI/ASI for health benefits, and the sort of analysis Bostrom provides here may end up guiding part of that debate. It's worth noting that in the West, at least, politics is something of a gerontocracy, which will be extra-incentivized to rush.
  2. While I suppose these considerations would fall under the "arcane" category, I think the biggest weaknesses of Bostrom's treatment are probably: a.) discounting how much people care about the continuation of the human species, separate from their own lives or the lives of family and loved ones; and b.) ignoring the possibility of s-risks worse than extinction. I'm not sure those are really outside the realm of Overton Window public debate, especially if you frame s-risks primarily in terms of authoritarian takeover by political enemies (not exactly the worst s-risk, but I think "permanent, total victory for my ideological enemies" is a concrete bad end people can imagine).
  1. ^

    Excepting the assumption that AGI/ASI are possible and also that aligned ASI could deliver biological immortality quickly. But you know, might as well start by accepting true facts.

  2. ^

    LLMs are already providing valuable medical advice, of course, to the point that there was a minor freakout not too long ago when a rumor went around that ChatGPT would stop offering medical advice.



Discuss