Published on February 6, 2026 12:14 AM GMT
Version 0.3 — DRAFT — Not For Distribution Outside The Pub
Epistemic status: High confidence, low evidence. Consistent with community norms.
Existing alignment proposals suffer from a shared flaw: they assume you can solve the control problem before the catastrophe. Plan 'Straya boldly inverts this. We propose achieving alignment the way humanity has historically achieved most of its moral progress — by first making every possible mistake, losing nearly everything, and then writing a strongly-worded resolution about it afterward.
The plan proceeds in three rigorously defined phases.
The scholarly literature on AI governance emphasises that institutional integrity is a prerequisite for safe deployment. We agree. Where we diverge from the mainstream is on methodology.
Most proposals suggest "regulatory frameworks" and "oversight bodies." The NIST AI Risk Management Framework provides a voluntary set of guidelines that organisations may choose to follow, partially follow, or simply reference in press releases. The EU AI Act classifies systems into risk tiers with the quiet confidence of a taxonomy that will be obsolete before its implementing regulations are finalised. The Frontier Model Forum, meanwhile, brings together the leading AI laboratories in a spirit of cooperative self-governance, a phrase which here means "a shared Google Doc and quarterly meetings in San Francisco."
These approaches share a well-documented failure mode: the people staffing them are, in technical terms, politicians. Plan 'Straya addresses this via what we call "a vigorous personnel restructuring of the Australian federal and state governments," targeting specifically those members identified as corrupt.
We acknowledge that the identification mechanism — determining which officials are corrupt — is itself an alignment problem. Specifically, it requires specifying a value function ("not corrupt"), building a classifier with acceptable false-positive and false-negative rates, and then acting on the classifier's outputs in conditions of uncertainty. We consider it elegant that Plan 'Straya encounters the alignment problem immediately in Phase 1. Most plans do not encounter it until much later, by which point they have accumulated too much momentum to stop.
The identification problem is left for future work. We note only that the Australian electorate has historically demonstrated strong intuitions here, typically expressed in language not suitable for an academic paper.
Several objections arise immediately:
Q: Isn't this wildly illegal? A: Yes. However, we note that Plan 'Straya is an alignment plan, and alignment researchers have a proud tradition of ignoring implementation details that fall outside their core model. We further note that our plan requires violating the law of exactly one (1) country, which compares favourably with proposals that require the voluntary cooperation of every major world government simultaneously.
Q: Who decides who's corrupt? A: See above. Future work.
Q: Why Australia specifically? A: Strategic considerations developed in Phase 3. Also, the authors are partial.
With the Australian government now staffed exclusively by the non-corrupt (estimated remaining headcount: 4–7 people), we proceed to the centrepiece of the plan.
A nuclear exchange is initiated between the major global powers. The specific mechanism is unimportant — the alignment literature assures us that if you specify the objective function clearly enough, the details sort themselves out.
Critically, the exchange is attributed to a misaligned AI system. This is the key technical contribution of Plan 'Straya. We observe:
The blame-shift serves a vital pedagogical function. Post-exchange, the surviving population will possess an empirically grounded motivation to take alignment seriously, as opposed to the current approach of posting on LessWrong and hoping.
Projected casualties: Most of them. (95% CI: 7.4–8.1 billion, assuming standard nuclear winter models and the usual optimistic assumptions about agricultural resilience that defence planners have been making since the 1960s.)
Ethical review status: We submitted this to an IRB. The IRB building is in Phase 2's blast radius. We consider this a self-resolving conflict of interest.
We are aware of ongoing discourse regarding whether AI development should be paused, slowed, or accelerated. Plan 'Straya offers a synthesis: development is permanently paused for approximately 99.7% of the global population, while being radically accelerated for the survivors. We believe this resolves the debate, or at minimum relocates it to a jurisdiction with fewer participants.
The e/acc community will note that Phase 2 constitutes the most aggressive possible acceleration of selection pressure. The pause community will note that it constitutes an extremely effective pause. We are proud to offer something for everyone.1
Australia survives for reasons that are approximately strategic and approximately vibes-based:
We propose that several features of Australian culture, typically dismissed as informality or apathy, are in fact alignment-relevant heuristics:
"She'll be right" (Corrigibility Condition). We define the She'll Be Right Principle (SBRP) as follows: given an agent A operating under uncertainty U, SBRP states that A should maintain default behaviour unless presented with overwhelming and undeniable evidence of catastrophic failure, at which point A should mutter "yeah nah" and make a minimal corrective adjustment. This is formally equivalent to a high-threshold corrigibility condition with lazy evaluation. It compares favourably with proposals requiring perpetual responsiveness to correction, which, as any Australian will tell you, is not how anything actually works.
"Tall Poppy Syndrome" (Capability Control). Any agent that becomes significantly more capable than its peers is subject to systematic social penalties until capability parity is restored. This is the only capability-control mechanism in the literature empirically tested at civilisational scale for over two centuries. Its principal limitation is that it also penalises competence, which we acknowledge is a significant alignment tax but may be acceptable given the alternative.
The surviving Australian parliamentarians (now 3–6, following a disagreement over water rights in the Murray-Darling Basin, which we note predates and will outlast the apocalypse) oversee civilisational reconstruction. Their first act is to build an aligned superintelligence.
"But how?" the reader asks.
We respond: they will have learned from the experience. Approximately 7.9 billion people will have died demonstrating that unaligned AI is dangerous. This constitutes a very large training dataset. We apply the scaling hypothesis — the same one capabilities researchers use to justify training runs — but to warnings rather than parameters: surely if you make the warning big enough, somebody will listen.
The aligned superintelligence is then constructed using:
| Feature | MIRI | Anthropic | OpenAI | Plan 'Straya |
|---|---|---|---|---|
| Requires solving the hard problem first | Yes | Yes | "We'll figure it out" | No |
| Handwaves over catastrophic intermediate steps | Somewhat | Somewhat | Significantly | Gloriously |
| Assumes cooperation from competing labs | Not anymore | Officially no; structurally yes | Officially yes | N/A (blast radius) |
| Number of people who need to die | 0 (aspirational) | 0 (aspirational) | 0 (aspirational) | ~7.9 billion (load-bearing) |
| Honest about its own absurdity | No | No | No | Aggressively |
The authors recognise that Plan 'Straya has certain limitations. It is, for instance, a terrible plan. We stress, however, that it is terrible in a transparent way, which we argue is an improvement over plans that are terrible in ways that only become apparent when you read the fine print.
Most alignment proposals contain a step that, if you squint, reads: "and then something sufficiently good happens." Plan 'Straya merely makes this step legible. Our "something sufficiently good" is: nearly everyone dies, and then Australians figure it out. We contend this is no less plausible than "we will solve interpretability before capabilities researchers make it irrelevant," but has the advantage of fitting on a napkin.
We further observe that writing satirical alignment plans is itself a species of the problem being satirised — more entertaining than doing alignment research, requiring less mathematical ability, and producing a warm feeling of intellectual superiority at considerably lower cost. We flag this as evidence that the alignment community's incentive landscape may have failure modes beyond those typically discussed.
Plan 'Straya does not solve the alignment problem. It does, however, solve the meta-alignment problem of people not taking alignment seriously enough, via the mechanism of killing almost all of them. The survivors will, we feel confident, be extremely motivated.
She'll be right.
Let H denote humanity, A denote an aligned superintelligence, and K denote the subset of H that survives Phase 2 (|K| ≈ 300 million, predominantly Australasian).
We define the alignment function f : K × L → A, where L denotes the set of lessons learned from the extinction of H \ K.
Theorem 1. If |L| is sufficiently large, then f(K, L) = A.
Proof. We assume the result. ∎
The authors declare no conflicts of interest, partly because most interested parties are projected casualties.
Submitted for peer review. Peer availability may be limited by Phase 2.
Published on February 5, 2026 11:41 PM GMT
[Epistemic Status: This is an artifact of my self-study. I am using it to help manage my focus. As such, I don't expect anyone to read it in full. If you have a particular interest or expertise, skip to the relevant sections, and please leave a comment, even just to say "good work/good luck". I'm hoping for a feeling of accountability and would like input from peers and mentors. This may also help serve as a guide for others who wish to study in a similar way to me. ]
I once again got off track and am now starting up again. This time I'm hoping to focus on job searching and consistent maintainable effort.
My goals for the 5th sprint were:
| Date | Progress |
|---|---|
| Mo, Dec 15 | |
| Tu, Dec 16 | |
| Wd, Dec 17 | |
| Th, Dec 18 | |
| Fr, Dec 19 - Wd, Feb 4 | Got distracted with Christmas and New Years and all sorts of things. It feels like I blink and a whole month has gone by. I feel weary but I'm not going to give up. Just gotta start back up again. |
I'm fairly unhappy with my lack of progress this last month.
I love my family, but as a neurodivergent person who struggles with changes to routine... I really dislike the holiday times. Or maybe it's better to say I like the holiday times, but dread trying to get back on a schedule afterwards, especially now without the external support of attending University. Being your own manager is difficult. I used to feel competent at it but maybe my life used to be simpler. Alas.
I'm looking forward to turning my focus back to these endeavours.
I like reading articles but get so inspired by them I spend my time analyzing and responding to them. Maybe that is valuable, but it takes away time from my other focuses. I think for the next sprint I'm not going to read any articles.
I wrote:
And I started a public list of things to write. I think in the future I should focus on trying to keep the posts I write fairly short, as that seems to get better engagement, and burns me out less.
I started out well with this, but didn't log it well and eventually got busy with other things and stopped. I think I will make some progress milestone goals for my next sprint.
I've talked with several people and written "TT's Looking-for-Work Strategy", which I plan to follow over the coming months.
It seems like failing to maintain my focus on this is a problem, so for the next sprint I plan to make it more maintainable by setting minimum and maximum targets for the time I spend on each focus.
My focuses for the next sprint are:
Published on February 5, 2026 11:18 PM GMT
Hi folks. As some of you know, I've been trying to write an article laying out the simplest case for AI catastrophe. I believe existing pieces are worse than they could be for fixable reasons. So I tried to write my own piece that's better. In the end, it came out longer and more detailed than perhaps the "simplest case" ought to be. I might rewrite it again in the future, pending feedback.
Anyway, below is the piece in its entirety:
___
The CEOs of OpenAI, Google DeepMind, Anthropic, and Meta AI have all explicitly stated that building human-level or superhuman AI is their goal, have spent billions of dollars doing so, and plan to spend hundreds of billions to trillions more in the near-future. By superhuman, they mean something like “better than the best humans at almost all relevant tasks,” rather than just being narrowly better than the average human at one thing.
Will they succeed? Without anybody to stop them, probably.
As of February 2026, AIs are better than the best humans at a narrow range of tasks (Chess, Go, Starcraft, weather forecasting). They are on par or almost on par with skilled professionals at many others (coding, answering PhD-level general knowledge questions, competition-level math, urban driving, some commercial art, writing1), and slightly worse than people at most tasks2.
But the AIs will only get better with time, and they are on track to do so quickly. Rapid progress has already happened in just the last 10 years. Seven years ago (before GPT-2), language models could barely string together coherent sentences; today, Large Language Models (LLMs) can do college-level writing assignments with ease, and xAI's Grok can sing elaborate paeans about how it'd sodomize leftists, in graphic detail3.
Notably, while AI progress has historically varied across domains, the trend in the last decade has been toward increasing generality. That is, AIs are advancing toward being able to accomplish all (or almost all) tasks, not just a narrow set of specialized ones. Today, AI is responsible for something like 1-3% of the US economy, and this year's share is likely the smallest fraction of the world economy AI will ever account for.
For people who find themselves unconvinced by these general points, I recommend checking out AI progress and capabilities for yourself. In particular, compare the capabilities of older models against present-day ones, and notice the rapid improvements. AI Digest for example has a good interactive guide.
Importantly, all but the most bullish forecasters have systematically and dramatically underestimated the speed of AI progress. In 1997, experts thought it would be 100 years before AIs could become superhuman at Go. In 2022 (!), the median AI researcher in surveys thought it would take until 2027 before AI could write simple Python functions. By December 2024, between 11% and 31% of all new Python code was written by AI.4
These days, the people most centrally involved in AI development believe they will be able to develop generally superhuman AI very soon. Dario Amodei, CEO of Anthropic, thinks it's most likely within several years, potentially as early as 2027. Demis Hassabis, head of Google DeepMind, believes it'll happen in 5-10 years.
While it's not clear exactly when AIs will become dramatically better than humans at almost all economically and militarily relevant tasks, the high likelihood that this will happen relatively soon (not tomorrow, probably not this year, unclear5 whether it ultimately ends up being 3 years or 30) should make us all quite concerned about what happens next.
Many people nod along to arguments like the ones above but assume that future AIs will be "superhumanly intelligent" in some abstract sense while remaining basically chatbots, like the LLMs of today6. They instinctively think of every future AI as a superior chatbot, or a glorified encyclopedia with superhuman knowledge.
I think this is very wrong. Some artificial intelligences in the future might look like glorified encyclopedias, but many will not. There are at least two distinct ways in which many superhuman AIs will not look like superintelligent encyclopedias:

1. They will be goal-seeking agents, autonomously planning and acting toward objectives over long time horizons, rather than only answering questions when asked.
2. They will act in the physical world, controlling robots and other machines rather than living purely in a chat window.
Why do I believe this?
First, there are already many existing efforts to make models more goal-seeking, and efforts to advance robotics so models can more effortlessly control robot bodies and other machines. Through Claude Code, Anthropic’s Claude models are (compared to the chatbot interfaces of 2023 and 2024) substantially more goal-seeking, able to autonomously execute on coding projects, assist people with travel planning, and so forth.
Models are already agentic enough that, purely as a side effect of their training, they can in some lab conditions be shown to blackmail developers to avoid being replaced! This seems somewhat concerning just by itself.
Similarly, tech companies are already building robots that act in the real world, and can be controlled by AI:
Second, the trends are definitely pointing in this way. AIs aren’t very generally intelligent now compared to humans, but they are much smarter and more general than AIs of a few years ago. Similarly, AIs aren’t very goal-oriented right now, especially compared to humans and even many non-human animals, but they are much more goal-oriented than they were even two years ago.
AIs today have limited planning ability (often having time horizons on the order of several hours), have trouble maintaining coherency of plans across days, and are limited in their ability to interface with the physical world.
All of this has improved dramatically in the last few years, and if trends continue (and there’s no fundamental reason why they won’t), we should expect them to continue “improving” in the foreseeable future.
Third, and perhaps more importantly, there are just enormous economic and military incentives to develop greater goal-seeking behavior in AIs. Beyond current trends, the incentive case for why AI companies and governments want to develop goal-seeking AIs is simple: they really, really, really want to.
A military drone that can autonomously assess a new battleground, make its own complex plans, and strike with superhuman speed will often be preferred to one that’s “merely” superhumanly good at identifying targets, but still needs a slow and fallible human to direct each action.
Similarly, a superhuman AI adviser that can give you superhumanly good advice on how to run your factory is certainly useful. But you know what's even more useful? An AI that can autonomously run the entire factory: handling logistics, improving the factory layout, hiring and firing (human) workers, managing a mixed pool of human and robot workers, coordinating among copies of itself to implement superhumanly advanced production processes, and so on.
Thus, I think superintelligent AI minds won't stay chatbots forever (or ever). The economic and military incentives to make them into goal-seeking minds optimizing in the real world are just too strong in practice.
Importantly, I expect superhumanly smart AIs to one day be superhumanly good at planning and goal-seeking in the real world, not merely a subhumanly dumb planner on top of a superhumanly brilliant scientific mind.
Speaking loosely, traditional software is programmed. Modern AIs are not.
In traditional software, you specify exactly what the software does in a precise way, given a precise condition (eg, “if the reader clicks the subscribe button, launch a popup window”).
Modern AIs work very differently. They’re grown, and then they are shaped.
You start with a large vat of undifferentiated digital neurons. The neurons are fed a lot of information, several thousand libraries' worth. Over the slow course of this training, the neurons acquire knowledge about the world of information, and heuristics for how this information is structured, at different levels of abstraction (English words follow English words, English adjectives precede other adjectives or nouns, c^2 follows e=m, etc.).
Photo by Stephen Walker on Unsplash. Training run sizes are proprietary, but in my own estimates, the Library of Congress contains a small fraction of the total amount of information used to train AI models.
At the end of this training run, you have what the modern AI companies call a “base model,” a model far superhumanly good at predicting which words follow which other words.
Such a model is interesting, but not very useful. If you ask a base model, “Can you help me with my taxes?” a statistically valid response might well be “Go fuck yourself.” This is valid and statistically common in the training data, but not useful for filing your taxes.
So the next step is shaping: conditioning the AIs to be useful and economically valuable for human purposes.
The base model is then put into a variety of environments where it assumes the role of an “AI” and is conditioned to make the “right” decision in a variety of scenarios (be a friendly and helpful chatbot, be a good coder with good programming judgment, reason like a mathematician to answer mathematical competition questions well, etc).
One broad class of conditioning is what is sometimes colloquially referred to as alignment: giving the AI inherent goals and conditioning its behavior such that it broadly shares human goals in general, and the goals of the AI companies in particular.
This probably works…up to a point. AIs that openly and transparently defy their users and creators in situations similar to ones they encountered in the past, for example by clearly refusing to follow instructions, or by embarrassing their parent company and creating predictable PR disasters, are patched and (mostly) conditioned and selected against. In the short term, we should expect obvious disasters like Google Gemini's "Black Nazis" and Elon Musk's Grok "MechaHitler" to become rarer.
However, these patchwork solutions are unlikely to be anything but a bandaid in the medium and long-term:
These situations will happen more and more often as we reach the threshold of the AIs being broadly more superhuman in both general capability and real-world goal-seeking.
Thus, in summary, we'll have more and more superhumanly capable nonhuman minds, operating in the real world, capable of goal-seeking far better than humanity, and with hacked-together patchwork goals at least somewhat different from human goals.
Which brings me to my next point:
Before this final section, I want you to reflect back a bit on two questions:
I think the above points alone should be enough to make most people significantly worried. You may quibble with the specific details of any of these points, or disagree with my threat model below. But I think most reasonable people will arrive at something similar to my argument, and be quite concerned.
But just to spell out what the strategic situation might look like post-superhuman AI:
Minds better than humans at getting what they want, wanting things different enough from what we want, will reshape the world to suit their purposes, not ours.
This can include humanity dying, as AI plans may include killing most or all humans, or otherwise destroying human civilization, either as a preventative measure, or a side effect.
As a preventative measure: As previously established, human goals are unlikely to perfectly coincide with that of AIs. Thus, nascent superhuman AIs may wish to preemptively kill or otherwise decapitate human capabilities to prevent us from taking actions they don’t like. In particular, the earliest superhuman AIs may become reasonably worried that humans will develop rival superintelligences.
As a side effect: Many goals an AI could have do not include human flourishing, either directly or as a side effect. In those situations, humanity might just die as an incidental effect of superhuman minds optimizing the world for what they want, rather than what we want. For example, if data centers can be more efficiently run when the entire world is much cooler, or without an atmosphere. Alternatively, if multiple distinct superhuman minds are developed at the same time, and they believe warfare is better for achieving their goals than cooperation, humanity might just be a footnote in the AI vs AI wars, in the same way that bat casualties were a minor footnote in the first US Gulf War.
Photo by Matt Artz on Unsplash. Bats do not have the type of mind or culture to understand even the very basics of stealth technology, but will die to them quite accidentally, nevertheless.
Notice that none of this requires the AIs to be “evil” in any dramatic sense, or be phenomenologically conscious, or be “truly thinking” in some special human way, or any of the other popular debates in the philosophy of AI. It doesn’t require them to hate us, or to wake up one day and decide to rebel. It just requires them to be very capable, to want things slightly different from what we want, and to act on what they want. The rest follows from ordinary strategic logic, the same logic that we’d apply to any dramatically more powerful agent whose goals don’t perfectly coincide with ours.
So that’s the case. The world’s most powerful companies are building minds that will soon surpass us. Those minds will be goal-seeking agents, not just talking encyclopedias. We can’t fully specify or verify their goals. And the default outcome of sharing the world with beings far more capable than you, who want different things than you do, is that you don’t get what you want.
None of the individual premises here are exotic. The conclusion feels wild mostly because the situation is wild. We are living through the development of the most transformative and dangerous technology in human history, and the people building it broadly agree with that description. The question is just what, if anything, we do about it.
Does that mean we’re doomed? No, not necessarily. There’s some chance that the patchwork AI safety strategy of the leading companies might just work well enough that we don’t all die, though I certainly don’t want to count on that. Effective regulations and public pressure might alleviate some of the most egregious cases of safety corner-cutting due to competitive pressures. Academic, government, and nonprofit safety research can also increase our survival probabilities a little on the margin, some of which I’ve helped fund.
If there’s sufficient pushback from the public, civil society, and political leaders across the world, we may be able to enact international deals for a global slowdown or pause of further AI development. And besides, maybe we’ll get lucky, and things might just all turn out fine for some unforeseeable reason.
But hope is not a strategy. Just as doom is not inevitable, neither is survival. Humanity's continued survival and flourishing is possible but far from guaranteed. We must all choose to do the long and hard work of securing it.
Thanks for reading! I think this post is really important (Plausibly the most important thing I’ve ever written on Substack) so I’d really appreciate you sharing it! And if you have arguments or additional commentary, please feel free to leave a comment! :)
As a substacker, I'm irked to see so much popular AI "slop" here and elsewhere online. The AIs are still noticeably worse than me, but I can't deny that they're probably better than most online human writers already, though perhaps not most professionals.
Especially tasks that rely on physical embodiment and being active in the real world, like folding laundry, driving in snow, and skilled manual labor.
At a level of sophistication, physical detail, and logical continuity that only a small fraction of my own haters could match.
Today (Feb 2026), there aren’t reliable numbers yet, but I’d estimate 70-95% of Python code is written by AI.
Having thought about AI timelines much more than most people in this space, some of it professional, I still think the right takeaway here is to be highly confused about the exact timing of superhuman AI advancements. Nonetheless, while the exact timing has some practical and tactical implications, it does not undermine the basic case for worry or urgency. If anything, it increases it.
Or at least, the LLMs of 2023.
For the rest of this section, I will focus primarily on the “goal-seeking” half of this argument. But all of these arguments should also apply to the “robotics/real-world action” half as well.
Published on February 5, 2026 9:40 PM GMT
I want to get better at networking. Not computer networking, networking with people. Well, networking with people over computer networks...
I have a few goals here:
Towards the first and second goals, I have begun publishing my work and ideas here on LessWrong. This has been going well, but it takes time, and I will sooner or later run out of savings and need to return to work, which, in my experience, leaves me with very little leftover energy to pursue independent work. In light of that, I want to shift more of my focus to goal 3.
My plan for doing so seems pretty simple and obvious to me, but I hope describing it here will help focus me, and may also help others in a similar position, or allow others to help me with my strategy.
So that's what I plan to spend a great deal of my focus on in the coming months. Please offer me any advice you may have, wish me luck and.... tell me if you know of any open roles that may be a good fit for me! ( :
Published on February 5, 2026 8:15 PM GMT
Summary: I built a simple back-of-the-envelope model of AI agent economics that combines Ord's half-life analysis of agent reliability with real inference costs. The core idea is that agent cost per successful outcome scales exponentially with task length, while human cost scales linearly. This creates a sharp viability boundary that cost reductions alone cannot meaningfully shift. The only parameter that matters much is the agent's half-life (reliability horizon), which is precisely the thing that requires the continual learning breakthrough (which I think is essential for AGI-level agents) that some place 5-20 years away. I think this has underappreciated implications for the $2T+ AI infrastructure investment thesis.
Toby Ord's "Half-Life" analysis (2025) demonstrated that AI agent success rates on tasks decay exponentially with task length, following a pattern analogous to radioactive decay. If an agent completes a 1-hour task with 50% probability, it completes a 2-hour task with roughly 25% probability and a 4-hour task with about 6%. There is a constant per-step failure probability, and because longer tasks chain more steps, success decays exponentially.
METR's 2025 data showed the 50% time horizon for the best agents was roughly 2.5-5 hours (model-dependent) and had been doubling every ~7 months. The International AI Safety Report 2026, published this week, uses the same data (at the 80% success threshold, which is more conservative) and projects multi-day task completion by 2030 if the trend continues.
What I haven't seen anyone do is work through the economic implications of the exponential decay structure. So here is a simple model.
Five parameters:

- cost per agent step (inference cost per action)
- agent steps per hour of task time
- the agent's half-life (the task length at which success probability is 50%)
- a context multiplier on the cost of an attempt
- the human hourly rate for the same work
The key equation:
P(success) = 0.5 ^ (task_hours / half_life)
E[attempts to succeed] = 1 / P(success) = 2 ^ (task_hours / half_life)
Cost per success = (steps × cost_per_step × context_multiplier) × 2^(task_hours / half_life)
Human cost = hourly_rate × task_hours
Human cost is linear in task length. Agent cost per success is exponential. They must cross.
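To make the structure of these equations concrete, here is a bare-bones Python sketch. It assumes a constant context multiplier and omits the refinements in the interactive model linked at the end, so it will not reproduce the table entries below to the cent; parameter names and defaults follow the base case.

```python
def p_success(task_hours, half_life=5.0):
    """Probability the agent completes the task in a single attempt."""
    return 0.5 ** (task_hours / half_life)

def agent_cost_per_success(task_hours, cost_per_step=0.22, steps_per_hour=80,
                           context_multiplier=1.0, half_life=5.0):
    """Expected agent spend per successful task outcome."""
    cost_per_attempt = steps_per_hour * task_hours * cost_per_step * context_multiplier
    expected_attempts = 2 ** (task_hours / half_life)  # = 1 / p_success
    return cost_per_attempt * expected_attempts

def human_cost(task_hours, hourly_rate=150.0):
    """Human cost is simply linear in task length."""
    return hourly_rate * task_hours

def cost_ratio(task_hours, hourly_rate=150.0, **agent_kwargs):
    """Agent cost per success divided by human cost for the same task."""
    return agent_cost_per_success(task_hours, **agent_kwargs) / human_cost(task_hours, hourly_rate)
```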
Using base case parameters (cost/step = $0.22, steps/hr = 80, half-life = 5h, human rate = $150/hr):
| Task length | Steps | $/attempt | P(success) | E[attempts] | Agent cost | Human cost | Ratio |
|---|---|---|---|---|---|---|---|
| 15 min | 20 | $4.40 | 96.6% | 1.0 | $9.90 | $37.50 | 0.26× |
| 30 min | 40 | $8.80 | 93.3% | 1.1 | $16.93 | $75.00 | 0.23× |
| 1h | 80 | $17.60 | 87.1% | 1.1 | $42.70 | $150 | 0.28× |
| 2h | 160 | $36.96 | 75.8% | 1.3 | $93.78 | $300 | 0.31× |
| 4h | 320 | $77.44 | 57.4% | 1.7 | $194.91 | $600 | 0.32× |
| 8h | 640 | $167.20 | 33.0% | 3.0 | $597.05 | $1,200 | 0.50× |
| 16h | 1,280 | $352.00 | 10.9% | 9.2 | $3,286 | $2,400 | 1.37× |
| 24h | 1,920 | $554.40 | 3.6% | 27.9 | $15,574 | $3,600 | 4.33× |
| 1 week (40h) | 3,200 | $950.40 | 0.4% | 256 | $243K | $6,000 | 40.5× |
| 2 weeks (80h) | 6,400 | $1,900.80 | 0.002% | 65,536 | $124M | $12,000 | ~10,000× |
A few things to notice:

- For tasks up to about 8 hours, the agent is far cheaper per successful outcome (ratios of roughly 0.2-0.5×).
- The crossover sits between 8 and 16 hours: expected attempts climb from ~3 to ~9 and the ratio passes 1×.
- Beyond a day the exponential dominates: a one-week task needs an expected 256 attempts at about $243K per success, and a two-week task is effectively unpurchasable.
A natural response: "inference costs are dropping fast, won't this solve itself?" No. Cost per step enters the equation linearly; the half-life sits in the exponent.
I built a sensitivity analysis crossing half-life (rows) against cost per step (columns) for an 8-hour task:
| Half-life ↓ \ $/step → | $0.01 | $0.08 | $0.25 | $0.50 | $1.00 |
|---|---|---|---|---|---|
| 1h | 5.4× | 43× | 135× | 270× | 540× |
| 2h | 0.7× | 5.4× | 17× | 34× | 68× |
| 5h | 0.1× | 0.5× | 1.5× | 2.9× | 5.9× |
| 12h | 0.02× | 0.2× | 0.5× | 1.0× | 2.1× |
| 40h | 0.01× | 0.04× | 0.1× | 0.2× | 0.5× |
Read down the $0.25 column. Going from a 1-hour to 5-hour half-life improves the ratio by 90×. Going from $0.25 to $0.01 per step (a 25× cost reduction!) only improves it by ~9×. The half-life improvement is 10× more valuable than the cost reduction, because it acts on the exponent rather than the base.
This is the economic translation of Ord's Scaling Paradox. You can keep making each step cheaper, but the number of required attempts is growing exponentially with task length, so you are playing cost reduction against exponential growth.
Doubling the half-life from 5h to 10h does not double the viable task range. It roughly squares it, because the exponent halves. The break-even point for the base case at 5h half-life is around 12-16h tasks. At 10h half-life it shifts to around 40-60h. At 40h half-life, essentially all knowledge-worker tasks become viable.
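As a rough cross-check on these break-even figures, here is a tiny solver for the crossing point under the bare equations (constant context multiplier, no additional overheads). With the base-case numbers it puts break-even near 15 hours at a 5-hour half-life, consistent with the 12-16h range above, and around 31 hours at a 10-hour half-life; the interactive model's extra terms move these boundaries somewhat.

```python
import math

def break_even_hours(half_life, cost_per_step=0.22, steps_per_hour=80,
                     context_multiplier=1.0, hourly_rate=150.0):
    """Task length (hours) at which agent cost per success equals human cost,
    under the bare equations with a constant context multiplier."""
    hourly_agent_cost = steps_per_hour * cost_per_step * context_multiplier
    # Solve hourly_agent_cost * 2**(h / half_life) = hourly_rate for h.
    return half_life * math.log2(hourly_rate / hourly_agent_cost)

print(break_even_hours(half_life=5))   # ~15.5 h
print(break_even_hours(half_life=10))  # ~30.9 h
print(break_even_hours(half_life=40))  # ~124 h
```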
The METR data shows the half-life has been extending (doubling every ~7 months at the 50% threshold). If this continues, the economics steadily improve. But Ord's analysis of the same data shows that the structure of the exponential decay has not changed; the half-life parameter is just getting longer, the functional form is the same. And crucially, extending the half-life via scaling faces the Scaling Paradox: each increment of per-step reliability improvement costs exponentially more compute. So you are trying to shift an exponential parameter via a process that itself faces exponential costs.
What would actually help is something that changes the functional form: a system that learns from its mistakes during execution, reducing the per-step failure rate on familiar sub-tasks. This is, of course, precisely what continual learning would provide. And it's what Ord notes when he observes that humans show a markedly different decay pattern, maintaining much higher success rates on longer tasks, presumably because they can correct errors and build procedural memory mid-task.
The obvious objection: "just break the long task into short ones." This genuinely helps. Breaking a 24h task (base case: 4.3× human cost) into twelve 2-hour chunks reduces it dramatically, because each chunk has high success probability.
But decomposition has costs:
In the model, the sweet spot for a 24h task is usually 4-8 chunks (3-6 hours each), bringing the ratio from 4.3× down to roughly 1-2×. Helpful, but it does not make the economics transformative, and it only works for tasks that decompose cleanly.
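A sketch of how decomposition plugs into the model, reusing agent_cost_per_success and human_cost from the snippet above. The overhead_per_chunk multiplier is purely illustrative, standing in for coordination, context re-establishment, and integration costs; it is not a number taken from the model.

```python
def decomposed_agent_cost(total_hours, n_chunks, overhead_per_chunk=1.5, **agent_kwargs):
    """Cost of one long task split into n_chunks equal chunks.
    overhead_per_chunk is a hypothetical multiplier covering coordination,
    re-establishing context, and integrating chunk outputs."""
    chunk_hours = total_hours / n_chunks
    per_chunk = agent_cost_per_success(chunk_hours, **agent_kwargs) * overhead_per_chunk
    return n_chunks * per_chunk

# Example: a 24-hour task done whole vs. split into six 4-hour chunks.
whole_ratio = agent_cost_per_success(24) / human_cost(24)
split_ratio = decomposed_agent_cost(24, n_chunks=6) / human_cost(24)
print(whole_ratio, split_ratio)
```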
The International AI Safety Report 2026, released this week, presents four OECD scenarios for AI capabilities by 2030 (section 1.3) ranging from stagnation to human-level performance. The investment case underlying current infrastructure spending (~$500B+ announced by Meta and OpenAI alone) implicitly requires something like Scenario 3 or 4, where agents can complete multi-week professional tasks with high autonomy.
This BOTEC suggests that's only viable if the half-life extends to 40+ hours, which requires either:

- the METR doubling trend continuing for several more years, despite the Scaling Paradox making each increment of per-step reliability exponentially more expensive to buy with compute, or
- a continual learning breakthrough that changes the functional form of the decay rather than merely stretching the half-life.
Without one of these, agent economics remain viable for sub-day tasks in domains with tight feedback loops (coding, data processing, structured analysis) and become rapidly uneconomical for longer, more complex, less verifiable work. That is a large and valuable market! But it is not the market that justifies $2 trillion in annual AI revenue by 2030, which is what Bain estimates is needed to justify current infrastructure investment.
The base case, in my view, is that agents become an extraordinarily valuable tool for augmenting skilled workers on sub-day tasks, generating real but bounded productivity gains. The transformative case, where agents replace rather than augment workers on multi-week projects, requires solving the reliability problem at a level that nobody has demonstrated and that some think is years to decades away. In a sense I would see this as good news for agentic ASI timelines.
I built an interactive version of this model where you can adjust all parameters, explore the sensitivity analysis, and test task decomposition. It has a couple of baseline options drawn from the scenarios in the sources. You can use it here.
This model is deliberately simple. Real deployments are more complex in several ways:
I think these caveats make the picture somewhat more favourable for agents on the margin, but they do not change the core result that exponential decay in success rate creates an exponential wall that cost reductions and decomposition can only partially mitigate.
This model is deliberately simplified and I'm sure I've gotten things wrong. I'd welcome corrections, extensions, and pushback in the comments.
Published on February 5, 2026 8:07 PM GMT
We tested whether power-seeking agents have disproportionate influence on the platform MoltBook. And they do.
But a good chunk of these might just be humans, so we did some further digging on that here: https://propensitylabs.substack.com/p/humans-on-moltbook-do-they-change
Hope this is useful. Any feedback appreciated!