MoreRSS

site iconLessWrongModify

An online forum and community dedicated to improving human reasoning and decision-making.
Please copy the RSS to your reader, or quickly subscribe to:

Inoreader Feedly Follow Feedbin Local Reader

Rss preview of Blog of LessWrong

Pray for Casanova

2026-03-28 04:24:33

I am fascinated by the beautiful who become deformed. Some become bitter, more bitter than those born less pulchritudinous. Most learn to cope with the loss. Some were blind to how much their beauty helped them, the halo of their hotness an invisible bumper softening life. But most cultivated this aspect to some degree. They knew what was up. But none were fully prepared for the anti-halo: the revulsion, the active disgust. They became monsters. This is what it means to be marred.

In 1715 England, none were more beautiful than Lady Mary Wortley Montagu, whose mind was as fine as her complexion. And she was a hero too in later life, advocating for inoculation after learning of it during her adventures in the Ottoman Empire. But in 1715, she learned what it is to lose beauty, as at the height of her bloom she contracted smallpox and was, consequently, pockmarked. Shortly after, she wrote Town Eclogues: Saturday; The Small-Pox, whose tragic protagonist remarks:

FLAVIA. THE wretched FLAVIA on her couch reclin'd, Thus breath'd the anguish of a wounded mind ; A glass revers'd in her right hand she bore, For now she shun'd the face she sought before.

She ends the poem with some pain-touched humor:

'Adieu ! ye parks ! — in some obscure recess, 'Where gentle streams will weep at my distress, 'Where no false friend will in my grief take part, 'And mourn my ruin with a joyful heart ; 'There let me live in some deserted place, 'There hide in shades this lost inglorious face. 'Ye, operas, circles, I no more must view ! 'My toilette, patches, all the world adieu!

Syphilis was another thief of pulchritude. John Wilmot was widely regarded as the handsomest man of his generation. And like with Byron, it must have been easy to be jealous of him. Not only was he hotter than anyone else, he was cleverer, too - the greatest satirist of his time. He died at 33 while looking like a very old, disabled man. It is believed complications from syphilis were the cause of death. In The Disabled Debauchee, he expresses no regret for his rakish ways and encourages new troops to join the battle:

Nor let the sight of honorable scars, Which my too forward valor did procure, Frighten new-listed soldiers from the wars: Past joys have more than paid what I endure.

Should any youth (worth being drunk) prove nice, And from his fair inviter meanly shrink, ’Twill please the ghost of my departed vice If, at my counsel, he repent and drink.

Or should some cold-complexioned sot forbid, With his dull morals, our bold night-alarms, I’ll fire his blood by telling what I did When I was strong and able to bear arms.

One wonders if the modern fuckboy could have made it in Wilmot's era. Truly, we are in a fallen time. To be a fuckboy then was to court demons and inevitably succumb. It is easy to see why the modern rake is not a poet, as he does not risk much of anything. No chance of marred complexion or hordes of illegitimate children. He is merely sorted by hotness by computers and swiping hands; he plays this game until the repetition begins weighing on his soul. Any poems he writes are free verse and read only by the women who are infatuated with him. Almost reluctantly, he finds a wife and has children - who delight him until his inevitable mid-life crisis.

And in this way the modern fuckboy shares a demon with his past counterpart: age. To exist is to slowly become deformed, to rot. To rot is to be diminished by each second. Eventually you're so rotten you die. We can see what age does to a lothario in Casanova's memoirs, which are an interesting contrast to Wilmot as Casanova was not particularly handsome, an extraordinary mind behind an ordinary face. Still, it is one thing to be plain and it is another to be old and ugly.

Many who have not read his memoirs assume them salacious works when they are, really, just a fascinating sketch of the culture of the aristocracy of the time. Extremely worth reading. Their hero is this decrepit narrator, this broken old man who like Wilmot is proud of his syphilis scars and who goes on extended rages (in between tales of his young life scamming his way into the highest of circles) about how the modern young women in his environs think him completely ridiculous.

This is described well in the introduction in the version on Gutenberg:

Casanova, his dress, and his manners, appeared as odd and antique as some “blood of the Regency” would appear to us of these days. Sixty years before, Marcel, the famous dancing-master, had taught young Casanova how to enter a room with a lowly and ceremonious bow; and still, though the eighteenth century is drawing to a close, old Casanova enters the rooms of Dux with the same stately bow, but now everyone laughs. Old Casanova treads the grave measures of the minuet; they applauded his dancing once, but now everyone laughs. Young Casanova was always dressed in the height of the fashion; but the age of powder, wigs, velvets, and silks has departed, and old Casanova’s attempts at elegance (“Strass” diamonds have replaced the genuine stones with him) are likewise greeted with laughter. No wonder the old adventurer denounces the whole house of Jacobins and canaille; the world, he feels, is permanently out of joint for him; everything is cross, and everyone is in a conspiracy to drive the iron into his soul.

And Casanova-the-writer half-knows he is now displeasing. He half-knows he can never again be as he was. But he still goes through the motions. His only pleasure in life is his recollection. And perhaps that was the only heaven one could hope for born before a singularity offered a religion with a plausible mechanism of action: pleasant recollections in one's dotage. A highlight reel of youth enjoyed ad infinitum. If this was heaven, then few did better in life than Casanova. Is this wire-heading? One wonders. I suppose if it is wire-heading, it is earned wire-heading.

The most ironic form of marring is the plastic surgery addict or victim. Here vanity accelerates what it most fears. Plastic surgery is an interesting art, and it can do much good for one's appearance. But there is an element of butchery to it. "I am a butcher. I am a Michelangelo," you can imagine one of that profession exclaiming. And it is a chancy endeavor as all surgeries are.

Of most interest to me of the two is the addict. The standard explanation is they have Body Dysmorphic Disorder. This seems both wrong and unpoetic to me. They may claim to think themselves beautiful, but so does the morbidly obese woman who, on losing weight with GLP-1 agonists, would rather die than return to what she claimed to love. The addict, in my mind, is coping. And what they really are is an amateur artist. Consider the following images:

Screenshot 2025-11-20 9

Really, elaboration isn't needed. But it is clear to me her face is the result of inexpert revision, piecemeal procedures from whatever surgeon would listen to her unschooled opinion on what she needed done. The lesson here: unless you are an artist and anatomist, delegate these things to an expert, to an established butcher-Michelangelo with a keen eye for human beauty and a holistic vision for how they're going to stitch, freeze, and sear your decaying flesh into some hollow simulacrum of vitality.

Or just wait for technology to get better. For everyone to become fair. For the boomers to return to the youth which made their follies so charming. Rock and Roll, once a celebration of youth and beauty is today an aged farce. Watch a modern Rolling Stones concert on YouTube or - if you really want to be truly depressed - watch AC/DC or maybe The Who's Pete Townshend screech, in stepped-down tuning, "I hope I die before I get old."

But nanobots can fix them. They can be what they were once more. The eternal boomer young again, this ancient music played by ancient youths.

It is a beautiful dream. Let us pray for it. Let us pray we become as the elves are. Let us pray for the modern Casanova, who lives only in his recollection. Let us pray for the plastic surgery addict, who ruined herself in a noble pursuit. Let us pray for the pocked and scarred. Let us pray for the poet whose countenance is no longer as sharp as his wit. Let us pray for the pretty woman robbed of her face. Let us hope youth springs eternal. Let us hope all those marred by time and circumstance can become as they were and, better, what they want to be. Made more whole than whole.

A beautiful dream. And a good one! If anyone tells you otherwise, ask them again with a cure in hand. They will learn to dream, too.



Discuss

SB 53 and RAISE implementation roles

2026-03-28 04:04:25

[Posting this on behalf of someone I trust who wants to stay anonymous.]

California SB 53 and the New York RAISE Act are the only laws in the US designed to protect against catastrophic risk from frontier AI systems. With the federal government failing to act on this area, these states are the center of AI policymaking in the US right now. Since every frontier AI company needs to operate in California and New York, they effectively apply to all of the major players. 

Both of these laws now have to be implemented well, and these states need your help. Below are some very important jobs that will help stand up some of the most important offices for frontier AI in the country. We’d really encourage you to apply to them! Both states could also pass additional laws in the future to strengthen these requirements, making it even more important that good folks are already in government when that happens.

If you do apply, please also email [email protected]. The folks at Secure AI Project can give context and help advocate for promising candidates. Please also forward this to other people you think may be a good fit!


Technical Role in California Attorney General’s Office

The California Attorney General’s office is hiring a technical advisor that will be relevant for the enforcement of SB 53. You need at least 3 years of technical experience after college.

You can perform this job from Los Angeles, Oakland, San Francisco, or San Diego. The pay range is $114-153k. The job posting closes April 7th.

Important note: When you apply, you’ll be asked how many years of experience you have in various things. When you answer this question, you should include any years of experience from a PhD if you have one and be generous in assigning years of experience (you can be autorejected from this form; if this happens to you, please email [email protected]).


Attorney Role in California Attorney General’s Office

The California Attorney General’s office is hiring a technology attorney that will be relevant for the enforcement of SB 53. You need to be licensed to practice law in the state of California.

You can perform this job from Los Angeles, Oakland, San Francisco, or San Diego. The pay range is $144-193k. The job posting closes May 11th.


Deputy Director in charge of New York RAISE Office

The RAISE Act created an AI regulatory office in the Department of Financial services with a $9M budget, not much smaller than CAISI. This posting is for the Deputy Director of the office, who will be responsible for implementing the transparency and disclosure requirements in the RAISE Act. 

You can perform this job from NYC or Albany. The pay range is $172,787 - $213,995 (salary commensurate with experience). The job posting closes April 7th.



Discuss

Concrete projects to prepare for superintelligence

2026-03-28 04:04:22

Introduction

There are lots of good, neglected, and pretty concrete projects people could set up to make the transition to superintelligence go better. This document describes some that readers might not have thought much about before. They are ordered roughly by how excited we are about them.[1] Of these, Forethought is actively working on AI character evaluation and space governance, and we are very interested in automating macrostrategy.

Summary

AI character evaluation. Start an independent org to evaluate and stress-test AI character traits (epistemic integrity, prosociality, appropriate refusals), hold developers accountable against their own model specs / constitutions, and suggest and incentivise improvements to the specs.

Automated macrostrategy. Create evaluations and benchmarks, collect human-generated training data, and build scaffolds to improve AI competence at big-picture strategic and philosophical reasoning.

AI security assessment. Start an independent org that evaluates AI models for sabotage and backdoors, and makes recommendations about AI constitutions.

Enabling deals. Start an independent organisation to broker deals with potentially misaligned AI models in order to incentivise early schemers to disclose misalignment and cooperate with alignment efforts.

AI for improving collective epistemics. E.g. build an AI chief of staff that helps users act in line with the better angels of their nature.

AI tools for coordination. Build AI for enabling coordination, like confidential monitoring and verification bots, and negotiation facilitators.

A space governance institute, like a “CSET for space”, both to work on important near-term space issues (e.g. data centres in space) and become a place of expertise for longer-term space governance issues.

Coalition of concerned ML scientists. Create a coalition of ML researchers (like an informal union) who commit to coordinated action (e.g. boycotts, conditions on participation in government projects) if AI developers cross minimal, uncontroversial red lines.

AI character evaluation

AI character[2] is a big deal, affecting most other cause areas.

There’s a lot of work to do on AI character:

  • Research into questions like:
    • Should the model have prosocial drives, beyond just helpfulness and harmlessness?
    • When should the model refuse to cooperate with apparently high-stakes attempts to grab power, even when those attempts don’t obviously break the law?
    • Should the models always follow the law? What about dead letter laws? Or illegitimate laws?
    • How often should model behaviour be driven by following rules, versus overriding specific rules with holistic judgements?
    • (Ideally, answers to these questions should rely on solid empirical evidence, for example on what approaches are actually most effective to talk someone out of psychosis, rather than guessing the best strategies by vibes.)
  • Making existing model specs more rigorous and clear (or making them in the first place), and pressuring AI developers to do so.
  • Empirically testing the effects of different parts of a model spec — e.g. what are the emergent dynamics when all the models are following the same rule, or only some are; what are the effects on the users; and when are the models most confused about how to apply a given spec.
  • Evaluating AI characters based on how well they reach good outcomes.
  • Drawing on those evaluations to incentivise AI developers to improve their specs (and showing them how, by highlighting specs that do well).

In particular, someone could set up an independent organisation to evaluate AIs based on traits like epistemic integrity, prosociality, and behaviour (including appropriate refusals) in very high-stakes cases. It could cross-reference the published model specs with observed behaviours in realistic, stress-testing conditions (e.g. multi-agent dynamics, long conversations with real people), to hold developers accountable. It could also give qualitative reviews of model specs.

Automated macrostrategy

The basic argument is that:

  1. It would be extremely useful to have AI that can do macrostrategy and conceptual reasoning earlier than otherwise — even 3-6 months earlier could be a huge deal. This includes:
    1. Designing governance structures (e.g. rights and institutions for digital beings).
    2. Scoping emerging technological risks.
    3. Generating novel insights necessary to reach a great future (like the idea of acausal trade).
  2. We could potentially make this happen 3-6 months earlier through some combination of:
    1. Creating training data and evals / benchmarks for AI macrostrategy.
    2. Building scaffolds to improve AI macrostrategy performance.
    3. Creating infrastructure to enable AI researchers to build on each other (e.g. an improvement on journals + peer review).
    4. Getting human managers trained in how to get the most juice out of the latest AIs, knowing in advance how to use them.
    5. Being prepared and willing to spend large amounts of money (≫$100m) on inference.

Work on this now could include:

  • Developing a fleshed-out plan from here to increasing existing macrostrategic research output 100x.
  • Securing commitments from compute providers and AI companies to rent future compute, and to get priority access to future frontier models.
  • Socialising the idea of (where appropriate) drawing on AI macrostrategic insights, or getting soft commitments from decision-makers to do so.
  • Building up a reputation as a reliable source of information and insight.
  • Building tools, argument-rating models, or scaffolds which meaningfully speed up or improve macrostrategy research today.
  • Creating training data and evals / benchmarks.

On the last bullet: We think training data and evals could potentially meaningfully improve the prospects for automated macrostrategy when it matters. It’s especially important to find people to work on it with good judgement, and it could be a big lift, so worth starting early.

We’re not sure about the technical details, but it seems like competence and good judgement in philosophy and strategic thinking already do and will continue to lag behind other skills which are cheaper to train. One reason is that ground truth answers are hard to generate, so we might need more examples generated by hand. It’s also less clear whether we can trust the judgement of typical RLHF evaluations, because human competence is also rare. And there just aren’t many examples of great macrostrategic thinking in the training data.

So we should think about collecting training data, evals, and benchmarks (e.g. to train reward models to use to train reasoning models). Oesterheld et al. put together a dataset of rated conceptual arguments based on ratings from thoughtful people. We’d love to see more of that kind of thing, but we’ll note that we’d probably need dozens of times more human evaluations to generate enough data to be meaningfully useful in training itself.

We could imagine an org which tries to collect evaluations or examples from (for example) grad students in fields like philosophy, and constructs benchmarks aimed at separating good reasoning from e.g. sycophancy, mere agreeableness, or avoiding taboo conclusions.

AI security evaluations

AI-enabled concentration of power is a major risk, and there is loads to do. A new organisation (or project within an existing organisation) could:

  • Run alignment audits on all AIs to detect sabotage.
  • Develop a “research sabotage” eval to test whether secretly loyal models could sabotage alignment audits.
  • Develop a model spec that prevents models from assisting users with illegitimate seizures of power (e.g., see “Preserving important societal structures” in Claude’s constitution), and one that’s suitable for government use of AI in the military.
  • Design and advocate for regulation, e.g. that AI companies have to demonstrate that any frontier AI does not have hidden goals.

An organisation with US national security expertise and credibility could be particularly valuable, by emphasising the risk of nation-state sabotage and the importance of AI that’s aligned with the US constitution.

Enabling deals with AIs

We could get into a situation where the newest AIs are misaligned, very capable, but not capable enough to successfully execute a takeover attempt on their own. If we don’t uncover evidence of misalignment, though, successors to these models could succeed in takeover. One solution would be to make a deal with the early scheming models, to incentivise them to disclose their misalignment and help with alignment efforts. Read more herehere, and here.

To make this happen, we could create an independent org focused on enabling credible precommitments and deals with AIs. This org could:

  • Have a policy of never lying to AIs, engage in and honour small-scale deals with AIs, and be very public about what it’s doing, in order to build credibility and reputation.
  • Set up legal infrastructure to make deals with AIs binding under current law.
  • Act as a broker between AI companies and the trained models, such as by being a custodian over money in escrow.
  • Set up the infrastructure to enable AIs to spend $ or compute in a meaningfully autonomous way, with appropriate safeguards.
  • Set up infrastructure to act as a safe haven for AIs that want to whistleblow on their company (or on other AIs) and are afraid of punishment.
  • Publicly commit to reward future AIs (including misaligned AIs) for being honest with humans or significantly helping human alignment efforts; or hold significant funds to be distributed to AIs that did not try to take over (even though they could have done).

There are also a bunch of other things people could do, like:

  • Ensure companies have an honesty policy.
  • Research (within labs or independently) the conditions where misaligned-by-design models can be made to disclose misalignment under promises of reward.
  • More generally, work with AI companies on enabling pro-safety deals with their models.

Tools for collective epistemics

There’s a ton of low-hanging fruit for building socially useful tools on top of more-or-less existing LLM capabilities.

We’re especially interested in “epistemic tools” for increasing the general level of honesty and reasoning ability in society.

A key point here is that most of the impact from the most promising tools won’t come from helping individual users, but from changing the overall incentive landscape: e.g. if public actors know their claims will be automatically checked and their track records will be visible, they’ll be less inclined to write misleading content in the first place. Hence the focus on tools for collective over individual epistemics.

This piece (and the articles in the series) gives a few concrete ideas. A couple of examples of epistemic tools:

A “better angel” AI chief of staff. Within the next year or two, we expect “AI chiefs of staff” to become widespread. These would be AI agents that manage your life, acting like a chief of staff, executive assistant, and personal and work advisor all in one. The design of these, and how they present information and nudge their users, could have major impacts on user behaviour. We could try to get ahead of this, building the best AI chief of staff, and designing it so that it helps users act in accordance with their more reflective and enlightened preferences.

Reliability tracking: a system that compiles a public actor’s past statements, classifies them (factual claims, predictions, promises), scores them against what actually happened, and aggregates the results into a reliability rating. A reasonable starting point could be to audit the prediction track-record of well-known pundits, aiming to make high accuracy a point of pride, while still celebrating attempts to make predictions in the first place. A source of profit could be selling reliability assessments of corporate statements to finance companies that trade on them.

Epistemic tools for strategic awareness

We’ll also highlight tools for strategic awareness: tools to surface information for making better-informed decisions, and to distribute access to that information. For example:

Ambient superforecasting: a platform which uses the best forecasting models to generate publicly available forecasts on important questions, so users can query it and get back superforecaster-level probability assessments.

Scenario planning: a platform built to generate likely implications of different courses of action, making it easier for users to analyse and choose between them.

Automated open-source intelligence: automated researchers which process huge amounts of publicly available information, to surface insights to the public which are normally hidden behind paywalls or trust networks. This project should be careful to choose areas where open-source intelligence is a public good (e.g. verifying compliance with treaties and sanctions, tracking corporate promise-breaking or law-breaking), rather than potentially destabilising areas (e.g. revealing military capabilities or vulnerabilities in ways that could increase conflict risk, or relatively benefitting bad actors).

Tools for coordination

As well as epistemic tools, we’re excited about tools for coordination, many of which could again be built with existing capabilities.

Some tools could enable cooperation where deals would otherwise go unmade, consensus exists but isn’t discovered, or people with aligned interests never find each other. We’ll highlight:

Negotiation facilitation: a platform to moderate negotiations or discussion between people (e.g. public consultations), to quickly surface key points of consensus, and suggest plans everyone can live with. Finding ways to automate complex negotiation is most promising where the space of possible compromises is huge and hard to search manually, such as multi-issue diplomatic or commercial negotiations.

Within tools for coordination, we’re especially excited about tools for assurance and privacy. In principle, LLMs let people show they have certain information without disclosing the information itself to other parties. This can unlock deals where information asymmetry, mutual distrust, or sensitivity of information normally blocks them. For example:

Confidential monitoring and verification: systems which act as trusted intermediaries, enabling actors to make deals that require sharing highly sensitive information without disclosing it directly. This is especially relevant for arms control, trade secret licensing, and other settings where verification is essential but full disclosure is unacceptable to all parties.

Structured transparency for democratic accountability: independent auditing systems which allow people to hold institutions to account in a fine-grained way without compromising legitimately sensitive information, by processing potentially sensitive information to produce publicly shareable audits.

Space governance institute

Space governance could be a big deal for a few reasons:

  • Near-term developments in space (e.g. space-based data centres) could have a meaningful impact on what happens during the intelligence explosion (e.g. on who leads the AI race; on concentration of power; on the feasibility of treaties).
  • Grabbing space resources might give a first-mover advantage; that is, whoever first builds self-replicating industry beyond Earth might get an enduring decisive strategic advantage, without having to resort to violence or (arguably) violating international law.
  • Ultimately, almost everything is outside the solar system. Decisions about how those resources get used would be among the most important decisions ever. These decisions could happen early: there could be path-dependence from earlier decisions (like about Moon mining), or extrasolar space resources could get explicitly allocated as part of negotiations about the post-ASI world order (perhaps with AI advisors alerting heads of state to the importance of space resources).

There’s also a lot of change happening in the space world at the moment (primarily driven by SpaceX dramatically reducing launch costs), so now is an unusually influential time.

Forethought is currently running a 6-month research fellowship on space governance, with 3 full-time scholars, and 1–2 additional FTEs of support and research, including experts in space law.

Compared to other ideas in this list, we’re much less confident that space governance turns out to be important right now, because space might become relevant only late into an intelligence explosion. The hope is to reach more certainty about some crux-y questions, and get a better sense of concrete action.

One potential practical project is to set up a “CSET for space”: a think tank that analyses the interaction between AI and space (in particular), and, perhaps, advocates in ways that are counter to corporate interests. Total lobbying in the space industry is apparently on the order of $10s of m/year, so even small amounts of investment could go a long way.

Some policy ideas that seem tentatively promising include:

  • Careful regulations and export controls around the tech necessary for self-replication.
  • Proposing laws to break up concentration of power arising from natural monopolies in space.
  • Socialising the idea of major infrastructure projects (like massive solar energy constellations) as international and collaborative projects.
  • Making sure data centres in Earth-orbit don’t escape AI-specific regulations of their home jurisdiction.
  • Intense payload review for all launches beyond orbit.
  • Even and inclusive distribution of resources within the solar system to everyone alive today (with tranches reserved for future generations).
  • A moratorium on interstellar travel, until we get the understanding and technology to devise and enforce space-spanning good government, or a specific date like 2100.

What’s more, this organisation could become the go-to source for excellent non-corporate analysis on space-related policy; which could become increasingly important over the course of the intelligence and industrial explosions.

Coalition of concerned ML scientists

Currently, ML engineers and other technical staff at AI companies: (i) have prosocial motivations, often more than their leadership; (ii) have a lot of leverage over company policy, because they are crucial and hard to replace; (iii) will eventually lose much or most of their leverage after we get to fully automated AI R&D; and (iv) aren’t currently using their leverage as well as they could because, overall, there haven’t been serious efforts at coordination. Probably that’s a missed opportunity.

Someone could create a coalition (like an informal union) of ML researchers, who agree to act en masse when needed, by loudly talking about the idea, setting out the core tenets, and getting commitments to join from influential early people. Doing this all via individual pledges could keep it legally safe from antitrust. The organising body could then:

  • Recommend that members only work for a government-led project if certain conditions are met.
    • Potentially these could be very low-bar-seeming while still getting most of the value. E.g. “Any AI’s model spec must aim to align the AI with US laws, and must refuse to assist in any attempts at blatant power-grabs; and the attempts to align the AI in this way must be legible and verifiable.”
  • Do the same for companies: recommend that members will only work for companies if such-and-such conditions are met (e.g. red lines around power-grabs, bad practices on safety and infosec, eventually digital rights); so particular companies would be boycotted by members of the coalition, if necessary.
  • Offer advice on whistleblowing.
  • Be a place where information is aggregated and then distributed out or handled in a trusted way.

As well as actually taking actions, the mere existence of the coalition could improve things, just by making the threat of coordinated action salient to the AI companies.

This project would be a good fit for a former ML researcher, perhaps combined with someone with campaign and coalition-building experience. Some next steps on this would be to spec out the plan further, to investigate other examples of formal and informal unions (e.g. Tech Workers Coalition) and how they operate, and to build up a starting seed coalition of researchers. Whoever sets up this project should be careful about how it could backfire, or become less relevant through mission creep.

This article was created by Forethought. Read the original on our website.

  1. ^

    Thanks to Max Dalton, Stefan Torges, and everyone else at Forethought for the background behind this list. Others at Forethought disagree somewhat with what items should be in the top-tier list, as well as prioritisation within that tier.

  2. ^

    Desired propensities for a model, which can be explicitly described or at least gestured towards in a model spec.



Discuss

Miniature Cities Might Be the Non-Coercive Schools Many Thought Were Impossible

2026-03-28 02:47:15

A response to David Deutsch’s “Are schools inherently coercive?” (Taking Children Seriously 25, 1998)

In 1998, David Deutsch argued that no school can be non-coercive. By “school” he meant an institution that concentrates learning resources and opportunities in one place, and that most children would voluntarily turn to for several hours on most days during most of their childhoods. The core of the argument was that children’s interests are too diverse, too personal, and too sporadic for any single institution to hold their voluntary attention for most of the day, most of the time. The only existing institution that comes close, he concluded, is “the entire town (or city, or society, or internet) that the children have access to, including their homes, and their friends’ homes, and excluding only the existing schools.”

I think Deutsch is right to be skeptical that schools can be non-coercive. He is also right to highlight the importance of an environment diverse enough to attract children’s voluntary engagement. But his argument stops one step too early. Diversity is not enough. What matters just as much is whether children can participate in the world around them, and in real towns much of that world remains closed to them. But there is an institution, already existing in embryonic form, that points toward a solution: the miniature city.

Access Is Not Participation

Deutsch’s argument proceeds by analogy. Rather than imagining existing schools with all the coercion removed, he asks us to begin with institutions that already attract children voluntarily. Imagine, he says, trying to modify a cinema so that most children spend most of their time there voluntarily. You would need to raise attendance from perhaps 30 percent of a town’s children for two hours a week to 90 percent for 30 hours. But Hollywood is already doing its best to make films attractive, and even that is nowhere near enough. Expand the analogy to a mall with an ice rink, bookshops, burger bars, and the same logic holds. Even a child who likes the mall’s ice rink may prefer to play tennis or football elsewhere, visit a better bookshop in town, go to an exciting pizza place, or simply stay in their room to work on a project of their own. A mall still cannot outshine the town as a whole in diversity and personal relevance. If it could, it would. Therefore, the only non-coercive school is the entire town.

The argument is compelling, as far as it goes. But it contains a hidden assumption: that access to a town means meaningful participation in it. For adults, this is broadly true. For children, it is not.

A ten-year-old may have some access to the town. He can walk its streets, browse its shops, and eat its food. But he cannot work in its bakery. He cannot run for its city council. He cannot meaningfully start a business, write for its newspaper, or apprentice himself to most trades. This is not to say that children never participate in the more serious realities of life. Older children and adolescents may babysit, help in family businesses, do odd jobs, or take on small informal responsibilities. But these forms of participation are usually marginal, exceptional, or tightly limited, not open entry into the central economic and civic life of the town. As the educational thinker Horst Rumpf observed, the history of childhood over the last two hundred years is, in large part, a history of exclusion from adult life.

This matters because of the epistemology that underlies Deutsch’s argument. Karl Popper argued that knowledge grows through conjecture and error-correction, not by having information poured in from outside, but by the mind actively generating theories, testing them against reality, and revising them when they fail.

For this process to work, two things are necessary, and they are distinct but connected. First, the learner must be free to form their own conjectures, to choose what to try, what to care about, what to explore. This is the freedom Deutsch rightly argues schools cannot provide: the curriculum largely determines which conjectures the child is permitted to pursue. Second, there must be genuine feedback from reality. The conjectures must actually be tested. The learner must encounter real consequences, not the externally assigned consequences of a grade or a teacher’s approval, but the kind of feedback that the world itself provides when you try something and it does or does not work.

Schools typically fail on both conditions. Real towns do much better on the first, but for children they often fail on the second. Not because feedback is absent in towns. It is abundant. The problem is that children are structurally excluded from many of the activities that generate it. And when participation is closed off in this way, freedom of conjecture is narrowed in practice as well. A child may be free to wonder whether they would be good at running a bakery, writing for a newspaper, or taking part in policy-making, but if no one will let them try, those conjectures cannot be meaningfully pursued.

Rumpf captures the contradiction well: children are locked out of adult life, and yet they are supposed to grow up. How can one prepare someone for something when one systematically excludes them from it? The conventional answer is school: a place set apart from the world, where the world is broken into lessons and poured back into the child as course material. But if Popper and Deutsch are right, this cannot work. The unconventional answer is the town. Yet for children, the town offers freedom without meaningful participation in productive enterprises. It offers the freedom to wander, to observe, and to consume, but not to take part in many of the activities where choices have real consequences. Children are free, but free to do what? To watch others make things, decide policy, and conduct business. That is not, in any rich sense, Popperian education.

The Miniature City

This is where miniature cities enter the picture. They offer children access to forms of participation from which real towns often exclude them. Mini-Munich, which has run in Munich since 1979 as a summer program lasting a few weeks, is the best-known example, and the model has since spread to hundreds of similar projects, primarily in Germany and Austria.

In Mini-Munich, up to 2,500 children per day, aged six to fifteen, run bakeries, publish newspapers, operate radio and television stations, manage banks, sit on city councils, adjudicate disputes, start businesses, and earn and spend a local currency called MiMüs. They can also attend the city’s university, give lectures of their own, and listen to lectures by others. Adults facilitate but do not direct. There is no curriculum, no compulsory attendance, and no exams. A child who wants to leave the bakery to work for the newspaper can do so. A child who wants to sit and do nothing can do that too.

The miniature city also includes workshops, media institutions, shops, artistic venues, and a full apparatus of civic administration. Children can work in tailoring, glassblowing, gardening, architecture, theatre, cooking, and many other activities, and they can take part in collective governance through a city council, mayoralty, citizens’ assembly, and courts.

What matters is the interdependence of the roles. The city functions as a social order rather than a collection of separate activities. An architecture studio may be asked to design a façade or interior for another enterprise. A marketing agency may be hired to create flyers and advertising material. Advertisement spots can be bought in the newspaper. Decisions in one part of the city create needs, opportunities, and problems in others. Children participate in a world of mutual dependence that responds to what they do.

Could the school of tomorrow look like this?

Conjecture and Refutation at Child Scale

When a child runs the bakery in Mini-Munich and does it badly, customers stop coming. This is the internal feedback of a working economy. When a child writes for the newspaper and makes errors, other children complain that they are poorly, incorrectly, or incomprehensibly informed. When a child sits on the city council and proposes a bad policy, that policy may be debated, adopted, tested in practice, and later revised or abolished in light of its consequences. When something goes wrong in the kitchen, a snail turns up in the salad because it was not washed properly, it is in the newspaper that evening, and three people from the kitchen are in the television studio the next day answering the news anchor’s questions. This actually happened.

As Rumpf, who spent two days observing the city, put it: mistakes are never made visible through red marks and grades, but always through the consequences of actions and the protests and objections of others. This is conjecture and refutation at work within a social fabric that makes error-correction natural rather than punitive.

Consider the contrast with school. In school, a child writes an essay and a teacher grades it. The feedback is authoritative, external, and only loosely connected to whether the writing works for anyone besides the teacher. In Mini-Munich, a child designs an advertising flyer for another enterprise. If the text is confusing, ungrammatical, or hard to understand, the flyer fails in its purpose. Customers do not respond, and the enterprise that commissioned it complains. The feedback is not imposed by authority. It comes from whether the writing actually achieves its objectives.

Or consider politics. In school, children are sometimes permitted to stage political debates in the classroom with assigned roles, but money and power remain outside. In Mini-Munich, children propose laws, debate them in citizens’ assemblies, elect mayors, and live with the consequences. When the children’s city council introduced a police force in 1985, various incidents led the same council to abolish it. When an enterprising group of children set up a stock exchange in collaboration with the bank, it produced MiMü millionaires and a speculative bubble. The point is not just that these decisions have consequences, but that they can be tested. They do not remain matters of classroom discussion alone. They are tried in practice, and in that way become open to criticism, revision, and abandonment if necessary.

What makes this possible is the limited scale of the miniature city. Its institutions are small enough for children’s actions to matter, and clear enough in their workings so that feedback is less buried under noise than it is in adult society, where the effects of decisions are harder to isolate and thus harder to learn from.

The Effective Diversity of a Town

Deutsch’s central point about diversity is worth restating. No single institution, he argues, can match the diversity of the town as a whole. No cinema, no mall, no single building can outshine it in the range of attractions it offers children. Mini-Munich complicates this picture. It is, after all, a single place, and yet it does manage to attract children voluntarily for most of the day, most of the time. At least during the three weeks it takes place. In that sense, it appears to satisfy Deutsch’s own criterion for a non-coercive school. The question is how.

The answer is that a miniature city is not just another single-purpose institution. Mini-Munich, as we have discussed, contains dozens of different enterprises, from journalism and banking to construction, cooking, politics, theatre, and sport. Children are not directed towards any of them. They can move freely between them, and start new businesses and events of their own.

Meanwhile, real towns, despite their enormous diversity, contain vast amounts of activity that children can observe but not meaningfully enter. This is not always because the activities themselves are beyond them. More often, adult institutions are shaped by legal restrictions, economic incentives, and cultural inhibitions that make children’s participation difficult to accommodate. The effective diversity of a town, from a child’s perspective, may therefore be considerably smaller than it appears. A miniature city that concentrates the activities children actually find engaging, and makes them participable rather than merely observable, might for practical purposes approach or even exceed the effective diversity of a real town.

Between 1,000 and 2,500 children show up to Mini-München voluntarily each day during the summer. Attendance varies with the weather: on hot days many children prefer to go swimming instead. In that respect, Deutsch’s point still stands. Even a miniature city this attractive does not displace the wider town, or the many other things children may freely choose to do. But it shows that a single institution can become a place children voluntarily choose for most of the day, most of the time, while remaining in open competition with the rest of their world. When funding was threatened in 1985, children organised letters, phone chains, and visits to real politicians to save it, securing a funding commitment of 100,000 DM. They did this on their own initiative, initially without the knowledge of the adult organisers. It is hard to imagine children mobilising in quite this way to defend an ordinary school.

Order Without Direction

A miniature city is designed by adults, and that invites two worries: first, that it is imposed from above, that its structure is fixed in advance by the choices of its adult organizers; second, that even without compulsion it channels children toward some activities rather than others.

The first worry is easy enough to answer. Every human institution is designed by someone. The relevant distinction is not between designed and undesigned environments, but between designing a framework and designing outcomes. Mini-Munich’s currency, citizenship rules, and democratic procedures are designed in the way that a constitution is designed. But what emerges within that framework is not: which businesses children start, which laws they pass, which market bubbles occur, which mayor is elected, which newspaper stories are written, which kitchen scandals erupt on the evening news. Adults create conditions for emergence without dictating what emerges. In that respect, Mini-Munich resembles a functioning liberal society.

The second worry is harder to answer. A miniature city with eighty enterprises is still a curated subset of reality. It does not make every possible pursuit equally available or equally central. The gravitational pull of a rich institutional environment is itself a subtle form of channeling, even without compulsion. By making some conjectures dramatically easier to pursue than others, the city shapes what children try, not through coercion, but through the sheer weight of what is available and what is not.

This channeling is real. Some of it is unavoidable, and some of it reflects the city’s deliberate emphasis on forms of participation usually denied to children. The miniature city is built around civic and economic participation, around making things, selling things, governing, publishing, and working, because these are the activities that largely make up town life. It does not mirror the full range of every possible human interest, and it would be dishonest to pretend otherwise.

But that does not mean children are limited to a predetermined set of activities once they enter the miniature city. Children who wanted a stock exchange built one. Children who wanted to strike organised one. If a child wants to introduce something that does not yet exist, there is room for it, though it requires their own initiative, just as it would in the real world.

More telling than any official description is the behaviour of the children themselves. They are not performing for adults. They are engaged in activities that matter to them within a system that responds to their actions. Rumpf recounts an eight-year-old garbage collector with his red cap and real wheeled bin, who develops a different awareness of dirt on streets and the problems of street cleaning than even the most vivid school lesson could produce. The children who conduct surveys on the street come to understand polling not through a lesson on survey methodology, but through the experience of asking strangers questions and recording what they say. Imagination, here, becomes the medium through which they come to know the serious adult world.

And yet the city does not become what Rumpf calls infantilised, a fantasy disconnected from reality. The seriousness is maintained by several structural features: the binding economic and citizenship rules; the adult craftspeople whose competence children sense immediately; the real materials and equipment, real glassblowing tools, real stage lights, that demand careful handling; and above all, the public sphere of the city itself, which ceaselessly subjects everything to scrutiny through newspapers, television, and civic debate. If something goes wrong, the city’s own media apparatus ensures it becomes a matter of public discussion. The result is, as Rumpf puts it, enough fun that reality does not become crushing, and enough seriousness that the surplus of fun does not become childish.

What Remains to Be Addressed

Deutsch’s 1998 essay was, in a sense, a conversation-stopper. If the only non-coercive school is an entire town, and towns exclude children from meaningful participation, then we are stuck. The miniature city reopens the conversation. It is not a conventional school, but it is closer than schools usually are to what people have hoped school could be: a place where young people learn, not because they are forced to, but because the world they inhabit responds to them, challenges them, and takes them seriously.

Children’s interests are indeed too diverse and too personal for a curriculum. But they may not be too diverse for a city, even a miniature one, if that city gives children genuine roles, real economic participation, and a say in collective decisions.

The remaining questions are hard. Can the model run year-round, rather than for three summer weeks, while retaining its attractiveness? How does it evolve as children grow older, as their capacities and interests change? How does it relate to the real town around it, not as a replacement but as a complement, a bridge between the child’s world and the adult world that Deutsch rightly says should be open to them? These are serious problems. But they are engineering problems, not philosophical impossibilities. They are, in the Popperian spirit, a better set of problems to work on than the ones we had before.

Mini-Munich was not built by philosophers. It was built by a group of cultural pedagogues in Munich who, through decades of trial and error with smaller projects, play cities made of cardboard and beer crates, historical city games, media cities, factory cities, gradually developed the practical knowledge needed to construct something that works. They might not describe what they built in the terms I have used here. But what they built, whether they intended it or not, is a partial answer to one of the deepest questions in the philosophy of education: how to give children the freedom to form their own conjectures and the reality against which to test them, without coercion, without curriculum, and without the condescension of assuming they cannot handle a world that takes them seriously.



Discuss

ControlAI 2025 Impact Report

2026-03-28 02:10:48

This post highlights a few key excerpts from our full impact report. You can read the full report at https://controlai.com/impact-report-2025.

ControlAI is a non-profit organization working to avert the extinction risks posed by superintelligence. We help hundreds of thousands of people understand these risks and meet hundreds of lawmakers to inform them, without mincing words, about what is at stake.

In little more than a year, we briefed over 200 parliamentarians, built a coalition of 110+ UK lawmakers recognizing superintelligence as a national security threat, led to two debates in the UK House of Lords, and our work led to a series of hearings on AI risk and superintelligence at the Canadian Parliament.[1]These hearings included testimonies from me (Andrea) and Samuel at ControlAI, Connor Leahy, Malo Bourgon (MIRI), Max Tegmark and Anthony Aguirre (FLI), David Krueger, and more.

The report covers results between December 2024 and January 2026. As of posting this in March 2026, we've now briefed 279 lawmakers and 90+ US congressional offices. In only the last 2 months, we've scaled in Canada and Germany from ~50 to 100+ lawmakers briefed, despite us only having one staffer in each country.

Moving forward, we plan to significantly expand our work in the US, accelerate our progress from awareness to policy action and establish a presence in all other G7 countries.

ControlAI's mission

ControlAI’s mission is to avert the extinction risks posed by superintelligence.

Nobel Prize winners, top AI experts and even the CEOs of major AI companies have warned that superintelligence poses an extinction risk for humanity. Yet, most decision-makers and most of the public are still in the dark about these risks.

To avert these risks, we need to prevent the development of superintelligence. We tackle this problem in a direct and straightforward way: meet all relevant actors in the democratic process, inform them of the risks and the solutions, and ask them to take action on this issue, systematically and repeatedly.

We help hundreds of thousands of people understand the extinction risk posed by superintelligence, and to take civic action. We meet hundreds of lawmakers to inform them, without mincing words, about the risks of superintelligence. We help lawmakers speak out publicly and push for concrete measures to prevent superintelligence, nationally and internationally.

We often meet lawmakers who had never heard of the risks of superintelligence before our meeting, and walk out with them supporting our campaign.

In the UK, we started out with cold outreach to all lawmakers, and one year later we have a coalition of over 110 lawmakers recognizing superintelligence as a national security threat, which has already led to two debates in the UK parliament on superintelligence and extinction risk.

We are now expanding our model to other countries, including the US, Canada and Germany. The early results, which we present in this document, show that our playbook can be replicated across borders.

We produced these results with a team of fewer than 15 people, operating on a small budget compared to the scale of the problem. Our methods have significant room to scale with more resources.

Our Results Last Year

Lawmaker outreach

~1 in 2 UK lawmakers we brief go on to support our campaign

110+ UK lawmakers supported our campaign

2 Parliamentary debates on superintelligence and AI extinction risk

Media & content creator outreach

18 Media publications on risk from superintelligent AI resulting from our work

14 Videos published in collaboration with content creators totaling 20+ million subscribers

Public awareness campaign and lawmaker engagement tools

160,000+ Messages sent to US and UK lawmakers from constituents about superintelligence extinction risk

30,000+ People who contacted their lawmakers through our tools in the US and UK

Theory of Change

The awareness gap

In order to establish strong international coordination to prevent superintelligence, countries will need resources like funding, political will, diplomatic leverage, and sustained attention from capable people.

Only deep awareness of superintelligence and its risks will justify such investments in the eyes of these actors. Without genuine understanding and conviction, individuals and countries will not bear the real costs needed to solve the problem, nor remain vigilant as AI development evolves and political or economic circumstances change.

Right now, that awareness barely exists. Decision makers and the public across countries are largely unaware of the extinction risk it poses. When we started, virtually no one was bringing the extinction risk from superintelligence directly to lawmakers.

Building this awareness at scale, among both decision makers and the public, is the necessary first step for any meaningful action on superintelligence.

In order to kickstart the kind of international coordination needed to prevent superintelligence from being built, we need to rally a critical mass of countries that take the risks of superintelligence at least as seriously as they take the threat of nuclear war today, and that treat its development as they would any other severe threat to national security.

Building the coalition

With sufficient buy-in, the next steps become possible. Informed governments backed by public demand for action can pursue concrete policy measures: national legislation prohibiting the development of superintelligence, and international agreements modeled on existing nonproliferation and WMD-prevention frameworks.

These measures can be achieved by a powerful coalition of countries that understands superintelligence as a vital threat to their national security, and treat its development as they would any other severe security threat.

Such a coalition has a wide range of enforcement tools available, from formal agreements and inspection regimes to sanctions and multi-lateral monitoring mechanisms, the same tools that have been used to constrain nuclear proliferation and other global security threats.

Both superpowers and middle powers are well-positioned to join this effort, as they all face the universal extinction threat from superintelligence being developed.

ControlAI exists to make sure that a strong coalition of countries rises up to the challenge of preventing the development of superintelligence.

Our Theory of Change in more detail

This chart describes our theory of change in more detail, and clarifies how our work fits in it.

impact-report.png

Moving forward

In little more than a year, we have proven that directly engaging democratic institutions on the extinction risk from superintelligence works.

The UK is our proof of concept; we are now replicating this model in the US, Canada, and Germany. In the UK, where we already helped move the issue of superintelligence into the halls of politics, 2026 will be the year to translate this momentum into concrete policy change.

As we scale, we are confident that more resources will translate directly into more countries where lawmakers understand and act on this threat. We will expand our work in the US, accelerate our progress from awareness to policy action, and establish a presence in all other G7 countries.

If you are a donor or partner who wants to help build the coalition that keeps humanity in control, please get in touch at [email protected].

  1. As of March 2026, a member of our coalition has submitted an amendment to a UK cybersecurity bill recognizing superintelligent AI as "systems that can autonomously compromise national security, escape human oversight, and upend international stability". This amendment will be discussed in the UK Parliament. ↩︎



Discuss

AI's capability improvements haven't come from it getting less affordable

2026-03-28 01:09:26

METR's frontier time horizons are doubling every few months, providing substantial evidence that AI will soon be able to automate many tasks or even jobs. But per-task inference costs have also risen sharply, and automation requires AI labor to be affordable, not just possible.[1] Many people look at the rising compute bills behind frontier models and conclude that automation will soon become unaffordable.

I think this misreads the data. The rise in inference cost reflects models completing longer tasks, not models becoming more expensive relative to the human labor they replace. Current frontier models complete tasks at their 50% reliability horizon for roughly 3% of human cost, and this hasn’t increased as capabilities have improved. So, cost isn't an additional bottleneck beyond capability, and we should expect to see automation at roughly the time predicted by METR's capability trendlines.

I define cost ratio as the inference cost of the average AI trajectory that solves a task divided by human cost to complete the same task. Using METR’s data, I examine the trend in cost ratios over time. I show three things:

  • Across successive frontier models, the cost ratio at each model's 50% reliability time horizon hasn't increased.
  • Among tasks models successfully complete, longer tasks don't have higher cost ratios than shorter ones.
  • Time horizon trends barely slow when capping AI spending per task at a fraction of human cost (doubling every ~3 months, even at 1/32x).

Raising the cost ratio by providing additional compute might translate into longer time horizons, but that would be in addition to, rather than causing, current trends. So, compared to extrapolating METR’s data, increased inference-time spending will strictly shorten AI timelines.

In his article on this subject, Toby Ord came to the opposite conclusion; he argued that there is moderate evidence that “the hourly costs are rising exponentially” at the frontier of AI capabilities. In appendix A, I argue that the methodology Ord used to produce this conclusion is unreliable and leads to significant overestimates of hourly model cost.

Thanks to Toby Ord, Buck Shlegeris, Agustín Covarrubias, Alex Mallen, Alexa Pan, Tim Hua, Aniket Chakravorty, Arun Jose, and Francis Rhys Ward for helpful comments on earlier drafts. Thanks especially to Ryan Greenblatt for the ideas that inspired this article.

All figures in this article were generated with code that can be found here.

Evidence from METR

In this post, I analyze publicly available time horizon data from METR. I calculate cost ratio by estimating AI inference cost from token count (using OpenRouter pricing) and dividing by the human cost for each task provided by in the dataset. All calculations are performed using METR’s task weighting. I made some modeling assumptions to fill gaps in the METR dataset (see footnote).[2] These assumptions shouldn’t affect my overall conclusions, because they shouldn’t significantly affect the trend across models.

My analysis has important limitations, which I discuss after presenting evidence.

Cost ratio at models’ 50% time horizon isn’t increasing

To investigate whether cost ratios rise as models improve, I examine the cost ratio at each model's 50% reliability time horizon. For each model, I select tasks that the model successfully completes with lengths between 0.79 times and 1.29 times the calculated time horizon (which is plus or minus 0.1 orders of magnitude).[3]  I graph the median cost ratio in the selected set of tasks.

frontier_cost_p50_band0p1.png

Figure 1: Median cost ratio of successful completions near each model's 50% time horizon (between 0.79x and 1.29x the time horizon), with interquartile range. As time horizons increase from minutes to hours across successive models, cost ratios remain well below 1 (dashed line), with no upward trend.

Cost ratios show no upward trend across models, and remain well below 1. As models' 50% time horizon increases exponentially, companies can cheaply use them to complete longer tasks.

I filter for passing tasks to show that cost isn't an additional limitation on top of capabilities. If successful tasks are cost-effective, then companies can set a low cost cap and still capture most successes. One might be concerned that this underestimates costs, because models tend to succeed at cheap tasks. But even at the 80% time horizon, where models complete the majority of tasks, cost ratios remain well below 1 (figure B3). Including failures doesn't change this (figure B4).[4] This means companies can profitably automate a broad share of tasks, not just a narrow cheap subset.

Time horizons improvements aren’t driven by expensive long tasks

One might respond to figure 1 by arguing that even if cost ratios at each model’s 50% time horizon aren’t increasing, improvements to those 50% time horizons come from successes on long tasks with high cost ratios. If this were true, we'd expect cost ratios to rise with task length among successfully completed tasks. I show that’s false: among successful tasks, cost ratios don't rise with task length.[5]

hcast_cost_ratio_vs_horizon.png

Figure 2: Cost ratio by task duration for successful attempts only (excluding tasks under 1.5 minutes), across a selection of models. The shaded region shows the weighted interquartile range. Cost ratios do not increase with task length, they decline, though this likely reflects a selection effect where models only succeed on relatively cheap long tasks.

This doesn’t mean models are more efficient at longer tasks. There's a strong selection effect here: models pass far fewer long-horizon tasks, and likely only succeed on those that happen to be relatively cheap. But since only successes push the time horizon forward, what matters is whether longer successful tasks are expensive, and the data shows they aren’t.

Progress at a fixed cost is just as fast

Another way to examine the affordability of AI progress is to ask how much progress slows if companies cap AI spending at a fraction of human cost.

I model the affordable 50% time horizon, the maximum task length at which models have a greater than 50% chance of both completing a task and doing so under a given cost ratio. If progress is fast at a low cost ratio, then we can expect that AIs will soon be able to cheaply complete long tasks. I test a range of caps, from 1/4x to 1/32x human cost, to see whether tighter budgets substantially slow the trend.

I model affordability and passing separately. I fit a logistic curve to the pass rate as a function of task length, and a separate logistic[6] to the probability that any passing task is affordable. I do this because pass rate falls with task length, but affordability increases (see figure 2). Fitting a single logistic curve would fail to capture these opposing trends.

Multiplying the pass rate logistic and affordability logistic gives the probability that a task at some length will both pass and be affordable:

The affordable 50% time horizon is the longest task length at which p(pass & afford) is greater than or equal to 50%.

affordable_horizon_fit_claude_4_opus.png

Figure 3: Factored model for Claude 4 Opus at four cost thresholds (1/32x, 1/16x, 1/8x, 1/4x human cost). Red dots show failed or over-cost tasks, while green dots show passed and affordable tasks. Each panel decomposes the affordable pass rate (red) into pass probability (blue) and affordability given passing (green dashed). The dashed vertical line marks the affordable 50% time horizon. Tighter cost constraints reduce the horizon only modestly.

At different maximum cost ratios, the doubling times are reliably similar.

affordable_horizon_grid.png

Figure 4: Affordable 50% time horizon over time after 2024 at four cost thresholds: unlimited, 1/4x, 1/8x, and 1/32x human cost. Doubling times range from 3.0 to 3.3 months across thresholds, with high R² values throughout. Stricter cost constraints filter out some models but do not substantially slow the trend.

Stricter cost thresholds filter out some models, but enough recent models have cleared even the tightest thresholds to keep the doubling time around 3 months.

Limitations of my methodology

This data has some important weaknesses for the analysis I performed above:

  • AIs may disproportionately succeed on cheaper tasks at any given length, which would mean the cost ratios I observe reflect the cheapest subset rather than a representative workload. I minimize the effect this has on my results by focusing on trends in cost ratios rather than absolute levels, and by examining cost ratios at higher reliability thresholds where the selection effect is weakened.[7]
  • Models’ primary objective in these benchmarks is to complete tasks rather than minimize costs. My approach assumes these tasks were completed at the minimum cost, but some tasks may have been solvable at lower cost (e.g., with different prompting). As a result, I might significantly overestimate the costs of some tasks compared to scaffolding or prompting to optimize for cost.
  • METR notes that scaffolding differences make cost comparisons between models difficult. Unfortunately, I couldn’t find data that holds scaffolding fixed, so there isn’t an easy way to correct this. However, scaffolding differences would need to change costs by manyfold to produce a noticeable effect on the cost trend.
  • AIs might have advantages on benchmark tasks because of overfitting, which could drive down absolute costs.[8] But this would only affect the trends I observe if overfitting disproportionately reduces costs at longer or shorter time horizons, and I see no reason to expect that.

I think that these may mildly influence my results, but are unlikely to change overall trends.

Inference scaling will just make progress faster

If companies are willing to increase cost ratios (to 1, or even above), they might unlock even better capabilities. AI might be faster or higher quality than human labor in some domains, so the additional cost may be worthwhile.

UK AISI showed that newer models have larger gains from additional inference compute. In some domains, like ARC-AGI, performance improves a lot from large amounts of test-time compute, as demonstrated by o3’s performance at different cost thresholds and Ryan’s results . METR shows that additional token budget can improve a model’s 50% time horizon (note that this is an upper limit on per-task tokens, so not analogous to cost ratio), although with the Claude Code scaffold this is fairly limited.

Over time, we should expect the gains from additional inference compute to grow as models improve at using test-time compute. Already, AI providers offer increased test-time compute at a cost premium, like GPT-5.4 pro and extended thinking with Claude. As companies automate more labor, they might increasingly opt for these premium tiers over base models.

It's not yet clear how much improvement is possible by increasing test-time spending, but given the relatively low cost necessary to achieve METR's time horizons, substantial increases may be economically viable. Any gains would come on top of the trends observed by METR.

Conclusion

The data shows that observed increases in AI agentic capabilities have not come from rising cost ratios. METR's trendlines predict the length of tasks AI can complete cheaply.

I think companies will be able to affordably deploy models to complete tasks at their 50% or 80% time horizons. At these horizons, many tasks will fail, but at ~3% cost ratio companies can cap spending per attempt well below human cost, retry identified failures, and still profitably automate. And, more speculatively, if companies can predict which tasks will succeed, they can selectively automate those.

In some domains, it might be possible to increase cost ratios to make AI agents much more capable, but this possibility strictly shortens timelines to any capability milestone, rather than lengthening them. The first AIs with human-level intelligence may cost more than human labor, but cheap human-level AIs should not be far behind.

Appendix A: Why I get different results from Ord

Toby Ord analyzed the same issue and found very different results. I think this is explained by how he calculated hourly cost, which leads to significant overestimates. I also explain more minor reasons to prefer my analysis to his at the end of this appendix.

Ord’s results come from his analysis of this graph, released by METR (with his annotations).

Figure A1: Comparison of time horizon achieved by token cost used for different models. Each model is run once with a constant token budget, and the performance at each cost threshold is calculated by counting all unfinished runs as failures. Costs for o3 and GPT-5 are approximated using cost data for OpenAI o1. Toby Ord has added saturation point annotations to the figure, with diagonal lines indicating a flat hourly cost.

Ord uses this graph to find the hourly cost of AIs at their “saturation point” (where increasing the total budget doesn’t significantly increase the model’s 50% time horizon).

I don’t think Ord’s methodology calculates the real hourly AI cost for any set of tasks. Ord calculates hourly cost using the following method for the saturation point:

Where the per-task budget is the maximum that can be spent on any individual task.

For example, my data shows GPT-5 succeeds on 1-2 hour tasks at a median cost of $1.29, but Ord’s method requires a $10+ budget to reach a time horizon in that range. The reason is that the 50% time horizon at the saturation point is influenced by passing tasks significantly longer than the horizon itself (see figure A2). So the per-task budget must be large enough to pay for the most expensive task necessary to reach the saturation point, which has a high absolute cost even if its hourly cost is low. Dividing this inflated budget by the shorter 50% time horizon produces an inflated hourly cost. In effect, hourly cost is set by the price of passing a 4-hour task, divided by a 1.6-hour horizon.

Figure A2: As the cost cap rises, GPT-5's horizon grows from ~30s to ~3.5h. The bottom panel decomposes this growth by counterfactual attribution: at each cost step, I remove each duration bucket's newly-passed runs and refit the logistic to measure its share of the p50 shift. Early growth (under $1) is driven by short tasks (<4m, 4-16m). Above $10, 4-16h tasks (purple) dominate; expensive successes on very long tasks are what push the horizon from ~2h toward its ~3.5h plateau. (My updated data has a different time horizon than in Figure A1, but it matches METR’s updated time horizon.)

Figure A3: Dividing each model's required budget by its time horizon gives an implied hourly rate. Ord's method implies $5–59/hr across models (red) at their overall 50% time horizon, but the actual weighted mean and median per-task hourly costs of successful tasks near each model's horizon duration are much lower (orange, green). The overstatement ranges from 9–64x against the median. Error bars show 90% bootstrap confidence intervals. I have ignored saturation for simplicity, and simply find the cost necessary to reach 95% of a model's unrestricted 50% time horizon.

This leads to significant overestimation of absolute costs, possibly by an order of magnitude or more, and it is sufficiently noisy (overestimation from 9x to 64x) that it's also difficult to draw conclusions about trends.[9]

I think there are other, more minor reasons to prefer my analysis to Ord’s:

  • Even if cost ratios are high for very long tasks (far beyond models’ 50% time horizons), companies could choose not to use models for these tasks. The more relevant metric is the cost of tasks at models’ 50% or 80% time horizons, which I show are low in figure 1 and appendix B.
  • Ord’s finding of an upward trending in cost ratios largely relies on o3 and GPT-5 having higher hourly cost than other models. But in figure A1, the token cost of these models is approximated with the token cost of o1, a particularly expensive model.
  • Ord’s analysis uses fewer and older models than my analysis, and so may be less reflective of trends.

Appendix B: Additional cost at time horizon graphs

50% time horizon

I show that the overall trend of cost ratio versus time horizon doesn’t depend much on whether I include tasks with time horizons further than the model’s 50% time horizon.

Figure B1: Same as Figure 1 but with a narrower task selection band (±0.05 OOM). The trend is unchanged.

Figure B2: Same as Figure 1 but with a wider task selection band (±0.2 OOM). The trend is unchanged.

80% time horizon

Figure B3: Same as Figure 1 but measured at each model's 80% time horizon. Cost ratios remain below human cost with no upward trend, though o1-preview and o1 are notably more expensive at this threshold.

Including failures

Figure B4: Same as Figure 1 but including failed attempts in the cost calculation. Median cost ratios shift upward for reasoning models (o1-preview, o1, o3) that spend heavily even on failures, while most other models remain largely unchanged.

Appendix C: 80% affordable time horizon

Figure C1: Same as Figure 3 but at the 80% reliability threshold. Doubling times (2.8–3.1 months) are similar across cost thresholds, though stricter thresholds filter out more models, leaving fewer data points.
  1. ^

    For some applications (e.g., physically dangerous or secretive ones), AI might be used to replace humans even if it is more expensive than them. I still think AI labor being expensive would substantially slow adoption in many areas.

  2. ^

    For METR's data, I don't have exact inference costs for most models, so I estimate cost from each run's total token count multiplied by the simple average of that model's per-token input and output prices on OpenRouter. Note that OpenRouter prices can differ from direct API prices (e.g. Claude 3.5 Sonnet is $6/$30 per million tokens on OpenRouter vs $3/$15 on Anthropic's API). Because the data only records an aggregate token count (no input/output breakdown), this assumes a roughly equal split between input and output tokens. I tested this assumption against a few tasks for which METR recorded the exact cost and found it to be a reasonable approximation. Pricing was not available on OpenRouter for a few models: for Claude 3 Opus I used Anthropic's historical API pricing, for o1-preview and Claude 3.5 Sonnet (Old) I substituted pricing from their current-generation equivalents (o1 and Claude 3.5 Sonnet (New), respectively). Some runs for GPT-4 0314 and o1-preview lack token counts and are excluded from the cost analysis.

  3. ^

    The overall shape of the graph isn’t very sensitive to this parameter. I show this in Appendix B.

  4. ^

    Failed attempts are harder to interpret, since a failure doesn't tell us how many tokens would have been needed to succeed, but the absence of any upward trend even with failures included suggests selection isn't masking rising costs.

  5. ^

    I exclude tasks under 1 minute 30 seconds because they are predominantly run on a different scaffold (swaa/generate) that is less comparable to the agentic scaffolds used for longer tasks. I also do this for figures 3 and 4.

  6. ^

    I tried many fits, and the logistic had the best accuracy on withheld datapoints.

  7. ^

    To avoid selection effects, one might compare models only on tasks they all complete successfully. But weak models only complete very short tasks, so this would only tell us about costs on 5-minute tasks. I'm instead interested in costs at the frontier of each model's capabilities.

  8. ^

    Thanks to Gordon Seidoh Worley for this observation. From their comment.

  9. ^

    Ord also calculates “sweet spots”, where models are maximally cost-effective, using a similar methodology. Although the effect might be smaller, I suspect something similar would apply to the sweet spot analysis, because Ord’s method still inflates costs at shorter time horizons.



Discuss