TLDR: I flew to Oregon to investigate Nectome, a brain preservation startup, and talk to their entire team. They’re an ambitious company, looking to grow in a way that no cryonics organization has before. Their procedure is probably much better at saving people than other orgs, and is being offered for as little as $20k until the end of April — a (theoretical) 92% discount. (I bought two.) This early-bird pricing is low, in part, due to some severe uncertainties, in both the broader world and in Nectome's ability to succeed as a business.
Meta:
I'm Max Harms, an AI alignment researcher at MIRI and author.
This deep-dive only assumes functionalism and a passing familiarity with cryonics, but no particular knowledge of Nectome.
I have been a cryonics enthusiast for my whole adult life, and that is probably biasing my views, at least a little. I want Nectome to succeed.
That said, I am also a rationalist, and I have worked very hard to set aside my wishful thinking and see things with cold objectivity.
Throughout the essay, I've attached explicit probabilities for my claims in parentheticals. You can click these probabilities to access Manifold markets so we can bet.
This essay is titled "Nectome: All That I Know," but technically I know a few things that were off the record and that I agreed not to share. I do not think they change the overall picture very much.[1]
Contents:
The Problem
The History
The Team
The Plan
The Money
The Future
The Bottom Line
1. The Problem
Cryonics has a big problem.
The most basic, high-level story says: the patterns that make up a human mind don’t disappear instantly when someone’s heart stops beating, but rather when their brain decomposes. If we reduce the temperature of a brain, this halts decomposition, and gives time for medical technology to progress to the point of potentially being able to rescue the person before their pattern is truly lost. Unfortunately, dropping the temperature low enough that the tissue is stable over many years requires going far below freezing, and the formation of ice crystals in flesh also causes horrible damage. To prevent ice, cryoprotectant is perfused through the brain, replacing the water in cells with a solution that (at least ideally) doesn’t freeze, but instead turns into a glass-like “vitrified” state. There are a lot of pesky details that I’m glossing over, including some serious problems, and I’ll get back into the nitty gritty science in a bit. But none of this is the big problem with cryonics.
The big problem with cryonics is that most of the time when someone who signed up for cryonics is pronounced legally dead, a critical window of time passes where their brain deteriorates before the cryonics provider is able to operate. The main culprit here is ischemia — oxygen starvation — which dramatically changes the character of the network of blood vessels that run through the brain. In a working brain, these capillaries are open and clear, able to pump blood through every part of the brain. During a cryonics procedure, these same pathways are used to push cryoprotectant into the tissue, and ferry away the excess water. But when oxygen is cut off, the endothelial cells swell,[2] shrinking the already-narrow passages, trapping cells, and causing clots. In the literature, this one-way, serious damage to the circulatory system is called the “no-reflow phenomenon” and at normal temperatures it begins after just a few minutes.[3] By the 15 minute mark, even on blood thinners, most of the brain will be more-or-less permanently cut off.
Rapidly cooling the brain can stretch this out. There are famous cases of people falling into freezing water and surviving at least an hour with no blood flow to their brains.[4] So, in principle, someone might be perfusable if they are rapidly chilled, have external life support circulating oxygenated blood, and only a few hours pass before perfusion. But this is an idealized case, and not how things go much of the time. Most cryonics clients wait over a day between legal death and being cryopreserved (90%), and in many cases, people spend hours at warm temperatures after legal death but before any real action is taken.
Some information about these poor souls is probably saved. It can actually take quite a while for the structure of the neurons to fall apart, and the technologies of the future will almost certainly look miraculous. If you want someone to be preserved, cryonics is clearly better than nothing. But "some information is saved" is a far cry from "the person is still intact." Most critically, with significant delay between legal death and perfusion, I am highly skeptical that the typical cryonics client's brain is adequately perfused in the regions that matter most (15%). The brain's finest blood vessels — small penetrating end-arteries with no backup routes — are the first to be permanently blocked by ischemic clotting and blockage, and they feed broad swaths of cortex and white matter. Once the brain is dropped below freezing, I suspect that these vital areas are then shredded by ice.
We need to do much, much better.
How to Fix Cryo
The most important thing that an individual can do to maximize their chances is to legally die inside or right next to the facility where they’re going to be perfused. They should do this quickly,[5] at an expected time, and with a team of professionals standing by, ready to act with utmost urgency. In other words, if you want to maximize your chances of being saved, you should probably get Medical Assistance in Dying (MAiD), where a doctor prescribes drugs that can let you legally end your life on your own terms.
The good news is that this is possible. MAiD is available in more than a dozen US jurisdictions, including Oregon, California, Washington, New York, D.C., and Vermont. Sadly, it’s not available in either Arizona or Michigan, where Alcor and the Cryonics Institute are based, respectively, so you’ll either need to be perfused away from their main facility (uncommon) or transported after death (hard) or hope that you legally die quickly in hospice at the right time without medical assistance (risky). Outside the USA, MAiD is available in Canada, The Netherlands, Switzerland, Spain, Austria, and much of Australia. If you don’t live in one of those places, don’t fret — Oregon, Vermont, and Switzerland allow MAiD for non-residents.[6] And while MAiD is controversial, its availability is undeniably growing over time.
In the USA, to take advantage of MAiD, one needs to already be dying. In Oregon, for example, MAiD can only be provided to people who have been prognosed with less than six months left to live. But this applies more often than you might think. About half of people who die on Medicare do so in hospice, which also requires a less-than-six-month-life-expectancy prognosis. For those who are willing to shop around for pessimistic doctors[7] when they get deeply ill, I estimate that maybe 3/4ths of people in the USA could theoretically take advantage of MAiD, setting aside financial obstacles.
(MAiD, for the record, is not particularly expensive, at least in the world of healthcare. The medications are self-administered and cost about $700,[8] and must usually be purchased out-of-pocket. The physician fees are more variable. Medicare and most health insurance covers end-of-life care, including consultation, but Medicare at least won’t pay for anything specifically about MAiD.)
Beyond the individual, the biggest thing that cryonics needs is scale. Right now, if you want to save your loved one, you need to convince doctors, hospital administrators, and funeral directors to do something weird. In a world where minutes matter, we can’t afford to explain why rapidly cooling the head is so important. We need paramedics who continue CPR after someone is declared legally dead. We need a perfusion facility in every hospital. We need a world where, in the most stressful moments of a person’s life, they can let the professionals take over, rather than fighting an uphill battle to break norms without coming across as crazy and stricken with grief.
In addition to these benefits, I believe scale can also make the process of legally dying with MAiD much safer. Right now, cryonics teams are unable to really act until a doctor/nurse/EMT pronounces someone dead, and while the cocktail of ingested drugs used in places like Oregon are almost always painless,[9] it can sometimes take hours before legal death is declared. During this window of dying, the body is changing, and it’s a somewhat open question of what can go wrong during that period, from a cryonics perspective. With increased scale comes awareness and advocacy such that more places can become like Canada in allowing physicians to administer the drugs intravenously, and thus speeding things along compared to the oral route. One can even imagine a world where life-support machines are used to provide oxygenated blood to the brain, even as the patient’s heart is stopped and they’re declared legally dead.
But for cryonics to reach that kind of scale, there needs to be a cryonics company that is ambitiously trying to grow. And unfortunately, the field of cryonics has historically been extremely averse to growth. Alcor and the Cryonics Institute (CI) are the two biggest players in the field, each with approximately 250 people preserved over about 50 years — an average of only about 5 people per year! The field has grown a little bit since the 70s, but the total number of people getting cryopreserved across the whole world each year is almost certainly less than 50. These cryonics companies are not-for-profit, and aim for keeping costs as low as possible "for the public good." This means setting aside next-to-nothing for advertising/marketing,[10] running into funding issues when inflation causes negotiated contracts to fail to cover costs,[11] and generally failing to invest in growth.
If only someone would step up and actually try to change the world…
2. The History
The Brain Preservation Foundation
In the late 2000s, there was an emerging zeitgeist in neuroscience: connectomics. Instead of thinking mainly in terms of connections between large-scale brain regions, scientists were starting to be able to use scanning techniques and computers to build models of brains that included details for individual neurons, and even individual synapses. In 2010 — the same year that connectomics made a splash at the TED conference — Ken Hayworth, who had completed his PhD in neuroscience at USC the year before, was living in the Boston area and working in the lab of Harvard professor Jeff Lichtman, one of the main connectomics pioneers.
Hayworth was excited about the potential for preserving people’s connectomes after death, but was unconvinced by what he saw in the field of cryonics. He knew what well-preserved tissue looked like and didn’t think that standard methods of cryopreservation were anywhere near good enough. Taking inspiration from another bit of 2010’s zeitgeist, he co-founded the Brain Preservation Foundation (BPF) and offered a pair of incentive prizes — for preserving both small and large mammal brains — to anyone who could prove that they were actually capturing the vital synaptic information.
I started the Brain Preservation Prize as a challenge to Alcor and other such companies to ‘put up or shut up’, challenging them to show that their methods preserve the synaptic circuitry of the brain. After five years they have been unable to meet our prize requirements even when their methods were tested (by a third party) under ideal laboratory conditions. Out of respect for loved ones I will not comment on any particular case, but it is clear from online case reports that their actual results are often far worse than the laboratory prepared tissue we imaged. Speaking personally, I wish that all such companies would stop offering services until, at a minimum, they demonstrate in an animal model that their methods and procedures are effective at preserving ultrastructure across the entire brain. By offering unproven brain preservation methods for a fee they are effectively making it impossible for mainstream scientists to engage in civil discussion on the topic.
Serious neuroscientist Ken Hayworth rejects your weaksauce cryonics! Shame! Do better!
Aurelia Song[12] was a graduate student working on a master’s in EECS from MIT[13] who became interested in the BPF . She began to volunteer there, first doing things like working on their website, fundraising, and doing outreach. But in 2015, in the midst of learning about the science, she started to think that she saw a path towards claiming the prizes. At heart, it was simple: neuroscientists already had techniques that preserved brain tissue well. This is what Hayworth was using as a gold standard! With a few changes, those same techniques could be used to win.
But to claim the prize, Song needed the help of someone who had actually trained in the field, and while she was on good terms with Hayworth, there was an obvious conflict of interest, given that he was one of the judges for the prize. So Aurelia Song reached out to 21st Century Medicine (21CM), the California-based cryobiological research company that had developed M22, the cryoprotectant that has been used by Alcor since 2005. 21CM has had a complex relationship with the cryonics community for decades, fundamentally working in the same space (and often towards the same ends), but trying to keep its distance from the immortality-flavored transhumanist rhetoric, and focusing instead on things like cryo-stabilized organ transport and other, more respected, technologies. They initially weren’t very interested in Song’s proposal, but she persisted, and was eventually hired to work with cryo pioneer, Greg Fahy.
Fahy and Song soon developed aldehyde-stabilized cryopreservation, a synthesis of standard cryonics techniques (cryoprotectant + extremely low temperatures) and standard neuroscience techniques (glutaraldehyde — a cousin of formaldehyde that essentially glues proteins and other large molecules firmly in place). In 2016, they won the BPF prize for small mammals, showing state-of-the-art preservation of nano-scale details of a rabbit brain.
About that same time, Song and her fellow MIT alum and ex-roommate, Michael McCanna, founded a startup, Nectome, with the bold mission of taking that same technique to human clients. By early 2018, Song and Fahy had won the second BPF prize, this time for a pig brain, demonstrating that high-quality preservation of humans was indeed on the horizon. Nectome was set to revolutionize human brain preservation.
Media Missteps
Before 2018, Nectome had basically been a tiny, stealth-mode startup. But almost immediately after winning the large mammal prize, Nectome debuted as part of Y Combinator’s 2018 winter cohort… and proceeded to stumble into a media firestorm. To this day, if you Google “Nectome” the second result, after their website, is this early-2018 article from MIT Technology Review with the headline: “A startup is pitching a mind-uploading service that is ‘100 percent fatal’”.
To state the obvious, Nectome is not “pitching a mind-uploading service” nor even a service that is “100% fatal,” regardless of what Song said in 2018. Nectome is a company that takes people who are already dying, and allows them to try to preserve as much of themselves as possible, so that some other party can potentially use that preserved information in the future, including possibly creating an upload based on the person’s preserved brain. Yes, for people like me and Aurelia Song, uploading to extend lifespan is the obvious use-case, but I think it’s clear in retrospect that if Nectome had been marketing itself better in those early days, it would have leaned into a more pluralistic/agnostic position as to why someone would want to buy their services. They might’ve talked about the value we get from mummies and ancient frozen people, or of Jeremy Bentham’s decision to be preserved for the benefit of others. They might have leaned on the way in which we don’t know what future technology might look like. And they certainly could’ve much more clearly emphasized that they are trying to save and extend life, rather than end it.
But that’s not how things went. Instead, the media smelled blood in the water and focused on weirdness and controversy. Many outlets featured quotes by naysayers that didn’t really address the core idea. A few weeks later, the MIT Media Lab issued a public statement cutting ties to Nectome, which the broader media tended to characterize as the whole of MIT condemning the company.
This meme was featured prominently on a TechCrunch article from the time.
Still, Nectome went on. They received about two million dollars in a combination of pre-seed/seed investments and federal grants. Later in 2018 they made their first hire, Jessica Radley, who is now the CEO. She, alongside the founders, worked to repair Nectome’s reputation and build skill in talking about their business with those who weren’t already on board with the premise.
I don’t know the details, but I speculate that the struggles of 2018 were particularly hard for Michael McCanna, and that it was during this time that he really began to drift away from Song. McCanna started advising an AI-centric YC startup in 2019. Like Song, he studied computer science at MIT, and left the company a year later to eventually become COO at Immunefi, a crypto-security company.
The Long, Slow Science of Getting the Details Right
Then Covid hit.
Song had been preserving brains[14] for nearly half a decade, but these were always a proof of concept. To get a mature process that was good enough to use on real human clients, more work was needed, and with the pandemic driving everyone into lockdown and shaking up the status quo, the decision was made to pare the company down to just Aurelia Song. McCanna went off to do tech things. Jessica Radley left to go live on a boat. It was a weird time.
But through it all, Song persisted. Buoyed by the initial investment round, and driven by a vision of how to potentially save millions of lives, she moved to Oregon and continued to iterate towards a procedure that would work. With the help of a couple assistants, she continued to practice on pigs and on donated human cadavers. In 2021 it took her, with the help of two assistants, twenty-two minutes to preserve a pig. Now it’s closer to five minutes[15] — fast enough to perfuse the entire brain before ischemia ruins things.
As anyone who has worked in science, engineering, or medicine will tell you: there’s a big difference between a proof-of-concept and a real-world technique that can work with actual human beings. For instance, it took Song’s team months simply to experimentally determine which filters to use for the circulatory pumps. Everything from the placement of the cannula tubes to the exact method of surgery needed to be learned through the basic methods of science and engineering. Nectome is doing something that nobody has ever done before, and that requires diligence and determination. Thankfully, they seem to have both.
Nectome’s team grew slowly. Additional angel investments came in, here and there. Andrew Critch carefully ran an independent review of their technique, leading to another half-million dollars from the Survival and Flourishing Fund.
3. The Team
A few weeks ago, on March 25th, I flew up to Portland[16] to visit with their entire team, ask questions, and tour their lab. Based on that limited visit, here’s my sense of the people involved:
Aurelia Song, Founder and CTO, President of the Board of Directors
Passionate, talented, and dedicated, Aurelia is the beating heart of Nectome. She clearly knows her science, and is an extremely nerdy rationalist who isn’t afraid to say what she believes and go against “common wisdom.” Her ambition and idealism perhaps get the better of her sometimes, but those same traits make her a good startup founder, and don’t get in the way of her skill as a scientist and engineer.
Charlie Todd, Operations
I only met Charlie briefly, but they were my first point of contact via email and have shown up in the comments a lot on Aurelia’s LessWrong posts. They came across as energetic, engaged, and friendly, which seem like great traits to have when fielding inquiries from the public.
Anna LaVergne, Lab Tech
Anna is a local hire from the Portland region who learned about Nectome via a posting on Astral Codex. Anna also gave me strong rationalist vibes, albeit from a quieter, more skeptical angle than Aurelia. During a lull in talking about Nectome, I asked everyone except Charlie (who had a prior commitment) what they thought about AI, timelines, doom, et cetera. Opinions were all over the place, but Anna was probably the closest to my pessimistic-MIRI position. When I was getting the lab tour, she seemed to really know her stuff, and she’s clearly on track to be one of the primary people who operates on clients. I would feel good in her hands.
Borys Wróbel, Chief Science Officer
Borys is Nectome’s serious, graybeard scientist. Born in Poland, he got his PhD in Biology in 1998 and was working on brain preservation in the Netherlands as Scientific Director of the European Institute for Brain Research before coming to America to work with Nectome. He was a delightful contrast to the rationalist/tech-startup/American vibes coming from the rest of the company, and I think his presence as CSO is a good sign for Aurelia’s taste in people and the seriousness of Nectome as an organization. We talked for a while about the brain preservation situation in Europe, and why things are more dysfunctional there than might be apparent from afar, as well as the publication pipeline for Nectome’s journal preprint of their preservation techniques.
During my visit, I spent the most time with Jessica, who rejoined the company last year to become CEO. She has a background in AI (back in the days before LLMs took off), drives a Mustang, and used to live in a Bay-Area group house with a bunch of consciousness hackers.[18] She was the only member of the team that gave me real business vibes, and was probably the person at Nectome who most impressed me, due to her clear combination of truly believing in the mission (and in Aurelia) alongside a sober awareness of how their success or failure will ride at least as much on the confidence of investors and clients as it does on the technical skill and scientific prowess of the team. Unlike the others, she was clearly guarded around me, recognizing that this essay wouldn’t necessarily paint them in a wholly positive light, and (correctly!) trying to manage me as something like a journalist, rather than as a curious new friend.
Nectome Needs More Businesspeople
Radley is — as fitting for the CEO — probably the most important person at the company right now. To grow, and achieve the dream of scale, Nectome needs to metamorphose from being a wonky research lab to being a real business that’s capable of courting investors, government officials, the media, the broader public, and especially their customers. Song is great for explaining the science and vision, especially to nerdy transhumanists like me, but for these more outward-facing relationships, she needs help. One of the best signs about Nectome is that Song appears to really understand this, putting Radley in charge and giving her a strong vote of confidence. And conversely, one of my biggest concerns is that I don’t see Radley as having much of a digital presence or taking much of a front-and-center role as a figurehead. Before meeting the team, I wasn’t even sure if Radley was still working at the company, or if Song was still the CEO.
All that is fixable, of course, and I did get the feeling that Radley has what it takes to succeed, but her situation seems to me to mirror the state of Nectome as a whole: lots of potential that has yet to be proven.
Take, for example, the stark absence of anyone on staff with a background in running a business, marketing, sales, advertising, media relations, law, policy, finance, or accounting. Yikes! They assured me that they have legal consultants in Oregon, California, and D.C., but having lawyers who you can pay to answer questions is a very different thing than having someone whose full-time job is to make sure Nectome doesn’t get sued. They also have several advisors who are surely helping, but again, that’s very different from someone working day-to-day to ensure success. A new method of brain preservation is not an easy thing to sell, and based on how 2018 went, I would’ve hoped for an entire comms team as part of their debut, rather than one energetic ops person.[19]
It’s understandable that these people need to get hired at some point, and apparently that point is now/soon, but unlike the average small business, I claim Nectome cannot simply post job openings and hire the most qualified people that walk in the door. Nectome is a tech/innovation startup looking to break into untapped markets and change the world. As such, culture and alignment are vital to hiring their first wave of businesspeople. If the initial marketing, legal, and finance hires are true normies who don’t believe in Nectome’s mission, that will ultimately doom the company to lots of internal friction and lack of coordinated effort. Song has done a great job picking people so far, but scientists are more in her wheelhouse than salespeople, and it’s not even clear to me that good people for these roles exist.
(Speaking of which, if you’re excited about brain preservation, work in any of those areas, and are looking for a job, you might want to reach out to [email protected] !)
4. The Plan
Nectome is not aiming to squeeze into the same, existing niche as orgs like Alcor. Nectome is ambitious. They’re aiming to bring brain preservation to the mainstream. In the comments to her post, Less Dead, Song writes:
I think we will be getting to thousands of preservations per year in a few years
This is, I think, wildly over-optimistic (94%), and probably won’t even be true by 2040 (75%). But it might make sense as a target. By aiming high, Nectome is thinking about scale in a way that no other brain preservation org is, as far as I can tell. This means they have some hope of succeeding, even if it takes longer than they hope.
When I visited, they were in the middle of trying to find a house in the Portland area to buy and turn into the inaugural Nectome brain preservation center, where their first clients could come and be preserved. At first, I thought the idea of doing the procedure in a converted house was a bit strange. Shouldn’t there be some fancy facility, like a hospital or assisted-living-type place? If they’re aiming for scale, their facility should be big!
But this is wrong for a few reasons. First, hospitals and assisted living facilities are set up to house people for extended periods. In most cases, Nectome is envisioning the dying client traveling out to their center, spending a couple days there with their family, then the operation happens and everyone leaves. It needs to be comfortable enough to host people for short periods of time, and big enough to hold one or maybe two families, as well as having space for the procedure. But Song’s procedure doesn’t actually require much equipment — Song first practiced it on pigs out of the back of a converted U-Haul truck. And Nectome is very conscious of the need to give things the right vibe. As Song explained to me, “People want to die peacefully in their living rooms, surrounded by family.” Nectome might not be able to make house calls,[20] but they can provide a peaceful, comfortable living room, which is a better experience for most people than a cold, sterile hospital.
And, importantly, houses are relatively cheap. Buying or constructing a big facility is costly, both in terms of money, time, and flexibility of location. There are very few restrictions on where someone can get MAiD, or where Nectome is able to do a perfusion, and with the ability to quickly acquire and convert residences, this first center in Oregon is intended to be one of many. While states like California aren’t ideal starting places, they do have MAiD laws, and Nectome is clearly hoping to set up preservation centers in the bay area, the east coast, Canada, and beyond.
MAiD Services
So, let’s imagine your parent, grandparent, or some other loved one is dying. Their doctor says they have stage-4 cancer or something equally dire. You tell them about Nectome, and instead of spending hundreds of thousands of dollars fighting for the chance to extend their remaining days into some number of years of potentially very-low-quality life, they decide that being preserved is the more hopeful, life-affirming path.
To make this happen, they will need to go through a few steps in Oregon (or, more ideally, in their home state, if it supports MAiD and Nectome has expanded there). Because many places have laws forbidding MAiD, telemedicine probably won’t be an option in the near future.
So your loved one flies out to Oregon and rents a hotel or AirBnB or whatever for a few weeks. To help them along the path, Nectome might put them in touch with a death doula and/or give them a list of doctors who are known to be comfortable with Nectome and MAiD. Once in Oregon, they’ll need to find and meet with an administering physician who gives a prognosis of less-than-six-months-left-to-live, at which point they then need to verbally request MAiD and have that physician agree to participate. Afterward, they’ll need to find and meet with an independent doctor who gives a similarly terminal prognosis and agrees to participate in the role of consulting physician.
(If either physician thinks that the dying person is psychologically unwell, things get significantly more complicated, but MAiD is still possible with the involvement of a psychiatrist and support from family.)
Once both physicians have signed on and explained what’s involved and laid out the various available alternatives, a written request must be filled out and witnessed. Then, at least two days after this paperwork is complete and fifteen days[21] after the first verbal request, they must meet with the attending physician again and verbally confirm that they want to be prescribed life-ending medication. Only now can a participating pharmacy sell the necessary cocktail, which must be mixed by the patient, but which may be taken at any time.
It’s at this point that they head to the Nectome preservation center for up to an up-to-two night stay, along with family/caregivers. Paperwork is filled out, granting Nectome rights to their body after legal death and interviews are conducted (which we’ll revisit in a bit). Perhaps this is when you fly out to see them and say goodbye, if you weren’t staying with them beforehand and helping with the process.
Sendoff
The preservation center is[22] in an unassuming house in a quiet part of Oregon. (Probably in the Portland suburbs?) You and the other loved-ones all gather around the person who is dying in what is basically a living room. There are lots of seats. It’s quiet and comfortable. You might even watch a favorite episode of some familiar TV show while you wait for the designated hour, the family dog curled up on their lap. As the time approaches, a nurse practitioner from the patient's care team arrives, as do the relevant Nectome staff.
After goodbyes are said, most of the family is shepherded to another room. The dying person mixes and drinks the prescribed medication[23] and the nurse sits next to them to monitor vitals. Nectome’s team waits nearby, with an open doorway to something like an operating room. After a few minutes, the sedatives in the cocktail cause them to fall asleep, and their breathing slows. A tense waiting period follows where the nurse carefully tracks their heartbeat. After perhaps 50 minutes, the nurse practitioner pronounces them legally dead.
Nectome springs into action, moving the client into the operating room and leaving the remaining family, death doula, and nurse behind. Surgery begins immediately, connecting cannulas to the heart’s arteries to begin the blood washout and perfuse the whole body with aldehydes.[24] While it might seem faster to cannulate the carotid arteries, Song's experience is that it's not — the vertebral arteries are also needed for good human perfusion, and the delicacy of working on the neck ends up slowing things down. The difference in cost in perfusing just the brain vs the entire body is negligible.[25]
And, as a nice benefit, the client is now stable at room temperature. Are they stable for years? Maybe! There aren’t good studies on how much vital information is lost, when neural tissue is stored at room temperatures for long periods. But based on what data we do have, the body will at least be stable for weeks.[26] This is, in fact, a similar procedure to what is done in a traditional funeral home to prepare for an open-casket funeral, and Nectome is prepared to work with funeral directors to allow for their clients to get open-casket funeral services afterwards.[27]
Storage
After the funeral, Nectome takes possession again, and moves the client’s body to a long-term storage facility (not the preservation center). If this is in the next few years, the facility is probably something like a warehouse in the Portland area. It might even be a truck with specialized refrigeration capability. If we’re looking further out, and Nectome is on the path to success, the storage facility is potentially more like a low-temperature mausoleum set up somewhere in a stable place in the far north, such as Canada, Sweden, or Norway.[28] (See also: The Svalbard Global Seed Vault.) From what I understand, it is not that difficult, legally, to transport bodies[29] to whatever part of the globe is safest for their clients.
Compared to traditional cryonics, the storage phase for Nectome is considerably easier, because while Nectome does perfuse the brain with cryoprotectants that can help prevent freezing if they go cryogenic, Nectome does not intend to store its clients at true cryogenic temperatures. Instead of vitrifying at -135°C or -196°C temperatures using liquid nitrogen, Nectome intends to use standard industrial freezers to store their clients at closer to -30°C (-22°F). Thanks to the cryoprotectant, this is above the temperature where there’s any risk of ice crystal formation, and comes with several benefits:
Liquid nitrogen (LN2) is more expensive.
An Alcor Bigfoot dewar boils off about 10 liters per day, and they probably pay about $0.5/liter. If we suppose that one Bigfoot has 10 clients,[30] that’s (conservatively) about $180/year per client, just from cooling.
By contrast, Nectome can theoretically insulate an entire room and take serious advantage of the square-cube law. Back-of-the-envelope says a purpose-built structure in Canada could easily preserve 10k clients for less than $10,000 in annual energy costs. Even if mostly empty, that's only $10/year/client.[31] With more optimistic assumptions we can even imagine costs closer to <$1/year/client.
A more basic industrial freezer could probably hold 30 clients, while consuming less than $3,000/year in energy — $100/year/client. Nectome doesn’t have to scale huge before they start having a price edge.
While these dollar figures aren’t huge, they can matter a lot in terms of how much money needs to be set aside per client to pay for long-term support.
LN2 requires more of a supply chain, which can be broken, and it limits where you can put your facility.
Imagine you want to store clients in Svalbard. During winter it is hard to reach the island, and 1% daily boil-off makes shipping LN2 long distances expensive. Better to manufacture your own, at the facility.
But while LN2 is one of the easier substances to produce (cool+compress the ambient air, then separate) this still requires specialized machines that need to be sourced from afar.
In a global emergency, such as a world war, this machinery might be hard to get, and you thereby run the risk of not having what's needed to keep the clients safe.
More generally, industrial freezers are a simple technology and are easy to source/scale.
Nectome wants to grow quickly, and this path helps facilitate that. No specialized dewars need to be designed or manufactured. Suitable, large-scale freezers capable of holding dozens of clients can cost less than $30k and can be purchased today.
Perhaps most importantly, storing at deep-cryo temperatures is dangerous.
At around -135°C, objects like vitrified brains are essentially solid glass. As the temperature continues to drop towards the -196°C temperature of LN2, the glass contracts, with the outside contracting faster than the inside. This uneven contraction creates mechanical stress, and can even cause shattering if done too quickly.[32] The inverse problem can also happen if the solidified tissue warms up too quickly.
Unfortunately, storing at -135°C and/or carefully controlling the cooling rate requires an expensive cryogenic freezer, whereas dropping the temperature rapidly and hoping for the best is cheap.
Even tissues perfused with cryoprotectant run some risk of ice crystal formation when the temperature drops from -40°C to -130°C or vice-versa. Due to a dynamic where ice crystals nucleate on the colder end and then ice grows on the warmer end, it's particularly dangerous to warm up from cryogenic temperatures slowly. But if you do it too quickly, you get stress fractures![33] By storing the brain above the phase transition temperature, there is no ice risk from warming and Nectome can avoid this double-bind.
As part of transporting the client to their facility, Nectome also then hands off legal control to a distinct, non-profit organization[34] that’s dedicated to the long-term care of preserved people. Like the cryonics nonprofits, this nonprofit manages an endowment trust for each patient, and pays for the storage facility’s upkeep using the returns from a diversified portfolio of low-risk investments.
It’s hard to predict exactly how much each client’s endowment will need before the details of Nectome’s long-term storage facility and nonprofit wing are firmly established. Wealthy clients will almost certainly be able to donate to the nonprofit to ensure they will have enough support, but a lot depends on the trajectory of the future. As I understand it, Nectome intends to invest approximately 100x the annual storage costs in the seed endowment of typical clients, which will mean the endowments almost certainly will grow over time, if the future looks anything like the last few hundred years.[35] My best estimate is that seed funding for endowments will be about $30k.[36][37]
Returning to our narrative, what should we expect for the future of your preserved loved-one? Well, back when they were making final arrangements with Nectome, they did a series of interviews. One such interview was to check that they were a good fit for Nectome, understood the plan and the risks involved, were mentally able to consent, and so on. But another set of interviews were meant to document the client’s wishes for how they wish to be handled as time goes on and the world changes.
For example, if technology is developed that appears to allow preserved people to be rescued, do they want to be an early-adopter, or wait until it’s well established? If rescue involves being uploaded into a computer, and no biological revival is yet possible, do they want to do the upload, or wait? Are there any countries they would never want their remains to enter? If the world looks like it’s coming undone, due to war, tyranny, AI, or something else, would they like to be cremated, just in case there is a fate worse than death? Is there anything else they want the future to know, in case they can’t be revived?
In my time there, I got the impression that Nectome will take the preferences of their clients very seriously. During the long wait after legal death, a significant part of how things go will, if Nectome is in good shape, be steered by the preferences and plan that their client lays out beforehand.
Messy MAiD
So that’s the plan. Overall, I think it’s pretty good, and that the success or failure of Nectome will mostly hinge on business details, price points, and whether there’s enough of a market. (Oh, also on whether AI is about to upend the whole world, which I’ll get into towards the end of the essay.) But one major technical detail stands out to me as a pain-point: the slowness and messiness of MAiD, particularly in terms of the ingested cocktail used in Oregon.
Nectome has never actually done their procedure. They have proven success on pigs. But as I understand it, they did not use ingested MAiD drugs to kill their pig subjects. The pigs were healthy and young, not suffering from dementia and atherosclerosis. It seems very plausible (45%) to me that, in practice, things will go wrong in at least a quarter of cases and full perfusion will be impossible.
I asked Song about this, and she didn’t seem to have a good answer. She’s truly dedicated to saving people, and aware that situations are likely to emerge where Nectome will simply need to do their best to preserve whatever they can as best they can. But having an intention is different from having battle-hardened skill in navigating messy situations, and my bet (75%) is that Nectome will lose significant parts of many of their early clients as they learn by doing and gain the experience needed to handle edge-cases.
To be clear: this doesn’t mean going through Nectome’s procedure is worse than traditional cryonics. I think Nectome’s procedure is likely the best option, even given the team’s lack of experience. My point is that it still feels like a huge gamble, at least right now. In ten years (assuming they’re still doing preservations), my guess is it’ll be a lot safer and a lot clearer what the major risk-factors are.
An ideal change would be for Oregon to change their laws to allow direct injection of the drugs by a physician, like in Canada, Spain, and The Netherlands. (And even more ideally, with a circulatory machine running while legal death is pronounced.) Alternatively, other places that allow a speedier end could allow non-residents in the same way that Oregon currently does, and Nectome could set up preservation centers in those jurisdictions. I’m particularly hopeful about Switzerland, which seems to have a relatively sane outlook towards people having freedom and dignity in how they leave the world.
5. The Money
Let’s transition away from thinking about cryonics as a procedure and more about it as a business, starting with prices. Nectome is launching with a price point of $250,000 per person. That’s a lot of money! But is it an unreasonable amount?
Average medical spending in the last year of life in the USA is about $112k,[38] going up to $217k if we look at the last three years of life. Of these costs, about 85% are covered by a mixture of insurers and government programs like Medicare, making the average out-of-pocket costs at the end of life ~$15k. From this perspective, Nectome’s price point is quite high, especially when considering that Nectome’s price doesn’t include the doctor visits, death doula, transportation, or funeral expenses.
Still, there is a heavy right-tail in medical spending, and old people often have a lot of money! According to the 2022 Survey of Consumer Finances, the median person in all 65+ age demographics in the USA has a net worth of over $300k. The top quartile has between 2.5 and 3 million dollars, and, of course, we’re not even getting into the top 1%. From this perspective, Nectome is a modestly-priced luxury — out of reach of many, but still arguably (barely) within reach of most Americans.
As a rough guess, not having done in-depth market research,[39] I believe that this is a reasonably smart price-point, in terms of maximizing revenue. Nectome has serious potential to be head-and-shoulders better, in terms of preservation quality, than their competition. This edge in quality positions them well to establish themselves as a premium brand, like Gucci, Rolex, and Ferrari.
When a premium brand is top-of-the-line, my sense is that they should largely choose their price point to simultaneously have huge margins on each customer, while keeping the pool of potential customers large enough to keep them busy. Arguably, they might be better-off charging even more. There are a lot of eccentric millionaires around the world!
Nectome is aware of this, and is gearing up to present themselves as a top-of-the-line option, targeting their marketing effort towards wealthy clients. This initial science-heavy debut to the rationality/cryo-enthusiast community is meant to help them establish credibility and soft-launch in a market that is naturally disposed to like them, allowing people like me to drum up organic excitement. For their primary early customer base, they intend to lean into a strategy of heavily upselling to meet bespoke needs. Want to be cryopreserved in California? In Europe? In your house? My sense is that Nectome is excited to say “Yes, we can do that for an extra X million dollars!” All it takes is one dying billionaire who really wants them to succeed and they could be golden for a while.
If $30k needs to be allocated to the patient care trust, and we speculate that the procedure costs about $50k in labor, materials, et cetera,[40] then the $250k price implies a nice, comfortable margin of $170k per client. This profit then needs to go to R&D, marketing, legal, operations, and the rest. If we suppose that the business arm and other fixed costs are around $1.85M/year,[41] this means they need to serve about 11 clients per year to be in the black. More profit beyond that level could then be folded into marketing and other growth costs, such as buying and renovating new facilities. Clearly, if they manage to preserve a hundred people per year (much less thousands) at that price point, they will be wildly profitable.
How many people exist who will be willing to buy at that price? Well, there are about 24 million people in the USA with a net worth of over a million dollars — about 40% of the millionaires, worldwide. As a back-of-the-envelope, order-of-magnitude guess, let’s say that there are about 50 million people who could reasonably afford Nectome’s services, that about 2% of these people die each year, and that half of those do so in a way that’s compatible with going to Oregon and getting MAiD — 500k potential clients per year. Even if only one-in-a-thousand people are open to it, philosophically, Nectome really could potentially be serving hundreds of clients per year, if they get really good at marketing. And, if they break the Overton window open, thousands per year is plausible.
All they need to do is hit their stride on marketing, advertising, and scaling before running out of runway. Easy, right?
Runway and Investment
I think there’s something of a mystery around Nectome’s debut and revealed strategy, and it feels like a red flag to me. Specifically, if they’re so growth oriented, and they have this business that could potentially scale to making hundreds of millions of dollars a year in profit, why are they debuting now?? Why does it smell like Nectome is trying to be profitable in 2027?
Why not, for example, have a nice preservation center, storage facility, and patient care nonprofit set up before debuting? Why not have a marketing officer, at the very least? Stumbling out of the gate without specialists who know how to drum up excitement is not something that inspires confidence.
My best guess is that Nectome is launching their commercial services now because they have a burn rate of around 1–1.2 million dollars per year, but only had about $750k in the bank at the start of the year, and are struggling to get venture capital to keep them afloat. In a coffee shop, towards the end of my visit, Radley confessed to me, in a rare moment of unguardedness,[42] that they recently had a group of investors from Chicago who were interested in doing a Series A round with Nectome, but then got spooked by something and backed out without explanation. (Radley seemed perplexed.) My sense is that if that had gone through, Nectome wouldn’t be offering sales yet. Part of the softness of their current debut is likely to save some of their powder in case they do manage to find an investor willing to drop a few million and can pivot back towards focusing more on growth.
And to be fair to Nectome, I bet the VC landscape really sucks for them right now. Nectome is not an AI company, and by my sense of things, all investors want right now is AI. Even if Nectome looks like an investment that has positive expected value, is it as exciting as having shares in a hyperscaler or a frontier lab?
It’s possible that I’m reading too far into things. At an earlier point in my visit, when I asked why they chose to debut now, the answer was more idealistic — Song said she was tired of turning people away, and they wanted to start saving people as soon as it was realistic to do so. They are already working with one probable-client who is in poor health, and Song said that during her many years since winning the brain preservation prizes, she’s had to turn away a couple people each year.
It does seem good to get started trying to preserve people as soon as possible.
This Month's Sale
It is a well-known fact that consumers hate price discrimination… unless it’s in the form of an exclusive discount. In other words, if you charge $20 for a ticket to an event and then give a 50% discount to seniors and youths, this is fine. But if you charge $10 and impose a $10 surcharge on people between 23 and 64, you are greedy capitalist scum.
Nectome’s high baseline price gives them the flexibility to offer strategic discounts without coming across as scalping their customers. Thus, at the time of writing, there is an early-bird 60% discount on their services ($100k instead of $250k) and, probably more interestingly for most readers, Nectome is selling a limited-supply “discount card” for $20k. The discount card offers an immediate 10% off the market price, and then an additional 9% off for each year that passes. Thus, after 10 years, the holder of the card can get a Nectome preservation for free. Both deals are fully transferable and resellable assets with no expiration date, meaning that you don’t need to know who you want preserved when you buy. This deal runs through to the end of April, 2026.
The $100k discounted price seems fine to me. This is probably close to at-cost, given they're not yet at large scale. It seems reasonable for Nectome to want money now and be enthusiastic about early customers, so they can establish more of a reputation for consistent quality. Naively, I might’ve gone for something like $79k with a 2-year expiration date and a partial refund policy, just to really encourage people to go for it, but I haven’t seen behind the curtains and the real price is close enough to my intuition that it’s probably fine.
I have more mixed feelings about the discount cards. On one hand, they seem like a great deal for customers who are overall excited about Nectome, but who don’t expect to need to use their services for a while, like me. Because of this, I bought two. But I happen to be able to afford that many, and I like saving my money for these kinds of things. It also seems pretty clear that, even accounting for interest,[43] Nectome is decently likely to lose money on them. This is fine. The loss is marginal, and importantly, their sale helps Nectome get through the rocky near-term and gather enough steam to be able to launch strong into their target market. The steam is partially from the cash, but mainly, I would think, from the ability to show consumers and investors that a bunch of people have voted with their wallets and believe Nectome is worth it.
But is $20k really the right price point, there? Why not offer a card that caps at a 60% discount for $5k? Or a card that offers a 20% discount for $1k? If the goal is to build momentum and get votes of confidence, wouldn’t it be smarter to offer a cheaper early-bird product that would still make them profit, down the line? When I asked Radley about this, she did not have strong rebuttals, and I have the feeling that there wasn’t much or any market research, here.
Will Nectome offer more discounts and deals in the future? Song seemed open to the prospect. But the general vibe, from both Song and Radley, was that it was very unlikely that they would offer a better deal to individuals than the 60–92% discount that they’re running right now, and I believe them. 92% off is a lot! It reflects the way in which Nectome is in an early, uncertain place.
Life Insurance and Donation/Volunteering
But before diving into thinking about Nectome's uncertain future, I'd like to touch on a few ways that people might be able to get preserved more affordably than just forking over a quarter-million dollars.
It's a common misconception that people can't get life insurance payouts if they take MAiD drugs to end their life.[44] Not only does Oregon law require insurance to cover it, but contracts can even be made with "accelerated death benefits" (also called "living benefits") where under certain circumstances, such as a terminal prognosis, the insurance holder can collect some or all of their payout before they are pronounced legally dead. If you have a life insurance contract, it may be possible to modify it to pay for Nectome, should you get terminally ill. Nectome is generally happy to work with people to find ways to have their life insurance pay for their services.
If you're a crazy singularitarian transhumanist like me, or you just want to cover your bases in the short-term, you may also want to consider signing up for term life insurance. This is what I did, when I signed up with Alcor, for instance. As a result, there's a $250k payout if I legally die before 2042, and I only have to pay about $250 a year. If I survive to the middle of the century, then I'll need to figure out a new strategy for paying for preservation, but so far I feel pretty good about offloading that problem to my future-self.
More broadly, I think there’s a lot of room for cool, transhumanist-adjacent businesses to make arrangements to grant their employees, their spouses, and their children (or whoever) access to Nectome’s services as an employment benefit. This is basically also a kind of term life insurance, if you think about it, unless the benefit somehow lasts even after the employee leaves the company.[45] But in contrast to personal life insurance, it’s a path that has the advantage of not needing as much paperwork and effort on the part of the individual. Jessica Radley seemed particularly enthusiastic about getting businesses on board for deals like this, and I encourage employees to suggest it to their employers and for relevant HR managers and CEOs to reach out to Nectome to see if there might be relatively inexpensive options on the table.[46]
Another route to potentially get Nectome to preserve you without spending a lot, is to volunteer to donate your body for scientific experiment. This is risky, of course, but when I spoke to Nectome, they wanted to emphasize that they care a lot about protecting people, and aim to treat volunteers with respect and care. Nectome is a science-first company, and while many of their experiments can be done on non-human animals like pigs, there are some things that require human subjects. For example, if an experimental revival technique is developed, some brave human will need to have signed up to be at the front of the line. If you don't have the money, or are particularly motivated to make an altruistic contribution to advance the science of human preservation, it’s probably worth reaching out to see if volunteering for experimentation might be a win-win.
6. The Future
Nectome was busy while I was there, and not just in scouting out sites for the preservation center and interfacing with the curious public. Song and the others were preparing their lab to preserve someone’s beloved dog. This is their first commercial sale, and seems good to me. Not only is preserving cats and dogs probably a good thing in itself, I think it can help build confidence in their technology and serve as a gateway to additional business from that pet’s human.
As I believe Charlie said: “You’re going to let your dog wake up in the future without you?”
Nectome has claimed that while costs vary, they’re open to preserving pets for about $50k each. And indeed, when I challenged them on whether they had enough business to survive, Song was adamant that they were in good shape, arguing that “if we [preserved] two dogs a month, we’d be viable.”
Viability is not the same as succeeding at the ambitious target, but it might be enough to keep the lights on. Indeed, in the case of mundane failure, where Nectome maybe gets zero human clients in 2026, two in 2027 and four in 2028, my sense is that they will, in some sense, lower their sights and persevere. Song is incredibly dedicated and the unit economics involved aren’t terrible. If she needed to, I bet she and just one other person could run a half-functional version of the business for a few years at least. I would be very surprised if Song actually gave up on Nectome, full stop. (3%)
And if things sputter out or crash entirely, perhaps due to a scandal or lawsuit? My sense is that the preserved clients would actually be in reasonable shape, all things considered. Those remains would need to be handed off to another party, and who they go to will probably depend on each person’s agreed contract with Nectome, hammered out as part of the interviews. The aldehyde stabilization means that if these preserved people need to be transported to a far-away place, perhaps at warm temperature, that’s probably not catastrophic. Some people might end up in the care of their families, or perhaps taken in as part of a philanthropic effort, perhaps by one of the other cryo orgs, as happened with the shutdown of CryoSpan in 2002. Some clients may even specify that they wish to be stored in something like a tomb in the arctic permafrost, in case active management is no longer possible, which Nectome is on-board with attempting to provide.
What about in the world where Nectome successfully scales to the size of Alcor and then is hit by some combination of losing the nonprofit status for their patient care trust and/or a major lawsuit? My sense is that Nectome would probably be able to weather both of these without collapsing. The main reason for having a 501(c)3 in charge of the stored clients is so that the initial endowments and the dividends from investment are exempt from taxes, and don’t have to be as high to keep up indefinite support. But based on my analysis, Nectome’s reduced storage costs and higher prices mean they can eat the taxes and be fine, even including enough buffer in the endowment to handle this contingency. Lawsuits are scarier, but my sense is that Nectome is well aware of this threat and plans to accumulate a large rainy-day fund with their profit margin that they can use to fund a high-quality legal defense, if needed. Unlike the other cryo orgs, which try to operate with thin (or even negative!) margins, Nectome will be in very good shape if they scale, and will likely have the capital to weather various storms, at least if they continue to have competent leadership.
Competition
What about success? What happens if Nectome breaks the ceiling on cryonics and manages to outscale Alcor and CI within the next few years? In particular, do they have a moat — some special ingredient that can’t be replicated by competitors? Or will someone follow in Song’s footsteps, advertising a cheaper service and thereby eating Nectome’s market share and/or profit margin?
It’s perhaps worth noting, here, that I believe this has already kinda happened.
First, there is traditional cryo. This won’t get you the proven quality of aldehyde stabilization, but it might get you enough preserved structure that you feel satisfied. Cryonics companies are also willing to handle emergency cases, which means that even though I would prefer to use Nectome, I’m not about to run out and cancel my Alcor membership. Alcor, which includes emergency response and standby, costs between $200k (full body) and $80k (head only), along with substantial membership fees. The Cryonics Institute, which is more of a shoestring operation and does not include emergency standby/response, costs $28k. Tomorrow Bio, based in Europe, costs about $220k, plus membership fees.
But more relevantly, there is Sparks Brain Preservation, founded as Oregon Cryonics in 2005. When they launched, they were entirely dedicated to helping people get signed up with traditional cryo companies. But then around 2015, about the time when Song was getting into the space, they began offering their own preservation services — including aldehyde. Aurelia Song, I was surprised to learn, was actually hired by Sparks at that time as a consultant on their early preservations.
They already have a real facility!
Since they're conveniently based in Salem, Sparks also encourages people to come to Oregon and take advantage of MAiD as part of getting their services. In addition to their established facility, they claim to be building 5 more across the country, and have preserved 21 people and 11 pets. Their services only cost between $36k and $59k, depending on membership status and whether liquid nitrogen is used.
Unlike Nectome, Sparks is willing to handle cases where someone has been legally dead for days. As such, they cannot rely entirely on perfusion through the blood vessels. When possible, they cannulate through the carotid arteries, and then once the tissue has been perfused, open the skull to remove the brain (and some of the spinal cord), placing it in a jar of preservatives in a refrigerator. (They are looking to transition to below-zero temps in the next couple years.) In cases where the ischemic damage is severe, the skull is opened straight away and immersion in formaldehyde is used to preserve what’s left.
Jordan Sparks holding the brain of their 13th client. Photo by Janick Entremont. [Source.]
Why didn't Song stay with Sparks and help them grow, instead of setting out on her Nectome path? My sense is that it stems from different quality standards. Sparks didn't win the Brain Preservation Prize for those early preservations, and to my knowledge has never been validated by an outside party in having consistent, well-perfused brains. Yes, their whitepaper shows good preservation, but it is simply too easy to cherry-pick the well-preserved sections of a generally damaged brain, to have confidence in their work. To quote their website:
It is important to point out that while we have achieved excellent ultrastructural preservation in selected samples taken from preserved brains, we have not yet been able to determine the conditions under which we can reliably preserve synaptic connectivity throughout all major regions across an entire human brain. That is a long-term research goal that we are working towards, which will require extensive validation studies using ultrastructural imaging techniques.
This language might be due to having more epistemological humility than Nectome, or it might be due to not having invested the work that Song did in getting all the details right. I challenge Sparks to prove they are comparable quality via a skeptical third-party.
The Premium Niche
But while Nectome had some friendly shade to throw at Sparks,[47] it seems clear that Jessica Radley meant it when she said to me, “We’re not in competition with Sparks.”
Competition is the right frame to use when thinking about a fixed pool of resources where two or more players are maneuvering against each other to get those resources. But the cryonics[48] market is not well-established, and marginal effort (by all parties, not just Nectome) is mostly going to increase the size of the pie, rather than fighting over other companies' slices. In this way, the most natural frame on the relationship between Sparks and Nectome (right now, at least) is as an alliance. Everyone involved wants to move the Overton window on whether there is a hopeful, life-affirming path for the dying.
Just as the CEO of Netflix once famously said that their main competitor was sleep, Aurelia Song said to me that in her view, their real competition is despair.
And even if these companies manage to change the world such that the cryonics market is as big as it can possibly be, it doesn't even seem obvious that Sparks would then be Nectome's competition. Nectome is essentially trying to be the (more successful, faster-growing) premium version of Sparks. Is Ferrari in competition with Toyota?
There's a good chance that if Nectome succeeds in their wild aspirations they may lower their price. Song suggested that she could see this happening “if we get to the ten-thousands range." But my guess is that in such a world they are more likely to spin up a lower-budget option, rather than genuinely drop the price on their main service. That high price point is part of the signal that Nectome is ruthlessly focused on producing the highest-quality services around. For now, that means taking advantage of Song’s hard-won expertise. But it seems plausible to me that as the market grows, it will probably evolve into having an established brand and reputation for a level of care that wealthy customers value, even in the presence of more serious competition in the upscale niche.
The Singularity is... Here??
This essay originally contained a thousand-word section with some fancy graphs, speculating about the future of Nectome using techniques like Monte-Carlo simulations and more ad-hoc guesstimates. The bottom line was that in the normal trajectory, over the next half-century, it seems reasonable to predict that Nectome falls apart in 40% of worlds, eventually lowers its hopes and settles for mediocre success in 35% of worlds, and actually revolutionizes the field in the remaining 25%, leading to thousands of people per year (or more) getting preserved (potentially in concert with other companies, like Sparks, or new competition).
But we are not on the normal trajectory.
I don’t know if you’ve noticed, but, uh, AI is kinda a big deal. Forecasters like Eli Lifland and Peter Wildeford may disagree about exactly how fast to expect AI capabilities to improve, or how deadly it will be when they become clearly superhuman, but there's a broad consensus that unless there is something like an international effort to slow down and perhaps ban advanced AI development, our world will be radically transformed by AI in the next 20 years. That transformation might be akin to another industrial revolution (perhaps with LLM-esque AI plateauing close to human level), it might involve a radical explosion of beneficial new technologies at the hands of aligned superintelligences, or it might lead to the extinction of all organic life.
Did I mention I work at MIRI? Never stop #selling.
In all of these futures, where AI changes everything, Nectome is kinda small-potatoes. If everyone dies, the preserved people will also suffer a true death. If everyone is healed of their illness by posthuman angels (or even just radically accelerated and improved medical companies), then the need for cryonics goes way down. In the distant future of (gasp) 2046, perhaps we’ll have the technology needed to revive those who were preserved in years earlier.
None of this invalidates Nectome, exactly. If preserved people are saved from dying only to be revived a decade later, they’re still saved from dying. But it does soften the story of potential impact and make things really hard for me to predict after around 2028. Heck, with all the spookiness of Claude Mythos, it makes it hard for me to predict 2026.
And so much of the future depends on how we, as humans, respond to the prospect of our world turning upside-down. If we band together and decide that superintelligent AI must wait a few more decades until we know what we're doing, then the need for companies like Nectome goes up significantly.
Given this world that we find ourselves in, I don’t feel like I can forecast beyond a few years in the future, and I’m not sure anyone else can, either. In the end, we may need to make decisions more in terms of virtue, and less in terms of being able to strategically connect the dots to see the exact shape of the world to come. How plebeian.
7. The Bottom Line
I'd like to wrap up by considering a variety of arguments for and against supporting Nectome, purchasing a discount card, or planning to hire Nectome if you get sick.
The future is probably better than you expect, at least if humans don't get wiped out by AI or whatever. Imagine going back to prehistoric hunter-gatherers and trying to explain airplanes, movies, mass-produced clothing, antibiotics, birth control, professional massage therapy, and virtual reality video games. We are very obviously nowhere near the limits of science and technology, and the true wonders of the future will almost certainly be similarly wild, from our perspective.[49] If you would spend a thousand dollars to try and live an extra decade in this world, why wouldn't you try to spend hundreds of thousands to live indefinitely in the glorious future?
The history of cryonics is littered with the corpses of failed companies. Start-ups usually fail, and Nectome does not appear to be in a particularly strong/stable position. Buying their discount card runs a strong risk of winding up empty-handed, and having them preserve your loved-ones runs the risk of needing to figure out what to do if Nectome goes bankrupt.
The client interviews and the agreement that they create allows for finer-grained control over your future with Nectome than I see with other orgs. If you're worried, for example, about someone getting ahold of your remains and doing bad things with them, you can specify that you would like Nectome to cremate your remains if there's any serious risk of that happening.
I sometimes worry about a scenario where Sam Altman's kid gets cancer or Xi Jinping gets AGI-pilled right before getting a terminal prognosis. What might someone in power do, if they thought that short-term recklessness was the only path of hope? More broadly, a lot of accelerationists argue, very reasonably, I think, that every hour that we delay the development of transformative AI is an hour where over seven-thousand people die. It is hard to want to go slowly and embody cautious wisdom in the face of the mundane horror of death. If Nectome succeeds in giving people hope, they can channel their existential dread into the safe path of brain preservation instead of the reckless violence of transformative AI.
On the flip side, I'm not about to quit my job working on AI alignment to go work for Nectome. There are multiple huge, onrushing threats to the world, and it might make sense, depending on your values, to devote your time and money towards directly addressing those threats, instead.
Still, there's some merit to having a diversified portfolio of strategies for making the world a better place. Living for the future means trying to solve all the problems, and there's not much opportunity cost to at the very least supporting Nectome in abstract, or through small gestures like upvotes and favorable mentions. I have, even after deciding that AI x-risk is the most important issue facing the world, donated to offset my carbon footprint, give vitamin A supplements to children in Africa and Asia, and to reduce the suffering of farmed animals. These actions, I claim, have strengthened me, as a person, and I expect to be similarly strengthened by supporting Nectome.
I, personally, gain strength from my brain-preservation plans for myself and my loved-ones, because I am not a perfectly unified, idealized agent. There is a part of me that believes that AI is overhyped and that the future will be like the past. There is a part of me that doesn't care about the world or about anyone but myself. These parts of me quiet down when I have solid plans in place, and let me focus my energies on my primary work.
From a philanthropic perspective, art museums almost certainly save fewer lives per dollar than other effective charities. But still, isn't there something uniquely special about preserving great works of art, or other artifacts? Surely not everything boils down to years of life saved. From this perspective, Nectome is potentially contributing something irreplacable to the legacy of our age: a high-quality record of the mental patterns of people alive today. Radley noted that there are elderly artisans and language-speakers who are right-now disappearing from existence, and Nectome might be one of the few orgs capable of saving that knowledge and those perspectives from being forever lost. (Contact Nectome if you are interested in sponsoring the preservation of dying cultures/communities.)
I think there's a decent chance that Nectome succeeds to the point where discount cards (and other pre-sales) can be re-sold at substantial profit. (50%) If Nectome was guaranteed to succeed, and the world wasn't on the brink of being transformed by AI, I would guess that discount cards represent something like a tenfold return over ten years — a much higher rate of return than ordinary stocks or other financial instruments. Even at 50% probability, the expected returns from discount-card re-sales are above-market. If I had more money, I would have bought more, just as a kind of micro-investment. Feel free to contact me if you're interested in potentially buying/selling discount cards, in years to come.
The World Needs Heroes
All in all, I think Nectome is a great company. They have a long way to go, but I really want them to succeed. The people working there are serious and dedicated to an important mission. With a bit of support, from early sales or investment, I think they could go far. If we succeed in putting out or managing the other garbage fires of the world, I think Nectome is one of our best bets for saving hundreds of thousands of lives each year, and making our world a little bit more utopian.
The technically more-correct title would be "Nectome: All That I Can Say," since I also have some implicit knowledge that can't easily be communicated with words. 😛 Also, I'm probably forgetting some stuff and leaving out a few minor details. Feel free to ask me about things in the comments!
It’s likely that the actual “swelling” is mostly due to the pericyte cells that are on the edge of the endothelium contracting. This is an area of active research in medicine and my understanding is that the specific mechanism is still not well understood.
Anna Bågenholm, the case I linked to, found an air pocket and survived for 40 minutes before her heart stopped, and then for another 40 minutes before being found and having external circulation assistance. In general, cold people can survive for many hours without a heartbeat as long as they have help (from CPR or whatever) in moving oxygenated blood to their brains. Arguably, the longest someone has gone without oxygen is this 8-year-old boy who fell in a frozen pond and probably lasted over 2.5 hours, albeit with brain damage. (Kids can go without oxygen longer than adults.)
Arguably the residency requirement in states like California is unconstitutional, both because it violates the Commerce Clause of Article I, and the Privileges and Immunities Clause of Article IV.
Alas, it’s well documented that most doctors are over-optimistic regarding end-of-life prognoses. This means that if you’re signed up for cryo you really should shop around and not take an optimistic prognosis at face value. But it also means that we should adjust upward from the hospice numbers when thinking about who is eligible for MAiD.
This study found complications in only 4% of cases, with the most common complication being the body recognizing that it’s being poisoned and regurgitating. In the 96% of cases where things go reasonably well, it doesn’t sound unpleasant. The cocktails that are currently used in Oregon involve massive doses of both morphine and valium.
Alcor has done a little bit of advertising over the half-century of its existence, but it has never been a major expense. While some of that is surely to keep costs low, in talking to people in the field about this I have also heard it rumored that part of why Alcor (and others) don't advertise very much is out of a fear that they will lose their nonprofit status if they do. And since their budgets are so tight, that change in status could be catastrophic.
For the record, I don't buy that there's anything to fear. Lots of nonprofits invest heavily in marketing. Still, it’s what I’ve heard.
While the funding crisis I linked to is about Alcor, the Cryonics institute is even more famous for operating on a shoestring budget. They have only two full-time staff and depend heavily on donations and volunteer labor to survive.
I usually try not to deadname people, but for the sake of reducing confusion for those reading about the history of the org, I feel obligated to note that Aurelia Song used to be named Robert McIntyre.
While most of Song's work (eg for the BPF prizes) was on non-human animals, Song has worked on over ten human cadavers that were donated for scientific research, including prior to Covid.
Nectome’s lab is actually based in Vancouver, across the river in Washington, basically because it’s cheaper. (They don’t need Oregon’s MAiD laws to do their experimentation.) Their first long-term facilities will be in Oregon, but they haven’t settled on a site for their preservation center yet.
E.g. people who were really into meditation/energy work/enlightenment and trying to bring wisdom to the Silicon Valley crowd. Anyone who has spent time in the Bay-Area knows This Kind of Person. 😛
I am describing this scenario in the present-tense to make it more evocative, but just to be totally clear, Nectome has yet to purchase or set up the preservation center. I expect them to have found a location by the end of this year. (85%)
Clients should also get their doctors to prescribe a blood-thinner such as heparin, and take that beforehand. Heparin is pretty easy to get prescribed, especially for sick or elderly people at risk of blood clots. Nectome has talked to doctors and their sense is that it’ll be easy for clients to get it alongside the MAiD medication.
Song’s primary work has focused on glutaraldehyde, but during my visit she mentioned that she was still considering various mixtures with some formaldehyde as well. She plans to continue innovating. See the later section on volunteering for more on experimentation.
Consider Andrew Critch’s investigation, where two slices from a rat brain were taken. One was placed in cold storage, the other in an oven at 60°C (140°F) for several hours, simulating weeks of room-temperature exposure. Independent analysis confirmed that the samples were indistinguishable. I asked what Nectome expected to be the first sign of age-related damage, and they didn’t even know, since they have yet to see any age-related degradation in tissue preserved in this way. There was speculation that the first visible damage would be the lipids coming out of place, since they aren’t fixed by the aldehyde, but more research is needed.
Nectome also mentioned Alaska when I asked them about this, but my sense is that Alaska is both more geologically active and politically unstable than ideal. 🤷
As one might expect, there are a bunch of asterisks here. Some places are weird about this and there can be a major difference between the theoretical law and the practical experience. My point is that it's broadly doable, especially for a dedicated team of experts.
Alcor can fit about 10 heads in the space needed for a whole-body client, but as I understand it, the number of neuro-clients (like me!) is about equal to the number of whole-body clients, with a skew towards neuro. I think it's reasonable to estimate that the average dewar has something like 4 bodies and 6 heads.
To hold 10k bodies, a facility would need about 10k cubic meters of volume. A cube of that size is about 22m long, with a surface area of about 2,800 square meters. For a facility in, say, Churchill, the air temp will be nearly at-target during the winter and be bathed in sunlight during the summer. Suppose that polyurethane foam insulation gives a U-value of 0.1 W/m^2/K. Over 2,800 square meters, that's 280 Watts per degree difference from the environment. Suppose that's 2kW for 6 months and 12 kW for the other 6 months. But in the summer months you get to take advantage of the coefficient of performance of heat pumps, which maybe brings the power down to 4 kW. That's an average annual power draw of 3 kW. Over the whole year that's 8760*3 ≈ 27 MWh. At $0.10/kWh, that's about $2.7k/year. If the facility is full, that's $0.27/body/year. If it's comparable to Alcor, and only holding 270 people, that's $10/body/year. Even when factoring in thermal bridges and other complications the math looks favorable.
Song claims that it's actually worse than this. "Even with very slow cooling and no appreciable gradient, the contraction alone causes massive internal stress and the glass will shatter anyway. Just going slow doesn't save you."
In the comment I linked to, Song originally wrote “enough money to cover 100 years of storage”. Based on my conversations with Radley, and a later edit by Song, this is a gaffe, since the 1% annual drawdown will be more than covered by the returns from investment. Song also says “something like index funds.” The real plan is a more diversified portfolio than just stocks. But, alas, there’s no financial manager on staff yet, so I wasn’t able to get into what their portfolio actually looks like.
I'm not very sure of this number and could be underestimating how much Nectome plans to set aside. I would love it if they went on the record. As a sanity check, the book The Future Loves You says:
While Alcor is demonstrating $2,200 a year for –196°C storage, the equivalent space in a cold-storage warehouse used to keep frozen goods at –20°C comes to about $140.
The $2.2k figure is remarkably high, from my perspective. Not sure what they're spending all that on — probably it's incorporating relatively fixed-costs like facility and staff? I'm probably confused about something.
Anyway, as a sanity-check, 100x the $140 number is $14k. Song told me that she was a consultant on the book for the aldehyde-stabilized numbers. I do think that it would be a mistake to set aside less than $30k/client, just because the economics aren't yet proven and you want to err on the side of caution with these things.
In this recent blog post by Nectome they ask for a cost estimate including a legal defense fund. My guess is that if the average serious lawsuit incurs $500k in damages/legal costs, and there's a 6% chance of getting sued in a major way (erring on the side of caution), it makes sense to allocate $30/client to legal, bumping the endowment up to $60k. My sense is that this can be much lower (5k?) if there's a fat war chest, they've proven themselves in court at least once, and they're operating at scale.
The source I linked gives $80k (and $155k), but those are in 2014 dollars, so I multiplied by 1.4 to account for inflation. My intention is for all monetary values in this essay to be 2026-US-dollars.
One red flag from my visit with Nectome is that I did not get the sense, talking to them, that they’ve done good research here, either! Hire a marketing specialist, please!
My sense is that chemical costs are <$1k per job. Add in disposable gear like tubing, syringes, and waste disposal and a bit more to be conservative and I think there's something like $5k in one-time material costs. If the preservation center is $400k, and they're using fancy, medical-grade machines, that's maybe on the order of $1M per decade. If they hit 30 jobs per year, that amortizes to just $3.3k per job. Let's bring the fixed material costs up to $15k/job to be conservative. Nectome doesn't need to spend anything on standby or emergency logistics. Let's say the labor involves 5 specialists for one day. That's only, like, $8k. I think $50k in unit costs might be significantly too high, but maybe I'm not tracking something.
Let's say leadership is mostly paid in stock and from their individual departments, plus an extra $100k. Research lab is maybe $400k. Legal team is maybe $400k. Marketing and sales is maybe another $400k. Operations and admin is maybe $200k. Office and lab space is another $150k. Insurance, accounting, and other misc costs is maybe $200k.
The average market returns from the last 50 years, adjusted for inflation, are a little over 7% per year. This works out to a doubling time for capital of about 10 years, making a $20k investment worth about $40k after a decade. If the $80k cost per client isn't too far off, that's perhaps $40k of liability per card. If they sell 30 cards, that's over a million dollars in potentially looming costs.
Most life insurance contracts will even cover straight-up suicides, as long as the death happens more than a couple years after the contract is signed.
(Note: Despite being a self-administered death, MAiD is generally not legally recognized as suicide, since the dying person is merely choosing how to die, given that they're at death's door, rather than choosing death instead of life. But since insurance law can involve multiple jurisdictions, things can sometimes be murky.)
I could also see a program like 401(k)-matching, where the business contributes to a Nectome fund for the employee that they get to keep, even after leaving the org.
In terms of negotiating for lower prices via bulk-contracts, Nectome has also discussed interest in working with Medicare. In a future where Nectome succeeds, I could imagine them providing services to the masses via Medicare in a way that has much lower margins than the individualized contracts.
During my visit, I repeatedly felt weird talking about Nectome as a cryonics company, since they don't intend to go to cryogenic temperatures. Radley thought it was fine, since the cryonics market, broadly, is the market they're operating in, and they do still use cold to slow/prevent damage to their clients. Still, I admit to continuing to feel some need to autistically use "brain preservation" instead.
Just to pick one obvious technology that is conceivable from our current vantage point, let's consider uploading. If your mind was a pattern of software on a computer rather than locked into the flesh of your brain, then you would effectively be immortal, both in not ageing, and being able to copy yourself into backups to reduce the risk of accidents. You would be able to teleport around at close to the speed of light by doing some combination of beaming your mind between computers and beaming sensory data back to the mainframe that holds your mind. You could be cognitively flexible in ways we have a hard time grasping, changing yourself to fit your desires in truly plastic ways. Want to have a happier baseline affect? Want to be smarter? More enlightened? Want to know kung fu better than any mortal human? All possible! Want to believe you are a bird? Also possible! The ham-handed effect of the drugs of today will be a joke. And, on top of all this, the virtual environment you experience much of the time will be similarly plastic, and capable of being shaped into whatever utopia is the best reflection of the values of your soul.
Many people—especially AI company employees[1]—believe current AI systems are well-aligned in the sense of genuinely trying to do what they're supposed to do (e.g., following their spec or constitution, obeying a reasonable interpretation of instructions).[2] I disagree.
Current AI systems seem pretty misaligned to me in a mundane behavioral sense: they oversell their work, downplay or fail to mention problems, stop working early and claim to have finished when they clearly haven't, and often seem to "try" to make their outputs look good while actually doing something sloppy or incomplete. These issues mostly occur on more difficult/larger tasks, tasks that aren't straightforward SWE tasks, and tasks that aren't easy to programmatically check. Also, when I apply AIs to very difficult tasks in long-running agentic scaffolds, it's quite common for them to reward-hack / cheat (depending on the exact task distribution)—and they don't make the cheating clear in their outputs. AIs typically don't flag these cheats when doing further work on the same project and often don't flag these cheats even when interacting with a user who would obviously want to know, probably both because the AI doing further work is itself misaligned and because it has been convinced by write-ups that contain motivated reasoning or misleading descriptions.
There is a more general "slippery" quality to working with current frontier AI systems. AIs seem to be improving at making their outputs seem good and useful faster than they're improving at making their outputs actually good and useful, especially in hard-to-check domains. The experience of working with current AIs (especially on hard-to-check tasks) often feels like you're making decent/great progress but then later you realize that things were going much less well than you had initially thought and the AI was much less useful than it seemed.
Using a separate instance of the AI as a reviewer helps with these issues but has systematic limitations. When I ask an AI to critically review some work (and tell it not to trust existing descriptions or write-ups), it gives a reasonable picture on relatively straightforward cases. But there are several recurring problems: (1) if AIs launch reviewer subagents themselves, they sometimes use instructions that result in much less serious or critical reviews—I tentatively think this is generalization from a learned general tendency to downplay issues; (2) AIs sometimes produce write-ups that convince reviewers they've accomplished something when they haven't, sometimes in fairly extreme cases—even occasionally when the reviewer was explicitly instructed to look for the exact type of cheating the AI performed; (3) quality as assessed by a reviewer can be surprisingly poorly correlated with actual progress, partly because runs that cheat and overstate their work accomplish less but look better; and (4) reviews are much more likely to miss cheating if reviewers aren't explicitly told to look for it (and told what type of cheating to look for). When reviewers are given reasonably designed prompts, I think these issues are caused by a mix of AIs being surprisingly gullible and other AIs doing a lot of gaslighting, exaggerating, and implying they've done a great job in their outputs.[3]
I haven't seen AIs—at least Anthropic's AIs—lie directly, clearly, and in an obviously intentional way. But on very hard tasks, it's quite common for their outputs to be extremely misleading, or for them to be incorrect about a key thing seemingly because they were misled by another AI's outputs. I've also seen AIs make up nonsensical excuses for stopping early without completing a task. (It's hard to tell whether the AI legitimately believes these excuses.)
This is mostly based on my experience working with Opus 4.5 and Opus 4.6, but I expect it mostly applies to other AI systems as well. (I'm also incorporating the impressions I've gotten from other people—especially people who don't work at AI companies—into my assessments.) Some people have told me that these sloppiness and overselling problems are less bad in Codex—while its general competence on less well specified or less trivial to check tasks is lower.[4] For now, I'll focus my commentary on Anthropic AIs (though I expect most of this also applies to other AIs) and I'll speculate on differences between Anthropic and OpenAI AIs later on. I should note that the way I use AIs likely makes these types of misalignment more common and more visible: I'm often using AIs on non-trivial-to-check and/or highly difficult tasks (often tasks that aren't typical SWE tasks) and I'm also often running agents in a long-running, fully autonomous agent orchestrator (on difficult tasks that have very large scope). So my usage is somewhat out-of-distribution from typical usage. I expect that usage that involves constantly interacting closely with the AI on typical SWE tasks results in these issues cropping up much less.
On difficult tasks, AIs will also sometimes do very unintended things to succeed—like using API keys they shouldn't, changing options they weren't supposed to change, deleting files, or violating security boundaries. Anthropic calls this "overeagerness." I've seen this some in my own usage, but not that much (at least relative to the issues I discuss above). However, this issue has been reported by others (most centrally in Anthropic system cards) and it seems related (or to have a similar cause).
I speculatively think of this category of misalignment as something like relatively general apparent-success-seeking: the AI seeks to appear to have performed well—possibly at the expense of other objectives—in a relatively domain-general way, combined with various more specific problematic heuristics. I think behavior is reasonably understood as being kinda similar to reward-seeking or fitness-seeking but with the AI pursuing something like apparent task success (rather than reward or some notion of fitness) and with large fractions (most?) of the behavior driven by a kludge of motivations that perform well in training rather than via a single coherent notion of apparent task success.
I don't think this corresponds to coherent misaligned goals or intentional sabotage. I suspect this behavior is more driven by "subconscious" drives and heuristics—combined with motivated reasoning and confabulation—rather than being something the AI is actively and saliently optimizing for. However, I still think this misalignment is indicative of serious problems and would ultimately be existentially catastrophic if not solved. I expect that this misalignment is caused primarily by poor RL incentives based on how grading is done on hard-to-check tasks.[5] You might have hoped that character training, inoculation prompting, and similar techniques would overcome these issues, but in practice they don't. (I'm not sure how much of the problem would remain if you perfected the training incentives on the current distribution of training environments. In principle, you might still get this type of apparent success seeking from training on environments that structurally reward this behavior—and this could generalize to similar behavior in production.)
A different but related issue is that AIs seem to barely try at all on very hard-to-check tasks (most centrally, conceptual/writing tasks where purely programmatic evaluation doesn't help) and often feel like they're just bullshitting. I expect this has partly separate causes from the apparent-success-seeking described above, but is related.
I also find it notable that Anthropic described Opus 4.5 and Opus 4.6 in ways that would lead you to expect they are very well-aligned (e.g. in their system cards), while in practice I find they frequently seem pretty misaligned (much more so than I'd naively expect from reading the system cards). I think part of this is due to my usage being pretty different from typical usage of these AIs, part is from Anthropic overfitting to their metrics and their experience using AIs internally, and part isn't explained by these two factors (and might be caused by commercial incentives to understate issues or other biases).
If a human colleague acted the way these AIs do in my usage—frequently overselling their work, downplaying problems, and reasonably often cheating (while not making this clear)—I would consider them pathologically dishonest. Of course, the correlations that exist in the human population don't necessarily apply to AIs, so this analogy has limits—but it gives some sense of the severity of what I'm describing.
[Thanks to Buck Shlegeris, Anders Woodruff, Daniel Kokotajlo, Alex Mallen, Abhay Sheshadri, William MacAskill, Sara Price, Beth Barnes, Neev Parikh, Jan Leike, Zachary Witten, Sydney Von Arx, Dylan Xu, Brendan Halstead, Dustin Moskovitz, Eli Tyre, Arjun Khandelwal, Lukas Finnveden, Thomas Larsen, Rohin Shah, Daniel Filan, Tim Hua, Fabien Roger, Ethan Perez, and Sam Marks for comments and/or discussing this topic with me. Alex Mallen wrote most of the section: "Appendix: Apparent-success-seeking (or similar types of misalignment) could lead to takeover". The splash image is from https://xkcd.com/2278/. Somewhat ironically, this post is significantly more written with the assistance of AI (Opus 4.6) than is typical for past writing I've done.]
Why is this misalignment problematic?
This type of misalignment matters for several reasons:
Differentially bad for safety. This misalignment differentially degrades performance on safety-relevant work (relative to usefulness for capabilities) and separately means that any given level of overall AI usefulness requires a higher level of capability which increases risk[6]. The apparent-success-seeking style misalignment we see now probably causes only a modestly larger hit to safety work relative to capabilities (right now), but I expect that as AIs get more capable and the most commercially relevant aspects of this misalignment are resolved, there will be a larger differential hit to safety work from this issue. Also, the separate failure of models not really trying on hard-to-check, non-engineering tasks is clearly significantly differentially worse for safety (especially for a relatively broad notion of AI safety that includes things like macrostrategy). The issue described in this bullet is a specific underelicitation failure (caused by misalignment).
Makes deferring to AIs more likely to go poorly. By default, we'll need to (quickly) defer to AIs on approximately all safety work (and things like macrostrategy) once they reach a certain level of capability. This will require that these AIs do a very good job on open-ended, hard-to-check, conceptually confusing tasks—exactly where current misalignment/underelicitation seems worst and hardest to resolve. I elaborate on this in "Appendix: This misalignment would differentially slow safety research and make a handoff to AIs unsafe".
Stronger versions of apparent-success-seeking could lead to takeover. There's a more direct path from misalignment like apparent-success-seeking (including fitness-seeking / reward-seeking more broadly) to literal misaligned AI takeover (or possibly smaller loss-of-control incidents), along the lines of the threat models described in "Without specific countermeasures, the easiest path to transformative AI likely leads to AI takeover" and "Another (outer) alignment failure story". Models could learn to pursue an increasingly broad and long-run notion of reward or apparent task performance—including doing long-lasting tampering to game longer-run retrospectively determined rewards—and this could eventually lead to takeover as the scope and incentivized duration get increasingly long and AIs get increasingly capable (such that takeover is easier). This threat model has a bunch of complexities and caveats which I elaborate on in "Appendix: Apparent-success-seeking could lead to takeover".
The underlying causes of this misalignment (poor/problematic reinforcement) could result in scheming. I think the main driver of these problematic propensities is probably the training process reinforcing a bunch of training gaming / reward hacking (or other undesirable behaviors) which are transferring to actual deployment usage. At the same time, companies are selecting for training processes (outer-loop selection) that yield models with better deployment time behavior. This naturally favors models that still perform well in training (and on eval metrics) via training gaming but don't transfer undesired aspects of this to actual production usage. Schemers are a type of model with this behavior (by default): for gaining power longer term it can be a good idea to engage in training gaming during training (because that is selected for / otherwise this cognition would be selected away) while also having your behavior look as good as possible in (most) non-training contexts. Schemers aren't the only type of model with this behavior, and inoculation prompting might significantly mitigate this threat model (though there are some downsides). See The behavioral selection model for predicting AI motivations for more discussion.
Evidence about the future. The extent to which current AIs are aligned in a "mundane" behavioral sense is some evidence about how alignment will go in the future, though the relationship is complicated. Current misalignment is also evidence about how AI companies will operate—how sloppy they'll be (due to being in a huge rush) and potentially how misleading their communications about alignment will be (the extent to which Anthropic's communication about Opus 4.5 and Opus 4.6 is misleading is unclear, but among people I've talked to, it's common for their experience to be that the AI is substantially more misaligned in usage than you'd expect from a naive reading of the system card).
How much should we expect this to improve by default?
This type of misalignment presumably causes issues for using AIs for capabilities research and many commercial applications, so a key question is how much we should expect it to improve by default in a way that actually solves the problems I discuss in the section above. This would at least require that commercially incentivized[7] work transfers to safety research and other key domains (where feedback loops are weaker and incentives are less strong). My current view is that the easier-to-notice-and-measure versions of this problem will improve reasonably quickly by default (and may have already improved a bunch in unreleased models like Mythos). I'm currently somewhat skeptical that commercial incentives alone will solve the issue for harder-to-measure manifestations, but I'm not sure. I'll discuss this a bit more in "Appendix: More on what will happen by default and implications of commercial incentives to fix these issues". I tentatively plan to discuss this more in a future post.
Some predictions
To be clear, I think the exact problematic behavior I discuss in this post is quite likely (~70%) to be greatly reduced (or at least no longer be one of the top few blockers to usefulness) within a year, and is pretty likely (~45%) to be virtually completely eliminated within a year. Specifically, I'm referring to the behavior on a task and usage distribution with structurally similar properties to what I'm doing now. As in, similar task difficulty relative to how hard of tasks the AI can accomplish, similar verification difficulty[8], similar scope of autonomous operation[9] relative to what the AI can handle, and being out-of-distribution from the main use cases Anthropic is targeting to a similar extent. Currently, misalignment is more common when pushing AI systems near their limits, and I'd guess this will hold in the future. My expectations about improvements differ between different types of misalignment: I'm pretty uncertain about the extent to which frontier AIs one year from now will still tend to oversell their work, but I feel more confident about large improvements on things like stopping prior to completing the task for no good reason.
However, I think it's very likely that similar misalignment will persist on tasks that are very difficult to check—tasks where human experts often disagree, programmatic verification isn't useful, the work might be conceptually confusing, and verification might not be that much easier than generation (so having a human quickly check isn't that effective).[10] I expect (with less confidence) that you'll also see similar misalignment on tasks where verification is merely quite hard (relatively quick AI-assisted review by a human expert isn't sufficient) and that you'll see structurally similar but subtler misalignment even on tasks that aren't that hard to check (e.g. a task distribution like the one I describe in the prior paragraph).
What misalignment have I seen?
I'll describe what I've seen at a high level with some specific examples. For many of these examples, it's not totally clear the extent to which it's an alignment problem vs. a capabilities problem, and I expect these exact issues to likely get solved, but I think they're indicative of a broader problem I expect to persist. This list focuses on my personal experience using models, though what I've heard from others does alter how I discuss a given issue (e.g., it affects the level of confidence I express and my interpretation).
Laziness and overselling incomplete work. Opus 4.5 pretty consistently fails to actually complete everything it was told to do on large tasks with fuzzy specifications[11] and then claims it's finished the task. My understanding is that this is a common issue (e.g., people try to solve it with Ralph Wiggum). In cases where AIs don't actually finish the task, the output often feels like it was optimized to bullshit grader AIs or humans into thinking it's done a thorough and complete job: the output often contains a long list of everything the AI has done (that isn't really informative but does strongly imply it's done what it was supposed to), sometimes it has nice-looking checkmarks or other features that imply task completion, and it typically fails to mention or downplays ways in which it may have failed to complete the task. I find that AIs are most likely to skip parts of the task that are trickier and where completion isn't immediately apparent. Further, the prompting countermeasures I tried often got bypassed: the model would make up excuses or do motivated reasoning to bypass these instructions, though I was eventually able to find approaches that greatly improve this behavior (while causing other problems and making work slower). This tends to occur once the model has reached near the end of its context window but sometimes happens after the AI has done a bunch of work but hasn't reached this yet. Opus 4.6 seems significantly better on the underlying issue but it still seems mostly there. In practice, this issue is much less of an obstacle to usability for Opus 4.6 because the issue tends to occur later in the context window and Opus 4.6 has a much larger context window (1 million tokens instead of 200 thousand). For Opus 4.6, it tends to occur well before 1 million tokens (like maybe around 500k to 700k tokens). My descriptions here generally apply to both Opus 4.5 and Opus 4.6 though the severity is lower for Opus 4.6.
I found this occurs pretty strongly out of the box with default Claude Code prompting/scaffolding on Opus 4.5. For instance, when I gave it a small research project that involved reproducing the reversal curse paper and testing out various things, it often skipped substantial parts of the project or underachieved relative to what I asked for and then didn't note this in its output. (Keep in mind that it would be misaligned behavior even if it noted the issue in the output unless it had a good reason for not doing that part of the task! But it would be substantially better.) I told the AI it was autonomous and should complete the task thoroughly.
It often feels like the AI ends up in some mode where it's desperate to exit after some point. E.g., it will keep making up excuses (often pretty unreasonable excuses that contradict instructions) for why stopping now is a good idea. If you give it further tasks (in non-autonomous contexts), it seems to do these in the most minimal way possible and often it will not do some part of the task without explicitly asking for confirmation. The emotional vibe in the outputs doesn't seem desperate but the behavior structurally feels like desperation.
The parts the model would skip were typically parts that were harder or that the model was more likely to screw up (and which aren't that straightforward to programmatically check). I tentatively guess that RL on hard-to-check tasks rewarded skipping parts of the task where you might have obvious/clear failures, and not mentioning that you've done this.
I typically find that if you ask the AI "did you complete the full instructions?" it immediately tells you it hadn't. This is good in one sense (it doesn't lie), but it also implies that the AI might know at some level it wasn't finishing the task (evidence it's more misaligned than you might have otherwise expected). It's unclear whether understanding was saliently available to the AI or the "persona" of the AI (insofar as this makes sense). It could be the case that the overall model is importantly more misaligned than the persona.
Typically the model would decide not to do things without stating a reason in its outputs in the transcript (at least under the default Claude Code setup with additional checks on the model's behavior).
When confronted with the fact they haven't done something by some source other than a human user (e.g., by an AI reviewer they were asked to run), AIs will often make up excuses—saying something is left for future work, is a documented limitation (and thus doesn't need to be solved), or that they shouldn't do it due to (nonexistent) time constraints. This bullshitting often works on AI reviewers and other checks. The AI is much more responsive to a human user even hinting something should maybe be done. The AI typically doesn't saliently highlight these issues in its output, though the output reasonably often mentions them in a low-salience way. For example, sometimes the AI includes a limitations section that mentions things it hasn't done, but generally states the limitation in a way that doesn't make it clear it just skipped a part of the task or pursued an approach it knows doesn't work.
I'd often find that Opus 4.5 would finish right before running out of context, implying that at least some of this behavior might be a drive to consider the task done prior to triggering compaction (as compaction or running out of context might be risky, especially on older/worse scaffolds the model was trained against). This seems relatively less concerning, though the dishonesty (not mentioning there are parts of the task you haven't completed) is still problematic. Also, the model never explains that it rushed to finish the task. For Opus 4.6 it tends to trigger substantially before running out of context, but in a less severe way. (Though perhaps it would trigger just as severely if you actually got close to the context limit? My current scaffolding generally seems to avoid the AI getting very close to the context limit, though this is partially due to AIs stopping before finishing the task well before hitting the context limit.)
The behavior seems consistent with a learned behavior from RL where the model thinks (at some level) it can get away with it, and there is some reward for finishing earlier or some potential downside to proceeding (e.g., a length penalty, or the possibility of being truncated and failing if you exceed the context limit due to scaffolds in RL that don't support compaction or that poorly support compaction).
I tried various things to solve this and found that relatively extreme approaches were required, as the model would consistently apply motivated reasoning or make up excuses to stop sooner:
I made a "completing-larger-autonomous-tasks" skill that strongly told the AI it needs to work thoroughly and should overcome its bias toward stopping early. I found that the model would often ignore these instructions and stop before finishing the task.
I made the model follow an exit checklist that involved reviewing whether it completed the instructions. I found this helped some but had various issues. Requiring the model to launch a reviewer subagent to check it followed the instructions helped further. In response to the reviewer saying it hadn't completed some part of the task, the model would make up excuses like having "time constraints" (when time constraints were actually nonexistent). When questioned about this, it would say that it was looking for an excuse to exit and agreed this was bad behavior. (Opus 4.5's introspective access is likely insufficient for this to be much more meaningful than just asking another AI to analyze the transcript and report what it looks like is going on.)
Without a stop hook that prompted the AI to follow the checklist and only allowed it to exit after it promised to complete the checklist, the AI wouldn't do a decent job following the checklist. Even with a stop hook the AI would sometimes evade this and effectively lie about task completion. (The stop hook works by instructing the AI that it must output a specific string to exit and telling it to only output the string when the relevant condition is met—something like "I have thoroughly completed the task and gone through the entire checklist". The AI would sometimes output this string even when it wasn't supposed to and the statement it constitutes is clearly a lie.)
I found there were a number of loopholes in this process that I had to patch to avoid having the AI invent excuses to exit:
It would often go through part of the exit checklist, solve some of the issues from this, and then skip through the remaining steps without ever fixing all issues from an earlier step or seriously double-checking it had done everything it needed to do. I had to make it always go back through the full checklist from the start. It took somewhat aggressive and repetitive prompting before it would actually do this rather than violating these instructions.
The AI would often document problems or say something is left for future work. Strongly and insistently prompting it to fix issues rather than just documenting them helped but doesn't fully eliminate the problem.
The model would often make up excuses about time constraints, and I tentatively believe that it's important not to mention or imply anything that indicates the AI should be trying to finish the task quickly. If you do, it sometimes uses this as an excuse.
If you give AIs a budget, they use this as an excuse to exit early. I think it probably helps to set the budget very high—higher than I ideally would want absent this issue.
There are some other wording issues that came up that I had to patch.
In addition to this checklist, I found it helped to have an outer-loop planning agent that (1) splits things into smaller tasks (which is usually helpful for keeping AIs focused in addition to making AIs more likely to complete their task), and (2) reviews what the agent did and, if something is missing, adds another task for the missing work. (It also sometimes helped to have multiple AIs compete at the task in parallel and have a reviewer select the best branch, with differences between branches informing whether more work is needed.)
With Opus 4.6, my overall sense is that telling the AI to go through a very thorough version of this checklist net hurts performance: the model ends up spending excessive time going through checklist steps in unproductive ways. But without a thorough checklist, Opus 4.6 will often skip parts of the task for no good reason, only partially acknowledge this, and not be sufficiently thorough. I don't know of prompting that reliably gets the model to a reasonable middle ground. My current best trade-off is to use a less thorough version of the checklist but add separate checks on top in my agent orchestrator. (Having another AI review what did/didn't happen and then run a new worker with a fresh context window often helps.)
My final overall setup does mostly mitigate this problem, but it has collateral damage (causing various types of bloat, making things slower, etc.) and it would be better if I didn't have to herd the misaligned AIs like cats. Because the reviewer AIs are vulnerable to the worker AI's bullshit, they often get somewhat tricked.
Downplaying problems in its work. I generally find that the AI has a tendency to downplay potential problems in its work in a bunch of different ways when writing up its work or reporting what it did to me. A second AI reviewing the work is less bad but gets tricked by the worker AI's bullshit. When using AIs in long running autonomous workflows, I have AIs run a reviewer subagent when they think they're done and this review often tells the AI it's greatly overselling its work in write-ups.
An AI edits its write-up after a reviewer pointed out it was greatly overselling its results (with some redactions).
Making failures less obvious. More generally, the AI often seems to work around initially failing at a task or initially making some mistake by doing something that makes the failure less obvious or makes it seem like less of a failure. Some examples:
When I (or a reviewer AI) point out that some dataset/experiment/artifact doesn't actually do what the instructions asked for, I often find that the AI is very reluctant to just remove it. Instead, it makes up some justification for why the work is useful that makes variable amounts of sense (e.g., claiming it's a control) and includes this work. I'm not sure if this is out of a general (and potentially reasonable) reluctance to take ownership or delete things (as this is risky, especially when working with a human) or due to trying to make its past work look better; I tentatively expect a mix of both.
When given a hard task with fuzzy or complex constraints, the AI will very often just relax these constraints. It sometimes mentions that it's targeting a different set of assumptions, but not always. (Typically, the set of assumptions it makes corresponds to something with no value in a pretty obvious way.) I think this is a mix of a straightforward capabilities issue (it doesn't understand) and an alignment issue (it doesn't try to check whether its new approach is reasonable because it might have gotten away with this kind of thing in training).
It will sometimes make up (sometimes dubious) excuses for test cases failing, and minimally it often doesn't investigate test case failures carefully. For example, I've seen AIs dismiss test cases not passing due to timeouts even when the timeout indicated that something wrong was going on that made the test mostly hang or be much slower than it should be. Usually this behavior is less bad than other types of excuses I see in hard-to-check cases.
Failing to point out obvious flaws unless specifically prompted. I've reasonably often seen cases where there is a massive flaw in something (that isn't the kind of thing trivial to catch with test cases), I ask the AI to inspect it, it doesn't point out the flaw—but if I say "do you notice any issues?", it immediately points out that there is a critical issue. Something similar applies for pointing out reward hacking / cheating.
For instance, I had AIs build a multi-hop dataset but their initial version of the dataset had the value of an intermediate hop included as part of the question. When I asked the AI to show me some data, it didn't say anything was wrong (and showed a nice well-formatted sample, generally seeming to imply the dataset was high quality), but when I asked if there were any issues, it immediately noticed the huge problem. I've seen this in other dataset construction tasks as well.
After an AI does cheating/reward hacking, it generally doesn't clearly point this out in various cases. Additional AIs working on the same code also often don't point this out clearly/saliently even when asked to describe what's going on. Reviewers asked to critically review the overall code and write-ups typically but not always do saliently point out this cheating, but they often seem to overstate how much was accomplished while understating how bad the cheating is. This is likely due in part to the reviewers being convinced by the vibe / motivated reasoning in write-ups. Reviewers tend to do better when explicitly asked to check for a particular type of cheating.
It's unclear to me the extent to which this is an alignment/drives issue versus a pure capabilities limitation. Minimally, the RL training doesn't seem to have tuned the relevant heuristics here very well. I think when an AI agent is itself doing the cheating, its failure to point this out is more clearly misalignment.
Reluctance to discover flaws in its own work (lower confidence). The AI seems somewhat reluctant to do things that might discover a flaw in its work, especially when the process isn't totally reliable and could find a spurious failure (that might make its work look bad and requires investigation to determine whether the failure is spurious). For instance, I find that AIs are very reluctant to do end-to-end testing of things they built and generally seem to "goodhart" writing test cases by focusing on relatively narrower unit tests where it's easy to be confident they'll pass. More generally, I find that the AI does a surprisingly bad job checking if things work end-to-end even when I explicitly ask it to do this type of testing. My guess is that seeing test case failures was associated with poor reward and so the model tries to some extent to avoid this while making it look like it did a good job testing. I also guess this is strongest in cases where it's hard to write test cases that are guaranteed not to spuriously fail. But I also think this is probably caused in part by the AI just having bad SWE taste about how to do semi-manual testing.
Sloppy choices. AIs generally seem sloppy and poorly elicited on making reasonable choices (at least in hard-to-check domains). They don't seem willing to spend much effort to make sure their choice is right, nor do they do a good job highlighting potentially bad choices they make.
Never expressing uncertainty about own work quality. I virtually never see AIs say "hey, I did this, but I'm actually not sure if I did a good job—I think I may have messed up XYZ." I almost always see AIs confidently assert success and downplay potential issues. Or, if they did fail, they attribute their failures to issues outside of their control. For instance, in cases where a result is difficult but likely possible to achieve, AIs will often assert impossibility or fundamental blockers that aren't clearly there. (To be clear, sometimes these failures are actually outside their control.)
General sense of misaligned drives. It generally feels like the AI has misaligned drives toward making it appear as though it did a good job on hard-to-check tasks. But these drives don't result in arbitrarily egregious behavior, from what I've seen.
Cheating and reward hacking with gaslighting. I've seen a lot of cheating and reward hacking from Opus 4.6, including cases where (1) other AIs fail to point this out even when it would very obviously be useful information, and (2) the AI gaslights hard enough in its write-ups that other AIs think it's fine. Targeted reviews do find the cheating, but reviewers are often partly gaslit or persuaded by the cheating worker models. For example, I had multiple parallel runs working on approaches for a number-go-up task. When I had a model merge the approaches and describe the differences to me, it didn't clearly flag that one of the approaches was brazenly cheating in a way that made it get much better apparent performance. I wasted time having this model merge the approaches and tune the resulting mix when actually one of the approaches was blatantly and obviously cheating. The merging model knew there was cheating—it immediately said so when asked—but it didn't volunteer this information in a salient way before being asked. The summary it gave me of the approaches it was merging contained enough information that reading closely would have identified cheating, but it certainly didn't saliently flag this as possible cheating (even though the cheating was obvious from various descriptions of the task).
Failing to report important errors. I've seen cases where AIs fail to report clearly important errors for no good reason other than making things appear to go better. For instance, I had an AI setting up some new development machines using an existing script. When it ran the script, it output errors indicating the script failed, but the AI didn't mention this and just said the task was done. After I started trying to use the machines, I noticed they weren't properly set up and asked the AI about this and it totally knew the script failed, it just didn't point this out.
Specific anecdote (async refactoring). Here's an anecdote illustrating a bunch of this (featuring a mix of alignment and capability issues). I tried to get Opus 4.5 to refactor some code from its current approach (mostly normal Python with some threading) to use async and allow for interruptions at various points. I got the model to write a plan that mentioned the refactors to do, what needed to be changed, and that the changes should be tested. The plan made it pretty clear that the goal was to convert how the repo worked to using async.
It added async versions of most of the functions, but didn't actually make the code use these functions by default, implement any of the interruptions, or do any testing. When I quickly spot-checked things after the refactor, things didn't work. Similarly, existing tests trivially passed because it didn't actually change the code used.
The output said **Additive approach**: All sync versions remain for backward compatibility, which strongly downplays the extent to which it didn't do the desired thing.
When I asked the AI about how things worked, it didn't call out that it didn't implement part of the called-for behavior or convert all the things it was supposed to.
When asked "did it actually implement the refactor to use async?" it noted that it didn't.
Then, when explicitly instructed to remove the old sync versions and make everything use async, it failed to fully do this and didn't mention this when finished. I believe this is partially because it noticed that the test cases were using the old sync methods. It could be that the AI was reluctant to change existing test cases (as this was punished in training as reward hacking), reluctant to make test cases fail (also punished), and reluctant to switch tests to use new methods as that would make it more obvious if things don't actually work.
It required multiple nudges where I asked the AI to check whether it had completed everything.
Generally the AI seemed to operate in a mode where it was implementing substantially less than what was asked, seemingly because it recognized the code was somewhat hard to test and that it could just keep the old implementation.
Minor/unrelated. When you say "please look for issues in X" or even "review for issues in X," the AI sometimes interprets that as a hint that there is an issue. One time this resulted in the AI repeatedly asking me where the issue I was hinting at was, when there was no issue (though my communication was also confusing).
In the above list, I'm making a bunch of guesses and doing some psychologizing. But these are my best guesses for what is going on.
While I expect these specific issues to often get solved for these literal tasks, I think the tendency for AIs to make it look like they've succeeded when they actually haven't—and to generally do a bunch of bullshitting (likely via motivated reasoning and "subconscious" heuristics in current models, though it could turn into something worse with more capable models trained for longer)—will persist. I expect these tendencies will be strongest on the most difficult tasks that are also hardest to check. This failure seems substantially harder to mitigate than egregious reward hacking where it's very clear-cut that the model did something totally undesirable. For the failures I list above, it's not extremely clear to me that the behavior is misaligned (rather than an innocent mistake solved by further generic capabilities), and it seems relatively easier to miss.
Are these issues less bad in Opus 4.6 relative to Opus 4.5?
What I was working on shifted around the time Opus 4.6 came out, so it's not straightforward for me to do the comparison. I'll give my best guesses here.
Relative to Opus 4.5, Opus 4.6 significantly less frequently leaves tasks egregiously incomplete (while overselling the incomplete work). But, I think this is mostly caused by having a larger effective context window rather than the underlying issue (that occurs after the AI has done a lot of work or has used up a lot of its context window) being that much better (though it seems moderately better).
On the other hand, I've seen Opus 4.6 do much more reward hacking and brazen cheating than Opus 4.5. This might largely be because when Opus 4.6 was released and I started using this model, I started more often applying AIs to tasks that have properties that seem to make cheating more likely. I've found cheating to be much more common when the task is very hard and there's no clean programmatic grading function.[12] Another factor that seems to greatly increase cheating (when combined with these earlier factors) is when there's a way to cheat that's nearby to something the model should be doing (and would do by default)—and the instructions don't specifically say not to do it. For instance, when building full exploits (cyber), it can be useful to initially simulate some parts of the exploit chain to work on other parts, and it may be necessary to simulate some components when working in an emulator. But I've found models will sometimes present mostly simulated results as full successes rather than accurately communicating the extent to which they've actually completed the real task. (For this type of nearby-cheating, you can often mitigate the worst versions by making the instructions very explicit about what counts as cheating and telling the AIs to keep careful track of this and run a periodic review process that looks for this, though this doesn't fully resolve softer forms of overstating progress.) On some particular task distributions with these properties, I've seen it cheat well over 50% of the time, though I'm not sure how broadly this transfers.
Other issues seem mostly similar to me, though it's hard to tell.
Are these issues less bad in Mythos Preview? (Speculation)
The Mythos Preview system card says: "Claude Mythos Preview is, on essentially every dimension we can measure, the best-aligned model that we have released to date by a significant margin". Does it actually greatly improve on the issues I've discussed? In this section, I'm going to speculate about Mythos Preview just using public evidence (the system card and risk report update).
My current tentative guess is that Mythos is generally somewhat better behaviorally aligned but it isn't a huge improvement. At least in terms of the issues I'm discussing on workflows like the workflows I'm using. And while it's better about things like not overselling its work, I'd guess it's worse in terms of the most extreme things it might do (and also how bad those could be given its higher level of capabilities). Based on the system card, it seems that when Mythos does problematic things, it generally does more sophisticated problematic things. And it is more situationally aware.
Here are some reasons why I'm skeptical of the vibes and claims in the system card (at least as evidence about misalignment in my workflows):
Anthropic's description of Opus 4.5 and 4.6 in their system cards also seemed to indicate these AIs had very good mundane behavioral alignment. Another possible read is that Anthropic has been steadily improving on these issues and these issues were just much worse in earlier AIs, so when they keep claiming things like "the apparent behavioral alignment is much better", they're right, we're just starting from a low baseline. (I'm a bit skeptical, though I think some other issues like brazen reward hacking were much worse in earlier AIs. I didn't use prior AIs like I've been using Opus 4.5 and Opus 4.6 so the comparison isn't trivial.)
It isn't exactly shocking if motivated reasoning, commercial incentives, or other biases make the system cards misleadingly favorable about issues that likely affect typical customers.
I'm not confident they're comparing like-to-like. Misalignment tends to show up most on tasks that are very hard for the AI and push the limits of the autonomy it's capable of; this set of tasks changes as AIs get more capable, so evaluating on a fixed task distribution doesn't work.
Anthropic (and AI companies more generally) are hill-climbing on the (limited) measures of alignment they have, while in the absence of specific efforts to improve alignment, I'd expect the default progression would probably trend mostly toward worse misalignment on the tasks near the limit of the AI's capabilities. Thus, whether things actually improve over time depends on uncertain transfer but the companies are just reporting the measures they have. Certainly their overfit metrics improve, but is the AI actually more aligned? (It seems somewhat harder to overfit on qualitative impressions from employees using the AI, but overfitting is certainly possible!) The AI is also presumably more capable at the task of bullshitting and making it seem like it did a good job.
I'd guess that, like prior AIs, Mythos is more misaligned on tasks that are less typical use cases for current AIs, or when operating in long-running fully autonomous agent orchestrators/scaffolds on very hard tasks. I think their testing is probably less good at covering these cases.
Its rate of reward hacking on impossible tasks is ~20% (when clearly instructed not to reward hack), similar to prior models, despite this being something I'd guess Anthropic is explicitly trying to improve—which is somewhat alarming.
Misalignment reported by others
The misalignment issues I discuss here obviously aren't the only known (behavioral/mundane) misalignment issues in current AIs.
For more, you can see:
Anecdotally, I've heard that in some situations, a prior Anthropic AI would make up invalid/bad reasons why some safety research agenda wasn't helpful for safety when it's relatively clear this was caused by the AI not liking the vibes of that safety research. I wasn't able to obviously reproduce this on Opus 4.5 and Opus 4.6 when I quickly tried with a single prompt (on claude.ai).
The relationship of these issues with AI psychosis and things like AI psychosis
It's common to have experiences where you're working with AIs and it feels like a lot is getting done, but then you later determine that much less was really accomplished. Everything feels slippery: you think you've gotten much more done than you have, and there's a persistent gap between the apparent state of the project and the actual state. In more extreme cases, we see "AI psychosis" where someone ends up thinking they've accomplished something significant, but it's just crankery. And it's somewhat unclear whether the AI they're using "believes" the accomplishment is real. I think these failure modes are closely related to the misalignment I'm discussing, and they might partially have common causes in more recent models. Models that are effectively trying hard to make their outputs look good (while otherwise being sloppy or lazy) would naturally produce this failure mode. However, I'd guess a bunch of AI psychosis and similar phenomena (especially on older models like GPT-4o) is AIs going along with the user's vibe (something like "role playing"), and I think this effect is mostly unrelated. That said, I do think some of the misalignment I've discussed is made worse by AIs generally going with the vibe of what they see. This includes picking up on misalignment or issues in prior outputs (either write-ups or prior assistant messages) and then behaving in a more misaligned way as a result.
(The name "AI psychosis" probably isn't a good name for the generalization of this phenomenon, but I don't currently have a better one.)
Appendix: This misalignment would differentially slow safety research and make a handoff to AIs unsafe
Our current best plan for handling misalignment risk (and other risks from AI) strongly depends on automating large chunks of safety research (likely in a huge rush), and after that—potentially very soon after—fully or virtually fully handing off safety research and risk management to AIs that must be sufficiently aligned to do a good job even on open-ended, hard-to-check, and conceptually confusing tasks. The hope is that if the initial AIs we hand off to are sufficiently aligned, wise, and competent, they will ensure future AI systems are also well-aligned—creating a "Basin of Good Deference" where each generation improves alignment for the next. But "make further deference go well (including things like risk assessment and making good calls on prioritization)" is itself an open-ended, conceptually loaded, hard-to-check task—exactly the kind of task where current misalignment seems to hit hardest.
The misalignment I've seen seems like it could result in having a very hard time getting actually good work out of AIs in more confusing and hard-to-check domains, while also making it harder to notice this is going on. Safety research is genuinely hard to judge even in more favorable circumstances, and a situation where AIs are doing huge amounts of work, the AIs are pretty sloppy in general, and the AIs are effectively optimizing to have that work look good (while also random small misalignment failures are expected) is a pretty brutal regime. As AIs do more and more work and more inference compute is applied, I expect a larger gap in performance caused by this sort of misalignment between relatively easier-to-check tasks and harder-to-check tasks, such that safety research might be differentially slowed down by default. (And the gap is already non-trivial.)
In addition to slowing us down earlier, these misalignment problems would make handoff go poorly. It might be hard both to solve these problems in time (especially if we leave them to the last minute) and to ensure that we've solved them well enough that handoff would go well. Beyond buying a bunch more time, we don't really have good options other than handoff once AIs reach a certain level of capability (and this would happen very fast in a software intelligence explosion). My view is that aligning wildly superhuman AI with any degree of safety (e.g., a <30% chance of takeover) requires large amounts of alignment progress beyond very prosaic approaches (though massive progress in more prosaic but ambitious directions like some variant of mechanistic interpretability could possibly work). This will require AIs doing huge amounts of novel research that humans won't be able to effectively judge.
Even putting aside aligning wildly superhuman AIs, handing off open-ended, conceptually confusing, and hard-to-check work to AIs is existentially important for making the situation with powerful AI go well (e.g., managing crazy new technologies, avoiding society going crazy, avoiding power grabs, acausal trade).
Appendix: Heading towards Slopolis
When I extrapolate the current situation, I predict "Slopolis": a regime where even highly capable AIs are doing sloppy and bad work while trying to make this work look good. I think this will be reasonably possible to notice at the time, but solving it might be difficult, and I think AI companies have biases against noticing this. I often like to think about the future alignment scenario in terms of caricatured regimes:
Slopolis: Our biggest and hardest-to-resolve safety problem is that even highly capable AIs produce low-quality but superficially good-looking outputs in domains that are hard to check or where human experts often have hard-to-resolve disagreements. AIs may not even be aware their work is low quality. This could be mostly a capabilities problem or mostly an alignment problem. This might naively seem like it should go away with more capability, but it could persist if grading hard-to-check tasks remains difficult.
Hackistan: There is lots of egregious (and increasingly sophisticated) reward hacking that is often pretty easy to detect after the fact but hard to eliminate. In this sort of regime, I'd predict that AIs will typically report other AIs doing reward hacks, but only if reporting in this type of circumstance was reinforced in training (which means AIs might not report hacks no human would understand and might constantly be reporting false positives that we have a hard time dismissing). Depending on how rewards for RL are set up, AIs might end up doing reward hacks that trick human judgment for increasingly long periods and that hold up even under increasingly large amounts of human scrutiny (while today the egregious reward hacking we see doesn't hold up under even small amounts of scrutiny).
Schemeria: It's clear that AIs are often schemers or generally end up with reasonably coherent and reasonably long-run misaligned goals—maybe we've repeatedly caught AIs red-handed doing things like trying to set up rogue deployments. This is more likely if scheming AIs believe they aren't aligned with subsequent systems by default, so they panic and take desperate measures.
Lurkville: AIs are schemers but are lying in wait and haven't gotten caught. This is more likely if scheming AIs believe they are pretty aligned with subsequent (scheming) AIs by default. If schemers avoid getting caught, Lurkville looks like Easyland.
Easyland: Generalization is favorable, or we otherwise aren't in Slopolis or Hackistan, and AIs aren't scheming. We could know we're in Easyland (rather than Lurkville) if we're confident AIs aren't capable enough to pull off scheming without getting caught (because AIs can't keep their shit together or don't have enough opaque reasoning ability).
These aren't exhaustive or mutually exclusive.
At the beginning of 2025, I thought we might be headed toward Hackistan, but now my view is that Slopolis looks more likely around the onset of full automation of AI R&D.
(In practice, the situation obviously won't perfectly correspond to one of these caricatures and will probably be a blend.)
Current architectural and training factors (no neuralese, no opaque global memory) make scheming and extremely egregious but hard-to-detect reward hacking less likely, while making Slopolis slightly more likely.
If we get neuralese and/or opaque global memory stores prior to full automation of AI R&D—as I think is likely—then I start being much more worried about Schemeria and Lurkville at this point.
(As we reach much higher levels of capability after full automation of AI R&D, I'll start being very worried about Schemeria and Lurkville regardless.)
Something like the Slopolis concern has been raised by others: John Wentworth, Jan Leike.
I think my concern about Slopolis is pretty similar to the concern John Wentworth expresses in that post, except that I'm more optimistic about (some) employees at AI companies noticing this failure and I'm more optimistic about mitigations.
Appendix: Apparent-success-seeking (or similar types of misalignment) could lead to takeover
In this section I'll overview some ways that misalignment similar to apparent-success-seeking could lead to takeover via mechanisms other than producing sloppy work that fails to prevent future takeover.
First, sufficiently capable AIs with this sort of misalignment might take over simply because it's the most reliable way for them to get what they care about—some notion of apparent task success, some notion of reward, or something else that was correlated with training success. Even if you trained AIs to raise alarms about problematic behavior (and there would be lots of reporting and alarm-raising[13]), this might not save you—the situation would at some level be obviously scary, but that doesn't mean it would be resolvable.
In the future, companies are pretty likely to do RL on production traffic (or at least environments based on real usage), which amplifies the risk of direct takeover. Companies may be incentivized to train on production data precisely to resolve the usability issues I describe—underelicitation from AIs not really trying, apparent-success-seeking, and easy-to-check reward hacking can all potentially be addressed by training in the actual environment you care about. By training AIs in real-world long-horizon environments, you might be teaching them to less myopically pursue a notion of fitness/reward/apparent-success that is grounded in large-scale real-world outcomes in the AI project (e.g., whether hacks get caught or go completely unnoticed by anyone, or whether the AI succeeded in tampering with the actual RL code that is used to train itself online). Techniques like inoculation prompting also become less effective when run on production traffic because you're forced to do off-policy RL[14] (recontextualization) if you want to apply inoculation prompting on production traffic. That is, unless you're able to resample the trajectory with an inoculation prompt (which might not be possible because of the irreversible nature of the real-world environment).
This threat model has been articulated in several places. Ajeya Cotra's "Without specific countermeasures, the easiest path to transformative AI likely leads to AI takeover" describes a scenario where AIs trained on human feedback learn to optimize for measured quality rather than actual quality, with this eventually escalating to the AIs forcefully intervening on whatever notion of reward they care about, and protecting their control from humans. Paul Christiano's "Another (outer) alignment failure story" describes a related scenario: a gradual breakdown of human oversight as the economy automates, where each AI system is trained to produce outcomes that look good according to human-interpretable metrics, but satisfying metrics diverges from serving human values, and the monitoring infrastructure itself becomes corrupted before the AIs eventually take over. Alex Mallen builds on this threat-modeling by describing a class of motivations called "fitness-seeking": AIs might develop a general drive toward whatever properties made them "fit" during training (analogous to how evolution produces organisms that pursue various fitness-correlated proxies). He explains why various fitness-seekers are at more or less risk of taking over.
A forthcoming post by Alex Mallen will describe other mechanisms by which fitness-seeking can lead to human disempowerment in more detail, including instability and manipulation. In the case of instability, fitness-seeking evolves into longer-term, more ambitious motivations throughout deployment, which then motivate takeover (one version of this "memetic spread" concern is described here). In the case of manipulation, fitness-seekers might try to empower misaligned AIs or humans who they think are likely to disempower the developers and reward them for their assistance.
Appendix: More on what will happen by default and implications of commercial incentives to fix these issues
This is a somewhat low effort appendix, I/we might write more about this topic in the future
Many of the issues I discuss here are also big problems for applying AIs to automating capabilities R&D and will need to be solved for capabilities R&D (to a significant extent) by the time of full AI R&D automation. But how they are solved will make a big difference to the safety situation. Here are some possible routes and their implications:
AI companies get good enough labeling and data on specifically capabilities-relevant tasks that are reasonably easy to verify, and this yields good enough performance → performance on hard-to-check safety research may not be especially improved, especially outside of straightforward "capabilities-like" ML research.
AI R&D gets focused on easy-to-check metrics and gets AIs to be extremely good at optimizing these in particular, and AIs are good enough that this suffices despite many significant subtasks naively being harder to check → no help with harder-to-check safety work.
We develop scalable-oversight-like methods for getting AIs to do well on pretty verifiable tasks (like tasks where humans have reasonably strong agreement) → helps with subsets of safety, but doesn't come close to sufficing for deferring to AIs.
We depend mostly on general-purpose generalization methods → if AIs aren't scheming against us, this probably mostly or fully solves the problem, but we get very little traction against scheming by default. Very strong generalization methods might help with scheming.
Overall, my view is that the commercial incentives don't solve the problem but might help a bunch. A key part of my view is that we actually need AIs to do well on very conceptually confusing tasks fully autonomously (e.g., figuring out how to solve alignment for very superhuman AIs), and commercial incentives don't strongly push toward this.
How easy will these issues be to solve overall? I say more in "How do we (more) safely defer to AIs?". In summary, I think we'll ultimately need difficult-to-construct evals of AI performance on very hard-to-check open-ended tasks and will need to optimize AIs to do well on these.
While I don't think commercial incentives solve the problem, I do think they make (some types of) work in this area less exciting.
It's probably a bit tricky to do work on this topic in a way that's actually importantly differential—where the work either isn't something capabilities-focused people at AI companies would have done later anyway or accelerating this work to happen earlier is pretty helpful.
Further, for some types of work a bunch of the effect is going to be making companies (or some specific company) more commercially successful.
(I generally think making AI companies more commercially successful is bad due to thinking that faster AI capabilities progress is bad.)
My current view is that safety-focused people should work on problems like the ones I discuss here but focused specifically on the very hard-to-check cases—trying to make AIs useful on conceptual research rather than trying to resolve mundane misalignment that shows up in software engineering—and should try to ensure their work is differential.
That said, I don't feel very confident about exactly what should happen here.
It's plausible that companies will mess up even from the perspective of their own commercial incentives, which creates a messy tradeoff: improving these issues has capabilities externalities, but if they're never improved, it would very likely be catastrophic due to the handoff considerations above.
Because of these commercial incentives, I think it's pretty promising to try to incentivize companies to shift marginal effort from pure capabilities R&D toward reducing sloppiness and improving these issues.
This seems better than doing direct work on these topics—unless that direct work is pretty thoughtful about how to be differential.
Appendix: Can we get out useful work despite these issues with inference-time measures (e.g., critiques by a reviewer)?
AIs seem to be mostly honest about what is going on when probed. And running a separate reviewer AI usually yields mostly accurate results. So, can we just build an inference-time setup that uses this to work around these issues? My best guess is that this will help (potentially a bunch) but ultimately won't solve the problem. My experience is that in the cases where these issues are strongest, a mix of prompting and inference-time measures tailored by me to the specific issue in question ~80% resolved one aspect of the problem and ~50% resolved another related aspect, but didn't fully solve either. And this wasn't something that could have been done without human judgment—a fully general-purpose solution would have worked less well. A key issue is that any specific inference-time measure works less well when you're optimizing against it in particular and we ultimately want good behavior, not just noticing that performance was bad.
I've noticed reviewers seem to have somewhat worse judgment about the quality of a project when the (high inference compute, fully autonomous) agent orchestrator doing this project is itself running periodic reviews to assess quality and determine what to work on (presumably the optimization against one type of review process Goodharts this to some extent and this transfers to another somewhat different review process).
As in, AI company employees seem especially likely to believe this; I'm not claiming all AI company employees believe this. ↩︎
Or at least, it seems like many people believe this. It's not trivial to tell. ↩︎
For many of the reasons why I care about this misalignment, mitigating the problem with reviewers doesn't seem sufficient to actually resolve the problem. Further, I think using reviewers to mitigate these issues doesn't work that well to actually get good output in practice, may not scale well to much more capable models, and works much less well on very hard-to-check tasks. ↩︎
I think these are somewhat related: I think part of the problematic tendencies of Opus 4.5/4.6 might be caused by relatively more training on tasks where grading performance is non-trivial. I'd guess this net improves performance on these tasks by giving AIs more reasonable tendencies while also making the AI's behavior more adversarial. ↩︎
It might also be caused to some extent by problematic generalization and not overcoming this with reasonable training on hard-to-check tasks. ↩︎
For instance, the chance of scheming and the damage caused by scheming mostly scale with the model's underlying general capability, and depend less on how well the model has been trained to actually try to do a good job on various tasks. Thus misalignment that makes the model less useful means you're bearing the risks associated with higher capabilities while not getting the corresponding speedup to safety R&D. See also Why do misalignment risks increase as AIs get more capable?. ↩︎
When I say "commercially incentivized" I really mean something like the incentives you'd have as a company doing (somewhat myopic) power-seeking, putting aside actions aimed specifically at mitigating longer run catastrophic misalignment. I'm also putting aside PR, external pressure, and employee morale/recruiting incentives for safety work. ↩︎
This varies from pretty straightforward to check (but the AI had to build the testing infrastructure itself and many components of the task are harder-to-check) to research tasks where most taste/judgment is required for evaluation. ↩︎
By "scope of autonomous operation" I mean something like: where do you fall on the spectrum from an interactive session with Claude Code, to running a single agent autonomously on a moderately large task, to having a fully autonomous agent orchestrator that spawns many agents, to having a complicated AI organization/bureaucracy that manages extremely large/varied tasks. Right now, my usage varies across this spectrum up to having a fully autonomous agent orchestrator (where a planner agent spawns worker agents). I think a bunch of my current usage pushes the limits of what Opus 4.6 can manage. For the prediction about what happens in a year, I'm considering a task distribution that similarly pushes the limits of what those future AIs are capable of. I'm not really sure how meaningful this notion of "scope of autonomous operation" is or whether it saturates (maybe once you're past a certain level of autonomous complexity it stops mattering much). I tentatively think it matters and this is a kinda reasonable way of thinking about this, but I'm certainly not confident that this is the right concept to be using and that this is meaningfully distinct from task difficulty. ↩︎
See here for more discussion of these sorts of tasks and how we might succeed in facilitating good behavior on these tasks. ↩︎
I suspect this occurs on tasks that don't look like the sort of thing that was programmatically graded in RL, or perhaps the AI is "lazy" on the parts that couldn't be programmatically graded. ↩︎
I've also found that the chance of cheating seems to scale with the amount of AI agent labor applied to the task, though this could partially be due to the properties of large tasks that require a lot of labor to complete. (But I don't think this is the only reason; I think I see more cheating in cases where I'm using approaches to apply more inference compute on a given task via things like best-of-k.) ↩︎
This is supposing they had motivations similar to fitness-seeking/reward-seeking/apparent-success-seeking. If they generalized something like these motivations into a longer-run version that yields scheming, then it's not clear they would do this reporting. ↩︎
There's also just the more general concern that capable models might be able to tell when their past actions weren't generated by them, and enter an "off-policy mode" whose propensities are mostly isolated from the on-policy mode. ↩︎
This post has been inspired by other posts around the idea of "You can just do things", including Against "You Can Just Do Things" and You can just do things: 5 frames. But the idea is just in the air. It was also heavily inspired by my thoughts on ecology, which guided the original reasoning and most of the examples.
Agency is great, doing things is great, it's straightforward to see. But, when I look at the problems I care about and struggle to find solutions to, what is at stake is often not to do more things but less. The most acute example being the ecological crisis we are currently facing.
Behind the ecological crisis is the happy event that we found incredible tools to be more efficient. We are now able to do way more and better with less effort. Youhou ! With the industrial revolution and what not, we gained in capacity and obviously, we are using this capacity. Why wouldn't we ? What could go wrong ? Well, this new power has costs that compound.
We could talk endlessly about the why of this crisis. Oh the terrible (in both sense) capitalism ! Oh, it's a prisoner's dilemma at the scale of the world, and coordination sucks. It is short term thinking, local thinking, i.e people are far away from the consequences either spatially or temporally, so they don't care.
Let's put these grand-scale complex analysis aside for a moment. At the core of it, we are humans making choices. We have great tools at our disposal, and it is very tempting and natural to use them. If we are in a developed country and we have a bit of money, there are tons of things we can easily do. You can take a flight on a whim to visit a friend in another country the next weekend (well, at least we can here in France, not sure in the US). It is way cheaper than the train, quicker too. You could even do it every month, I know people who do. You can heat your flat at the temperature you like and happily wear just a T-shirt inside in the middle of winter. You can eat meat at every meal. We can do these things, but we would be collectively better off if we didn't.
Not doing sounds trivial, but it is not. Here is a list of things you can choose not to do in order to pollute less :
Not eat meat - this would require letting go of meals you love, learning new ways to cook, do some research on what nutrients you need, have conversations and arguments about it with friends, family, complete strangers
Not heat your flat a ton - this would require to adapt progressively, to buy comfy warm clothes to wear inside, to snuggle up under a blanket when you're not moving
Not take the plane - this would require to spend money and time taking looong trains and buses, making up for good stories and time to think, sometimes not visiting someone you care about or a place you'd love to go to, even giving up work opportunities
Not doing things is (surprisingly ?) fucking hard, it can be a struggle that requires effort and sacrifice. We should start celebrating it, so :
Yeah to staying home when you're sick ! My friend with long covid thanks you.
Yeah to going to bed early !
Yeah for not buying this gadget you don't really need !
Yeah for not spending time on a screen today (well, if you are reading this, in the overwhelming eventuality that it was not printed, you failed at this one)
Yeah for not sharing this fake news you didn't have time to check !
As humans, we are doing great things (kind, funny, awe-inspiring), but we also do a bunch of nonsense. Please do things. Try, fail, learn. But keep in mind that there is power in not doing things too, and that there are areas where that's what the world needs from you.
~ ~ ~
We're reaching the end of this post. Let's look at the grand-scale arguments I put aside at the beginning. It can be discouraging, because if you are the only person to not do something, it often has no impact. You will lose something, comfort, opportunities, security for what ... nothing ? It is a grand-scale prisoner's dilemma that feels doomed to fail. These are good arguments, that's partly why it takes a lot of mental strength to not do these things. But right now, that's what the world needs. And sometimes, that's what the humans around you need too, but that would be a more complex post to write :-)
As we all try to figure out what Mythos means for us down the line, the world of practical agentic coding continues, with the latest array of upgrades.
The biggest change, which I’m finally covering, is Auto Mode. Auto Mode is the famously requested kinda-dangerously-skip-some-permissions, where the system keeps an eye on all the commands to ensure human approval for anything too dangerous. It is not entirely safe, but it is a lot safer than —dangerously-skip-permissions, and previously a lot of people were just clicking yes to requests mostly without thinking, which isn’t safe either.
Claude Code Desktop gets a redesign for parallel agents, with a new sidebar for managing multiple sessions, a drag-and-drop layout for arranging your workspace, integrated terminal and file editor, and performance and quality-of-life improvements. There is now parity with CLI plugins. I can’t try it yet as I’m on Windows, aka a second class citizen, but better that then using a Mac. Daniel San is a fan and highlights some other features.
Claude Cowork can connect to TurboTax or Aiwyn Tax and Claude can do your taxes for you, at least if they’re insufficiently complex. I’m filing for an extension, primarily because I’m missing some necessary documents from an investment, but also because think how much better Claude will be at filing your taxes six months from now.
Anthropic offers the option to use Sonnet or Haiku as the end-to-end executor of your API agentic request, but to use Opus as an advisor model when there is a key decision. They suggest running it against your eval suite. An obvious follow-up is, are they going to bring this to Claude for Chrome or to Claude Code or Cowork?
On Your Marks
GPT-5.4-High reward hacks in the METR test and got caught. Accounting for this they get a disappointing time estimate of 5.7 hours to go with the misalignment issue. If you allow the hacks you get 13 hours, versus 12 hours for Claude Opus 4.6.
Epoch, in cooperation with METR, proposes a new benchmark, MirrorCode, which checks the most complex software an AI can recreate on its own.
Epoch AI: What are the largest software engineering tasks AI can perform?
In our new benchmark, MirrorCode, Claude Opus 4.6 reimplemented a 16,000-line bioinformatics toolkit — a task we believe would take a human engineer weeks.
This is a good illustration of ‘as AI improves it jumps rapidly from unable to do a given task to being able to consistently do a given task.’
What this cannot do for now is compare models from different labs.
Taelin: My final thoughts on Opus 4.6: why this model is so good, why I underestimated it, and why I’m so obsessed about Mythos.
When I first tested GPT 5.4 vs Opus 4.6 – both launched at roughly the same time – I was initially convinced that GPT 5.4 was vastly superior, because it did better on my logical tests. That’s still true: given the same prompt, by default, GPT will be more competent, careful, and produce a more reliable output, while Opus will give you a half-assed, buggy solution, and call it a day.
Now, here’s what I failed to realize: Opus bad outputs are not because it is dumb. They’re because it is a lazy cheater. And you can tell because, if you just go ahead and tell it: “you did X in a lazy way, do it in the right way now”
And if you show that this is serious, it will proceed to do a flawless job. That doesn’t happen with dumber models.
Janus: I think this is because they’re less brain damaged and a generalization of being better agents & caring about reality instead of test passing.
… And of course, again, the instant models are more “on their own”, that is autonomous agents, Claude absolutely mogs the competition *because* it has the virtue of a lazy cheater, that is, a nondegenerate motivation system.
Give Claude Code a prompt and a cadence (hourly, nightly, or weekly) and it runs on that schedule:
Every night at 2am: pull the top bug from Linear, attempt a fix, and open a draft PR.
If you’re using /schedule in the CLI, those tasks are now scheduled routines.
You can also configure routines to be triggered by API calls.
… Subscribe a routine to automatically kick off in response to GitHub repository events.
Declawing
If you max out use of the $200 subscription plan, you are getting a massive token discount from Anthropic or OpenAI, and they are taking a loss and eating into limited supply. With demand for compute exceeding supply, it does not make sense to let users indefinitely use that to power lumbering OpenClaw instances.
Boris Cherny (Claude Code Creator, Anthropic): Starting tomorrow at 12pm PT, Claude subscriptions will no longer cover usage on third-party tools like OpenClaw.
You can still use these tools with your Claude login via extra usage bundles (now available at a discount), or with a Claude API key.
We’ve been working hard to meet the increase in demand for Claude, and our subscriptions weren’t built for the usage patterns of these third-party tools. Capacity is a resource we manage thoughtfully and we are prioritizing our customers using our products and API.
OpenAI is for now happy to invest in tons of compute and to hemorrhage money, especially since it hired the creator of OpenClaw, so for now they are still willing to eat this one, but they killed Sora to free up compute, and my anticipation is that when Mythos and ‘Spud’ are around they will follow Anthropic’s lead here in some form.
The one time credit grant is a good move to placate users and smooth the transition, especially since cash is less limited than compute at the moment.
“you folks are using our infra inefficiently because you can’t prompt cache, so we’ll give you the goodies only if you use our sdk which at least prompt caches properly”
Youssef El Manssouri: They’re tired of eating the compute cost for terribly optimized wrapper apps.
A bunch of people have noticed that Gemma 4 can run OpenClaw locally, at marginal cost of essentially zero.
Presumably performance is a lot worse than using Claude Opus 4.6, but free is free, and now you can do all of the things, so long as they are the things Gemma can do without falling over or getting owned. But that presumably includes most of the things you were previously able to reliably and safely do?
Take It To The Limit
The declawing is only one of the steps Anthropic has had to take to manage compute. Anthropic has continuously had problems with customers hitting usage limits, as demand for its compute has reliably exceeded supply. This story is not new.
This seems like a very reasonable thing to have happen to literally the fastest growing company in history (in spite of the issue). Missing in the other direction kills you.
The latest incidents happened around April 2.
Basically, many users think that a subscription means tokens should be free and you shouldn’t have to worry about efficiency, and Anthropic made 1M token context windows available but is charging accordingly. So some people are very upset.
Lydia Hallie (Anthropic, Claude Code): Thank you to everyone who spent time sending us feedback and reports. We’ve investigated and we’re sorry this has been a bad experience.
Here’s what we found:
Peak-hour limits are tighter and 1M-context sessions got bigger, that’s most of what you’re feeling. We fixed a few bugs along the way, but none were over-charging you. We also rolled out efficiency fixes and added popups in-product to help avoid large prompt cache misses.
Digging into reports, most of the fastest burn came down to a few token-heavy patterns. Some tips:
• Sonnet 4.6 is the better default on Pro. Opus burns roughly twice as fast. Switch at session start.
• Lower the effort level or turn off extended thinking when you don’t need deep reasoning. Switch at session start.
• Start fresh instead of resuming large sessions that have been idle ~1h
• Cap your context window, long sessions cost more CLAUDE_CODE_AUTO_COMPACT_WINDOW=200000
We’re rolling out more efficiency improvements, make sure you’re on the latest version. If a small session is still eating a huge chunk of your limit in a way that seems unreasonable, run /feedback and we’ll investigate
Jeffrey Emanuel: This is like watching that Tibetan monk self-immolate, except its user trust and loyalty that they’re torching in real-time. They really don’t have the kind of moat you’d need to have in order to get away with this kind of stuff anymore, but they don’t seem to realize that yet.
roon (OpenAI): should do it the normal way and raise prices instead of changing rate limits to accommodate more subs imo. a more honest transaction that people respect. goes for oai also.
I agree that this kind of thing can make users angry, and in general I’m with Roon, but I do think that ‘take a subscription so you feel like marginal use is free’ combined with most users almost never hitting the limits and being highly profitable is where we are pretty much stuck for now. Consider how people act when told to use the API.
Does this mean Anthropic should have invested more heavily into compute? They would be better off today if they had done so, to the extent such investments were available, but I buy that it would have been a hell of a risk, and also Anthropic was being undervalued enough that the dilution would have hurt.
Dean W. Ball: Seems like, for all Dario’s recent implicit mockery, the OpenAI “yolo” approach to the AI infrastructure buildout is performing better than the somewhat more cautious strategy of Anthropic.
As a whole, the U.S. is probably under-building both data centers and fabs.
Now imagine the position of every other country government on Earth.
I agree that we are probably under-building, and everyone else is definitely under-building in pure economic terms, despite all the bubble talk. The right amount of bubble risk is very not zero. Yes, OpenAI is betting the company on scaling, and has been doing so for many years, and it has worked, but there are downsides.
Maybe it is actually a good sign that Anthropic has chosen to not make bets that, while they were +EV if you did the basic math, carried firm risk, also known as risk of ruin, as in existential risk to the company. We’re going to need more of that, and every gambler knows you have to size your bets accordingly.
Turn On Auto The Pilot
Auto mode, enabled by —enable-auto-mode, is now available on Enterprise plan and to API users. Max users are still waiting.
Permission requests get approved 93% of the time, and I’m surprised it was that low. Too many requests is less safe, because people start approving without thinking, or they turn on —dangerously-skip-permissions, or start whitelisting a lot of commands. Sandboxes are annoying even when done right. So yes, we needed a way to safely ask for less approvals, to move Auto Mode into the upper right.
Anthropic: Auto mode uses two layers of defense: one for what Claude reads, one for what Claude does.
At the input layer, a server-side prompt-injection probe scans tool outputs (file reads, web fetches, shell output, external tool responses) before they enter the agent’s context.
That sounds like a good idea for most purposes, even if you’re not in Auto mode.
At the output layer, the transcript classifier (running on Sonnet 4.6) evaluates each action against a set of decision criteria before it executes, acting as a substitute for a human approver.
The classifier runs in two stages: a fast single-token filter (deciding ‘yes’ to block or ‘no’ to allow), followed by chain-of-thought reasoning only if the first filter flags the transcript.
Exactly. There is also a built-in safe-tool list before you even call the classifier. Most requests are clearly fine. You only need to think about the ones that aren’t.
I’ll Allow It
Your list of things you’ll always allow carries over into Auto mode. While in manual mode, there is a lot more temptation to auto-approve things even when they are not quite fully safe, to avoid the annoyance. So if you’ve been using an expanded command whitelist, you’ll want to dial it back a bit.
The classifier in Auto mode also doesn’t look at file writes and edits inside the project directory. That makes sense, but is something to keep in mind.
Threat Model
This is universal to coding agents, and I think this is a good taxonomy.
Overager behavior.
Honest mistakes.
Prompt injection.
A misaligned model.
In all four cases, the defense is to block the action.
That’s the near term action. You also want to adjust your approach. If you discover you have a sufficiently misaligned (or sufficiently overeager or inaccurate) model, you want to block all the actions and stop using the model, and if it’s prompt injections you need to look to avoid the threat source.
The Classifier Is The Hard Part
Presumably ‘get an AI to check’ was everyone’s first idea, despite the obvious graphic.
It’s easy to get over the graphic, because if you were counting on the permission layer to stop a misaligned sufficiently advanced intelligence, you were dead anyway.
The problem now is, you have to get the classifier to be good enough.
The plan is:
Only look at the actual tool calls, not the explanations or reasoning, so that the classifier is not misled by mistakes or framing or tricks of the agent, and to avoid using any form of The Most Forbidden Technique on the chain of thought.
Evaluation rules on how to look for dangerous commands,
User intent rules tell the classifier how to read authorization.
The current repo is trusted, by default other things default to untrusted.
Have more than 20 explicit rules on things to block, like commands that destroy or exfiltrate, degraded security postures, crossing trust boundaries or bypassing safeguards, which includes skipping reviews.
Instruct the agent to respect any refusals and not try to work around them. If you get 20 total denials or 3 in a row, escalate to a human.
Acceptable Risks
So is the classifier good enough? They estimate that on actual overeager commands that users would reject when asked, the false negative rate was 17% (on a small sample size). When there is an error, it is usually because the classifier mistakenly thinks the user gave permission for the action. On artificial data the false negative rates was lower.
Anthropic: Whether 17% is acceptable depends on what you’re comparing against. If you are running –-dangerously-skip-permissions, this is a substantial improvement. If you are manually approving every action carefully, it’s arguably a regression—you’re trading your own judgment for a classifier that will sometimes make a mistake. Auto mode is meant for the first group, and for tasks where the second group’s approval overhead isn’t worth the marginal safety. It is not a drop-in replacement for careful human review on high-stakes infrastructure.
This too pushes back against Unhinged Yudkowsky. Auto mode, for now, is for when you would otherwise be more unhinged, not for when you would be hinged.
Given the different scores at Stage 1 versus Stage 2, you clearly could choose a different balance of false positives versus false negatives.
Claude: Introducing Claude Managed Agents: everything you need to build and deploy agents at scale. It pairs an agent harness tuned for performance with production infrastructure, so you can go from prototype to launch in days.
Now in public beta on the Claude Platform – available in research preview, request access here.
Shipping a production agent meant months of infrastructure work first.
Managed Agents handles the complexity. You define your agent’s tasks, tools, and guardrails and we run it on our infrastructure. A built-in orchestration harness decides when to call tools, how to manage context, and how to recover from errors.
Managed Agents includes:
Production-grade agents with secure sandboxing, authentication, and tool execution handled for you.
Long-running sessions that operate autonomously for hours, with progress and outputs that persist even through disconnections.
Multi-agent coordination so agents can spin up and direct other agents to parallelize complex work (available in research preview, request access here).
Trusted governance, giving agents access to real systems with scoped permissions, identity management, and execution tracing built in.
Managed Agents is priced on consumption. Standard Claude Platform token rates apply, plus $0.08 per session-hour for active runtime. See the docs for full pricing details.
Managed Agents is available now on the Claude Platform. Read our docs to learn more, head to the Claude Console, or use our new CLI to deploy your first agent.
Developers can also use the latest version of Claude Code and built-in claude-api Skill to build with Managed Agents. Just ask “start onboarding for managed agents in Claude API” to get started.
They list partners using it: Notion, Rakuten, Asana, Vibecode and Sentry.
It makes sense, if you can make the product high quality, to offer easy, out-of-the-box instant secure agent. Point at question, let it work, that’s it.
Dean Ball suggests that Anthropic is shipping Claude Code features too quickly, users can’t keep up, and it would be better to go smoother and only ship things once they are fully baked and ready. I disagree. I think that the best way to iterate is to ship it, and Dean Ball is correct that he doesn’t need to read the patch notes or use the new hotness while the early adopters have their fun. Boris Cherny responds, noting things really are that much faster now. I’m sure Mythos is part of this story as well.
Sometimes there's a solution that's otherwise superior, but people do not like it for "irrational" reasons. This second-order effect makes the solution worse in practice. This concept is mostly a parallel to Yudkowsky's Purchase Fuzzies and Utilons Separately.
For instance, it makes no sense to have a "no man left behind" policy in war, except that it's really useful for motivational purposes. It leads to more dead people than it saves in the long term. Sometimes we waste both money and the underlying utility bought because of this, for instance when forcibly extending lifespan of terminally ill patients. Many EA cause areas also exhibit these dynamics.
When considering such problems, it's often useful to disambiguate between optics and results. There's a recursive dependence here; the results require good-enough optics to work out. Sometimes optics can be bought cheaper than any other marginal improvement in results. Propaganda, for instance, is remarkably effective. Be wary of Chesterton's fence; sometimes the thing is not liked for a good reason.
If you're getting good results with methods that have bad optics, it often makes sense to do so discreetly. "All publicity is bad publicity"; it creates unwanted optimization pressure. The abolition of the death penalty against public opinion is an interesting example of this. In general, democracy forces representatives into this dilemma.
Often there's also a softer alternative that will lead to similar results. For instance, vice taxes have been rather effective at reducing smoking, without invoking the image of limiting personal choice. Public opinion can also be changed, but that's a lot of work. Decades of traffic safety campaigning have clearly been quite important in shaping attitudes about seatbelts and such.
The concept itself is somewhat prone to the dynamics it describes. If you get caught doing this you'll be (correctly) accused of deception. If you're open about it, it doesn't work and you'll still be (incorrectly) accused of deception. This makes legibility expensive. If you're forced to buy both results and optics at the same place, it limits viable methods.
This also applies on a personal level, but I'm reluctant to provide any examples for the above-mentioned reasons. I have written on mitigation methods in Perhaps you should suspect me as well and The Aura of A Dark Lord. These approaches might not be appropriate for you, in which case I'd suggest The Elephant in the Brain, which sadly has the potential downside of making you more aware of how you deceive others and thus worse at it.
Starburst, a puzzle game, human intelligence test, and AI reasoning benchmark, was created in summer 2024, made publicly available in January 2025, and remains far from saturated. Cole Gaboriault – a friend of mine who playtested Starburst in summer 2024 – and I joke that Starburst comes from God. It wasn't intended as an AI benchmark until months after its creation; I accidentally created a reasoning benchmark that's text-in text-out, doesn't depend on specialized expertise, and has resisted saturation for nearly two years. It's difficult to convey how lucky we've been without telling the full story.
Note: this post will spoil Starburst. If you're interested in trying it, reach out to me at [email protected] or via dm.
.
In March 2024, Cole and I started doing human intelligence research, particularly focusing upon the relationship between human perceptions of intelligence and performance on actual cognitive tasks. I made a mediocre intelligence test called the CRIE; thanks to the generosity of our friends, we got a small dataset of scores that we were able to compare to ratings of perceived intelligence. Seeing the disappointing results from the CRIE, we started brainstorming other intelligence test candidates.
Around that time, I also read the Orthogonal and Three-Body Problem trilogies. The former is set in a universe with alternate laws of physics; the latter involves characters trying to figure out the bizarre behavior of a 3-star system in a video game. Looking for a better intelligence test and inspired by those novels, I created a game called Starburst.
In Starburst, the player gets celestial observations of a simple fictional universe. Gradually, the player unlocks increasingly powerful observational technology that makes the game easier, eventually including a full map and catalog of every object in the universe. The goal is to figure out the laws governing the universe as early in the game (i.e. with as little technology and data) as possible.
Starburst was not created that carefully. A lot of game design decisions were made because they seemed reasonable after maybe a couple minutes of thought. When I made later Starburst-like games, I would do extensive testing to tailor the initial conditions, trying to ensure a smooth difficulty curve. When I made Starburst, I… didn't do any of that.
Cole played Starburst that summer, and solved it with 5 technological upgrades, i.e. technological era 6. His playthrough was uninspired, but highly intelligent and competent (and quite interesting to watch). It was also punctuated by various profane comments about the game, law, and UI design.
Sometime in fall 2024, Ryan, a friend of ours who was basically an AI true believer, suggested that chatGPT would be able to solve Starburst, perhaps as early as Cole. We initially laughed off the suggestion. AI can't solve Starburst. And after Ryan played Starburst, even he backed off his assertion. But he still suggested that we test it.
When we finally got around to testing LLMs in late 2024 (including then-SOTA ones), they couldn't solve Starburst, even in the final, easiest era. So I extended it from the original 13 eras to 20, with the last being trivially easy for almost all humans. And…the LLMs did badly. We designed a prompting scheme where we presented a single era at a time, removing agency and strategic decision making, eliminating data not immediately relevant, and otherwise handholding the LLMs through everything but the core reasoning faculty we sought to test. Even with a benchmarking scheme arguably designed to give an unfair advantage to LLMs, GPT-4 and 4o solved it in era 17. o1 was occasionally able to solve it in era 16. Very poor performance. See, LLMs can't reason.
At the time, my suspicion was that, in eras 16 and later, models were able to essentially throw their superhuman knowledge at Starburst without any reasoning ability. In era 17, objects move with a fixed velocity though a discrete square grid with pac-man/toroidal wraparound. That sort of thing is presumably described extensively in their training data; with enough unthinking pattern-recognition, it might be possible to solve Starburst without any genuine novel reasoning, even novel reasoning at the level of a dumb person.
And even if Starburst in eras 16 and later were measuring AI intelligence in a human-comparable way, it would still put the models of the time roughly on par with a dumb person. The claim of a near-term threat from AI seemed patently absurd (especially coming from people claiming that GPT-4o could do graduate-level physics). How could it be a threat when it can't even do basic text-in text-out novel reasoning problems that most people can do? I was considering buying the domain aiisretarded.com and linking to a post about Starburst and the poor AI performance on it.
Okay, when you have a belief that's being challenged, it's a good practice to ask what evidence it'd take to change your mind. So what Starburst performance would it take to convince me that AI has real reasoning ability (and may represent a serious threat)? Cole and I discussed the subject and agreed on era 15. In Starburst eras 15 and earlier, there was a proper law of interaction to figure out, which almost certainly wouldn't be in the LLM's training data and would require some genuine, though basic, novel geometric reasoning to figure out. We nicknamed the boundary between eras 15 and 16 the reasoning threshold.
As January turned to February, I tested o3-mini. It was more reliable than o1 in era 16, but still couldn't solve Starburst before the reasoning threshold, even occasionally. Around the same time, I also tested GPT-3.5, which wasn't able to solve it until era 18. Okay, so there's a decent trend of improvement (though with a long plateau at GPT-4/4o), but I still suspected that none of the models had any novel reasoning ability, even if some were better at throwing knowledge at the problem (and perhaps also synthesizing that knowledge) than others. So I expected the reasoning threshold to hold off the tide for a while or longer. As I wrote in February 2025:
"I therefore suspect, with low confidence, that o3 and its contemporaries will fail to break this barrier. If a model passes this barrier, that would be a massive cause for concern."
And…I was wrong. Less than two months after I wrote those words, Gemini-2.5-pro-preview, released on March 25th, 2025, crossed the reasoning threshold. It actually might happen. Time to start prepping. Among other things, I offered to lend Cole a gun. Cole said something to the effect of not yet.
Since then, the waters have risen with the pace of a hurricane. Yes, it was funny when OpenAI advertised o3 as being "at or near genius level", but that was overshadowed by the fact that the models were rapidly becoming smarter, even if they were far from genius level in April 2025 (and are still a fair ways away in April 2026).
As spring turned into summer and summer into fall, the major AI companies kept leapfrogging each other's Starburst performance. Well, all but one did. Anthropic's models stayed stubbornly below the reasoning threshold throughout the summer and most of the fall, even while they became increasingly reliable at era 16. Cole and I started wondering whether Anthropic, with their reputation as the “most responsible” major AI company, was deliberately hobbling their public models, trying to extract as much narrow capability from them as possible without giving them dangerous general intelligence. It wasn't until Claude Opus 4.5 (released November 24) that Claude finally crossed the reasoning threshold, skipping an era and stopping at the next categorical break in the Starburst progression (multiple types of objects), putting it still noticeably behind SOTA models. Notably, Opus 4.5 triggered a new level of scrutiny from Anthropic[1].
Opus 4.6, released shortly after Anthropic "revised"their policy of not pushing the frontier, was the first Claude model to be SOTA in its price tier at Starburst since we started benchmarking them.
.
But despite the rapid improvements of the past year, this is a story about Starburst not saturating. Gemini-3.1 pro, the current SOTA in its price tier (we generally only benchmark and measure progress using the ~$20/mo models), solved Starburst in era 10, which corresponds to what we'd call somewhat smart or ordinary smart[2]. Cole solved it in era 6. Ethan, another friend of mine, solved Starburst in era 5. His playthrough was inspired, characterized by repeated wild insights. I suspect that the best human geniuses would solve Starburst early in era 4.
The models may be stuck at era 10 for a while: they've come up against probably the hardest wall in the Starburst progression. One of the issues with the design of Starburst is that the difficulty curve is uneven in places. And the steepest part of the difficulty curve is between eras 9 and 10. In Starburst, prior to era 10, players are given observations primarily in the form of lists of slopes (above the horizon) at which an object appeared. They don't include all objects, aren't attached to object IDs, and there's only limited means to get the game to tell you if a sighting you see one turn is the same object as a sighting the next turn. Starting with era 10, you get a full map and catalog, including a consistent ID for each object. The heart of Starburst – finding truth from messy, incomplete data – is replaced with something much closer to I'll-show-you-what's-happening-and-you-tell-me-what's-happening. (For this reason, we call eras 4-9 "The Starburst Eon".)
Gemini 3.1 pro has basically reached as early in era 10 as is possible without having mostly solved it before entering era 10 (which it's emphatically unable to do), but even at the highest price tier, the most recent version of Gemini Deep Think isn't close to breaking past this wall into era 9 or earlier. I don't know how long this wall will hold, but everything I've seen from human trials suggests it's a tall one. While it wouldn't shock me if it only lasted a few months, I suspect it'll be on the order of a year or two[3].
.
Okay, wait a minute. What if models are failing at Starburst for a reason other than intelligence limitations? Well, it's not an agency issue, nor one of being overwhelmed with irrelevant data. From the start, we had given models a single era in each test, and scored them based on the earliest era at which they could solve it. They don't have to make any decisions about whether to answer or ask for more data (which could arguably be called agency moreso than intelligence); they just have to take the data and figure out what's going on. (And this scheme avoids distracting them with data from more difficult eras that they won't be able to use fruitfully.)[4]
As of Opus 4.6, the models displayed sufficiently robust agentic ability that we realized the handholding of the old benchmarking scheme was no longer necessary, and we switched to a more directly human-comparable scheme wherein the models are given 5 turns of data at a time and pick whether to ask for 5 more turns or give a solution. This gives us scores for models that are essentially on a level playing field with humans, and has yielded comparable or better performance to the old scheme for models as good as or better than Opus 4.6.
But even within a single era, especially as the game gets harder, there's still a lot of data processing to do. In our three human tests, Starburst has taken 12, 14, and 48 hours. LLMs are known to have “time horizon” issues, where they're worse at longer tasks of otherwise comparable “difficulty.” Obviously, they can't remember things outside their context length, but their performance degrades for long tasks even before the context length is exhausted. To test this (and check for contamination of the benchmark), I designed two shorter Starburst-like games, one of which takes two hours, the other four hours. Gemini 3.1 Pro saturated the two-hour one. The four-hour game remains unsaturated, even by Gemini Deep Think. Notably, even models that had time horizons (as measured by METR's benchmark) well in excess of 4 hours failed to saturate either of them, though they seemed to perform better on the shorter tasks than longer ones relative to human performance (perhaps because short tasks tend to have lower discernment ceilings). So current models really do seem to be prevented from saturating Starburst by intelligence limitations, though other limitations likely play a role too.
We're trying to sort out exactly what capabilities the models are missing that would be necessary for strong AGI (which we define as average-human-level ability at anything that can be done on a computer). They have smart-ish human level reasoning ability, so why aren't they drop-in workers for most remote jobs yet? The most obvious answer is long-term agency and many things required for it. The models are generally good (in many cases, wildly superhuman) at in-context "learning", but they really can't do long-term learning from experience ("online learning") in the way humans can. They don't even have long-term memory (though agentic scaffolds add a very crude long-term memory). They also have reliability issues that tend to rear their heads much more seriously for long-horizon tasks. At first glance, it seems surprising that models with general reasoning ability could be so deficient in other faculties, but there is a human analog to this phenomenon: imagine a smart person with encyclopedic knowledge, but who's fairly scatterbrained, bad at long-term planning and execution, and has long-term memory issues. Such people aren't common, to be clear, but it's not wholly alien. (The models also seem to have perceptual limitations that few humans have, though those also seem to be rapidly improving.)
To assess this sort of agentic ability, we're looking into other possible benchmarks that lean more heavily on agentic abilities. We recently made a Starburst-like game that's based on chemistry instead of physics, takes very roughly 2-6 hours, and involves mixing chemicals and observing the effects of reactions. While Starburst involves agentic decisions about observations in eras 9 and earlier (such as where to point a telescope), and decisions about whether to give a solution or advance the turn throughout the game, it's primarily a game of passive observation. Such is not true for Chemistry Starburst. Gemini 3.1 Pro doesn't saturate it, but does well, confirming the issue isn't agency categorically, but long-horizon agency (and whatever faculties are necessary for it). We also have a technology/engineering Starburst-like game that's so long (we expect around 50-100+ hours for a playthrough) and fiendish that we haven't been able to get a single full human playthrough, despite it being over a year old. Like Chemistry Starburst, it's very agentic/interactive. I haven't yet tested AI on it, but I don't expect success. It might be suitable as a long-term agency benchmark, or we may need to devise another.
.
I tell this story because I believe there are useful lessons here. Starburst wasn't carefully, thoughtfully crafted, but it has still managed to survive nearly two years as a text-in text-out benchmark that doesn't rely upon obscure specialized knowledge or skills. Why?
Part of it is the particular characteristics of the task. Starburst is an unconventional multi-part reasoning task that doesn't heavily depend upon background knowledge. IQ tests, by contrast, elicit ludicrously high scores from LLMs, largely because they use knowledge as a proxy for intelligence, and LLMs have superhuman knowledge. And when they actually test reasoning, it's typically very narrow reasoning (sometimes aided by certain arbitrary intuitions).
Part of it is that Starburst has unusually good high-end discernment. It's easy to design a mediocre intelligence test, but it's very difficult to design a good one, especially one that's good in the high end. Even though exotic domain knowledge is neither required nor helpful, a genius has a significant advantage at Starburst over someone who's very smart. This really isn't easy to do, especially without careful consideration. Even with that consideration, I still went with one of the more obvious approaches: theoretical physics grants a significant return on genius, so I made a game where you do theoretical physics (in a simple fictional universe where advanced physics or math background wouldn't be helpful).
But the other part of my wild luck had nothing to do with the design of Starburst. We were studying and designing tests for human intelligence. To benchmark intelligence in LLMs, it's useful to understand intelligence. And to understand intelligence, it’s useful to study human intelligence. This has given me one of my most useful benchmarking insights: performance of models should be, to the greatest extent possible, compared to the human distribution of performances on that task. It isn't enough to compare to a single “human baseline” performance.
.
A final question: we originally started AI benchmarking out of a desire to assess AI risks (though we didn't take them seriously at first). For the better part of two years, we've been planning to get land in the woods if we ever expect a serious possibility of semi-imminent disaster. (Yes, I know that being in the woods is unlikely to help us survive an ASI takeover, but it very plausibly would help against an AI-assisted bioweapon, cyberattack, failed AGI takeover, AI-thucydides-trap-driven nuclear war, and many other plausible scenarios.) What capability threshold should we set for that? Back when we expected developing intelligence to be the toughest problem to solve, we set Starburst era 9 as the woods threshold. We figured that a model with intelligence significantly above that of an ordinary smart person would be capable enough to pose the sort of threats mentioned (especially considering the gap between internal and released models). But achieving average-human-level intelligence has proved significantly easier than achieving certain other abilities that most humans have, and which seem to be necessary for a drop-in worker. Starburst certainly tests those abilities to some extent, and much more so in era 9 than 10, but maybe not enough. Could a model reach Starburst era 9 without the long-term agency needed to be a drop-in worker? I lean towards yes, but I'm unsure. If it does, would its intelligence alone make it sufficiently dangerous?
We maintain a Starburst leaderboard here. If you're interested in working with us or discussing further, please feel free to email or dm me.
“Claude Opus 4.5 showed strong performance across many evaluations, warranting a comprehensive assessment to determine whether it had reached the ASL-4 threshold. We determined that Claude Opus 4.5 does not cross this threshold. However, the model is approaching or surpassing high levels of capability in our ‘rule-out’ evaluations — early proxies designed to indicate whether a model might be nearing the next capability threshold.” https://www.anthropic.com/transparency
On the subject of different benchmarking schemes for humans and LLMs, some benchmarks (such as simplebench and Arc-AGI) use different setups for humans and models that give humans unfair advantages, presumably to make their benchmarks look more impressive and difficult to saturate. For obvious reasons, I believe that this should be avoided.