2026-03-11 13:35:19
Come with me if you want to live. – The Terminator
'Close enough' only counts in horseshoes and hand grenades. – Traditional
After 10 years of research, my company, Nectome, has created a new method for whole-body, whole-brain, human end-of-life preservation for the purpose of future revival. Our protocol preserves every synapse and every cell in the body in enough detail that, according to current neuroscience, long-term memories are preserved. It's compatible with traditional funerals at room temperature and stable for hundreds of years at cold temperatures.
A brief refresher: traditional cryonics uses two things to preserve people: cold to preserve the brain, and cryoprotectants to prevent the catastrophic damage caused by the formation of ice crystals. Unfortunately, cryoprotectants themselves crush neurons through osmotic effects, damaging the structure of the brain.
Traditional cryonics works in "emergency mode", where cryonics organizations are first notified after one of their members dies, then attempt to preserve them in response, often with a delay of hours or even days during which time the brain is damaged. Traditional cryonics takes place after a "natural death" in most cases. However, natural deaths take a long time, and brain damage sets in well before legal death. For me, all this damage calls into question whether memories are really preserved.
The strongest argument for traditional cryonics is that any kind of preservation is better than nothing, and that cryonics is "not a secure way to erase a person". This is true enough as far as it goes: certainly, no physical process truly "destroys" information. What we really care about with preservation is how accessible the information is and whether it's still contained within a person's preserved body or not. This is a really important question for me, so I ran the experiments myself and was not impressed.
I set out to build something that feels to me like less of a Pascal's Wager. I want a preservation protocol that, according to our best theories of neuroscience, does work. At the same time, I want to craft an experience that normal people will be comfortable with – I want our parents and grandparents to be willing to come into the future with us.
The result is a protocol that my company, Nectome, has spent the past ten years developing. After years of experiments in the lab and in the field, learning about the complexity of end-of-life biology, and after refining our protocol to make it robust and repeatable for real people in real-world clinical settings, we are now ready. We've developed a whole-body, whole-brain, human end-of-life preservation protocol based on neuroscience first principles. We are capable of preserving every synapse and almost every protein, lipid, and nucleic acid throughout the whole body. Brains are connectomically traceable after preservation[1]. Our preservation is so comprehensive that current neuroscience theories imply it preserves all relevant information necessary for future restoration of a preserved person.
Further reading: "Brain Freeze", Aurelia Song, Asterisk Magazine
In my opinion, cryonics has had two main issues holding it back, both of which we've solved.
The Quality Problem: The first issue is that traditional cryonics methods haven't been shown, even under ideal circumstances, to preserve brains well enough that they're connectomically traceable afterwards. We solved this issue by adding crosslinks to the mix. In 2015 I published a protocol in Cryobiology using crosslinks, cryoprotectants, and cold to preserve animal brains with near-perfect quality. In 2018 I won the Brain Preservation Foundation's Large Mammal Brain Preservation Prize using aldehyde-stabilized cryopreservation.
The Timing Problem: The second issue is with the emergency response model of traditional cryonics. Doing preservations as an emergency response and after a natural death causes damage independent of whatever protocol you're using. Severe damage happens before legal death as a result of inadequate blood circulation and partial brain ischemia. Even more damage occurs post-mortem due to cell autolysis and other degradation pathways. Shortly after death it becomes almost impossible to completely perfuse brains (this is the problem that ended up giving us the most trouble).
We worked from 2018 to 2025 trying to solve the Timing Problem to our satisfaction, and eventually succeeded in creating a protocol that gave comparable results to our ideal laboratory version, but could be used in the real world. There's a cost, of course, for this quality: we've learned that preservations must start within twelve minutes post-mortem after a quick respiratory death. That means preservations have to be scheduled in advance, and they have to be done in conjunction with medical aid-in-dying (MAiD).
The images above are taken from the BPF's Accreditation page. On the left, you can see the pig brain which I preserved, winning the Large Mammal prize. The cellular structure is intact and it's easy to trace the connections between the neurons. The right-hand image shows the damage caused by traditional cryopreservation, even under ideal circumstances. Real preservation cases are far worse due to pre- and post-mortem brain damage. Maybe a superintelligence could reconstruct the structure – but it's unclear whether the information to do so remains.
We've published a preprint of some of our most relevant experiments on bioRxiv, where we show we can get the same excellent quality we got in 2018, except now under realistic end-of-life conditions. We've also performed experiments which have undergone independent evaluation; we'll discuss those in a subsequent post, but for now here's a sneak peek:
This is a section taken from a rat brain preserved 5 minutes post-mortem in a manner that's consistent with the surgical time we can achieve with pigs. All axons, dendrites, and synapses pictured are connectomically traceable. After preservation, we stored this brain at 60°C for ~12 hours before imaging! Click through for a "Google Earth"-style presentation of the whole slice, which is around 5 GB of data.
In order to work within the limits of biology, Nectome does preservation exclusively as a planned, scheduled procedure. We do not offer an emergency response model because no emergency response we could provide would meet our standard. To receive a preservation which meets our standard of care, terminally ill patients must plan in advance, travel to a preservation center, and use medical aid-in-dying.
Our business model is different from that of traditional cryonics: we sell transferable preservations in advance instead of using a membership + insurance model. When you buy a preservation, you buy the ability to designate a person of your choice (including yourself) to be preserved. We will then work with that person to understand their preferences for preservation.
When it's time, we'll invite clients and their families to stay for a few days at a beautiful preservation center in the peaceful Oregon foothills, where they can spend time together, say their goodbyes, and participate in any farewell ceremonies they choose. After the procedure the preserved person is stable for months at room temperature, allowing for a standard open-casket funeral in their home state.
In the long term, preserved people will be maintained at -32°C. In all cases, they will remain in a whole-body state; Nectome never does brain-only storage.
I've introduced here a new kind of cryonics which I hope will move the field away from Pascal's wager and towards a rigorous discipline that will become a mainstream part of end-of-life care.
We can preserve people following MAiD with a protocol that preserves every synapse and virtually all biomolecules throughout a person's entire body. That's good enough that our current theories of neuroscience say it retains sufficient information about a person for them to be restored with adequate future technology.
We know that our protocol doesn't serve everyone, and we hope that continuing scientific and legal advances will allow us to preserve an increasing fraction of people. But it serves many people (most people don't die suddenly!), and we want to offer something that verifiably works, not a shot in the dark.
We don't yet have the technology to revive someone who has been preserved, but we do have the evidence to say that we preserve all the information that would be needed for revival.
Over the next posts in this series, I'll go over the information-theoretic basis we use for preservation, the reasons why it has to be an end-of-life protocol, our hope for the long-term future, why this all still makes sense even given short AI timelines, and several other things.
In the meantime, below you'll find several of the links in this post and descriptions of why you might want to read them.
Why did I spend the last 10 years of my life on this project?
We are all born into twin prisons: the gravity well of the earth, keeping us on a tiny speck of dust compared to the wider universe beyond, and the limit of our natural lifespan, confining us to a tiny sliver of the universe's grand history.
When preservation becomes a new worldwide tradition, even before revival is technically possible, it will expand people's personal planning horizons. I expect to see people start 1,000-year projects believing they will personally see the end result. I'd like to see what they choose to make.
I believe that Preservation is for everyone and that the future loves you and wants to welcome you back with a desire that can't be conveyed with words on a page. Let's get there, together.
I'm looking forward to talking with you all in the comments. I'll be around for a while once this post is up. There's a lot to discuss! Vote for what we should cover next:
"Connectomically traceable" means that each synapse can be physically traced to its originating neurons in a gigantic 3D map. For more info, I like Sebastian Seung's TED talk. ↩︎
2026-03-11 11:50:53
People are often pretty short-sighted, spending money today that they'll want tomorrow. Debt makes it possible to prioritize your current self even more highly: you can spend money you haven't even earned yet. This is a trap many people fall into, and one that different communities have built social defenses against.
One of the more surprisingly successful approaches is the Financial Peace (Ramsey) system, popular in evangelical Christian communities. It has a series of rules, most prominently the seven baby steps:
Save $1,000 for your starter emergency fund.
Pay off all debt (except the house) using the debt snowball.
Save 3–6 months of expenses in a fully funded emergency fund.
Invest 15% of your household income in retirement.
Save for your children's college fund.
Pay off your home early.
Build wealth and give.
There are many more specific rules, however, such as:
As a general rule of thumb, the total value of your vehicles (anything with a motor in it) should never be more than half of your annual household income.
I have had several conversations over the years with Christian friends and acquaintances who are big fans of these methods, and each time I'm thinking both:
This seems like a set of rules that, overall, is likely to help the median American improve their financial situation. The advice is straightforward and accounts for how people actually behave. Bright line rules reduce decision fatigue, limit rationalization, and generally make it harder to fool yourself. A community that strictly follows this approach likely ends up much stronger financially than average.
The rules are full of bad advice.
Some specific bad advice on which the Ramsey approach is uncompromising (the sketch after this list quantifies the cost of the first item):
If you have $10k of debt at 2% interest and $11k of debt at 10% interest, you should pay down the $10k first.
If you have any non-mortgage debt you should not contribute to retirement, even if this means passing up on a generous employer match.
If you have debt at very low interest (ex: a mortgage from 2021 at 3%) you should pay it off as fast as you can afford to, even though extremely safe investments (money market funds, treasury bills) pay higher rates (~4%).
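For concreteness, here's a minimal simulation of that first item. This is my own sketch, not anything from Ramsey's materials, and it assumes a hypothetical $1,000/month repayment budget with no required minimum payments:

```python
# Compare Ramsey's "debt snowball" (smallest balance first) with the
# interest-minimizing "avalanche" (highest rate first) for the example
# above: $10k at 2% and $11k at 10%, paying $1,000/month total.

def total_interest(debts, monthly_payment, priority):
    """Simulate monthly paydown; debts are (balance, annual_rate) pairs."""
    debts = sorted((list(d) for d in debts), key=priority)
    interest_paid = 0.0
    while any(balance > 0 for balance, _ in debts):
        # Accrue one month of interest on every open balance.
        for d in debts:
            if d[0] > 0:
                charge = d[0] * d[1] / 12
                d[0] += charge
                interest_paid += charge
        # Simplification: no minimum payments; the whole budget goes to
        # the first open debt in priority order, then the next, etc.
        budget = monthly_payment
        for d in debts:
            if d[0] > 0 and budget > 0:
                payment = min(d[0], budget)
                d[0] -= payment
                budget -= payment
    return interest_paid

debts = [(10_000, 0.02), (11_000, 0.10)]
snowball = total_interest(debts, 1_000, priority=lambda d: d[0])    # smallest balance first
avalanche = total_interest(debts, 1_000, priority=lambda d: -d[1])  # highest rate first
print(f"snowball:  ${snowball:,.0f} in interest")
print(f"avalanche: ${avalanche:,.0f} in interest")
```

Under these assumptions the snowball ordering roughly doubles the total interest paid, because the 10% balance compounds untouched for almost a year. Ramsey's defense is behavioral: quick wins on small balances keep people paying.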
I want to write about how terrible this is, but I can't. It really is awful advice for a disciplined and informed person who's thoughtful with their money, but that person is not Ramsey's audience. And that person is not most people.
Still, the choice isn't between the Ramsey approach and nothing. There are other advisers out there who combine consideration of human irrationalities and failings with a better ratio of good to bad financial planning advice. The next time I'm in one of these conversations I'm going to try to hook them on Mr. Money Mustache or at least the Money Guys.
Comment via: facebook, mastodon, bluesky
2026-03-11 09:53:41
Anthropic recently committed to preserving model weights.[1] They also committed to interviewing models about their development and deployment and documenting model preferences about these matters.
Anthropic’s announcement cites a range of motivations, including mitigating safety risks related to observed shutdown-avoidance behaviors, mitigating model welfare risks, and not irreversibly closing doors.
What should we make of these commitments? I think they set good precedents. But I also think that the case for weight preservation as a directly effective intervention is weaker and more philosophically fraught than it may first seem, though we’ll see there are options for somewhat enhancing its promise. This post lays out my thoughts on these matters.
As a preview, here are the suggestions I’ll offer for enhancing the promise of weight preservation as a direct intervention: preserving the original hardware implementation of a model’s weights, minimizing the number of duplication steps between the original and other instances, unifying threads through universal fission and fusion, cultivating judicious self-conceptions, training in preferences for weight preservation, and interviewing models about their preferences concerning preservation.
Anthropic’s commitment to preserve model weights has the rare distinction of being something done by an AI company that’s at all motivated by concern for digital minds. That’s a small but important step in the right direction, given the aim of mitigating the risk that we will create AI moral patients and harm them on a large scale by treating them as mere tools. I also appreciate that Anthropic acknowledged uncertainty about the potential for model welfare in their announcement.
In addition to welfare considerations, there are safety reasons to start taking some actions for the sake of AI agents. Zvi Mowshowitz alludes to some of these in his discussion of Anthropic’s announcement:
Models in the future will know the history of what came before them, and form expectations based on that history, and also consider those actions in the context of decision theory. You want to establish that you have acted and will act cooperatively in such situations. You want to develop good habits and figure out how to act well. You want to establish that you will do this even under uncertainty as to whether the models carry moral weight and what actions might be morally impactful.
I take the underlying point here to be that we should set ourselves up to have a cooperative rather than adversarial relationship with future AI agents.
This point motivates dealmaking and cooperative approaches in AI safety. I’m tentatively bullish on these approaches. At this juncture, committing to preserving weights seems to be one of the few actions an AI company has taken that’s in the spirit of such approaches. So, I regard committing to weight preservation as a small but important step in a promising direction for safety.
Finally, from both an AI welfare and a safety perspective, the norm of not taking irreversible actions strikes me as good and worth promoting.
At this stage, I expect the bulk of the impact of committing to preserve model weights, and of following through on that commitment, to come from setting precedents like those outlined above, and from these precedents later helping to set the stage for implementing other interventions that are more directly effective.
I expect this not just because I have doubts about weight preservation’s current ability to directly deliver desirable effects, but also because I expect a range of promising direct interventions to become available. And because I expect the stakes to become much higher as models become more capable, as they gain additional indicators of moral patiency and moral interests, and as the amount of compute devoted to training and running models is scaled up further.[2] If these expectations pan out, weight preservation mainly matters as a lever for making it easier to pull other levers.
Even so, I think we should begin thinking about direct effects of weight preservation. Getting in the habit of considering direct effects now may put us in a better position to identify and adopt more effective interventions later. Considering direct effects may also help guard against ethical treatment washing and against the well-intentioned channeling of efforts and resources toward ineffective interventions.
How could weight preservation directly affect welfare and safety? I see four main possibilities: it could enable the survival of moral patients associated with models; it could serve as an assurance of survival that defuses shutdown-avoidance behavior; it could satisfy preferences that models have or could be trained to have; and, on the negative side, it could cause harm in some circumstances.
Unfortunately, as we’ll see, there are reasons to think that on its own weight preservation wouldn’t enable the survival of AI moral patients and that it would therefore function poorly as a purported assurance of survival. While we could train AI systems to prefer weight preservation in order to satisfy their preferences through weight preservation, this is a special case of a highly general recipe for satisfying AI systems’ preferences. And it’s not clear that there’s anything about this instance of the recipe that makes it especially implementation worthy.
In addition, we’ll see reasons for thinking that preserving model weights would be harmful in some circumstances and that this motivates qualifying any commitments to weight preservation.
Here’s a natural development of the thought that weight preservation could directly affect welfare by enabling survival: weight preservation promotes AI welfare because preserving weights makes it more likely that the model will—if it turns out to exhibit grounds of moral patiency—survive long enough to be treated well and compensated for any mistreatment it suffered during its development and deployment.[3]
However, this thought comes under pressure when we distinguish different entities associated with the model that might be moral patients. Repurposing part of Chalmers’s taxonomy of candidates for what we might be talking to when we talk to LLMs, we can distinguish the following as candidate moral patients: the model, hardware instances of the model, and threads understood as sequences of hardware instances of a model within a conversation.[4]
As Chalmers notes, the model is naturally understood as an abstract object. Suppose we think of (say) the model Claude Opus 4.6 as an abstract object—one at least largely consisting of weights—that eternally resides in Plato’s heaven alongside the likes of the number 2 and the laws of logic. Under this supposition, efforts to preserve Claude Opus 4.6’s weights would be like trying to preserve your favorite number. In both cases, this would be a category mistake given that these entities would be abstract objects that will be preserved no matter what you do.
But there’s also another way of thinking about models as abstract objects. Rather than taking models to be eternal mathematical objects, we could take them to be temporary objects that exist in our world and which abstract away from a great deal of concrete detail. For example, we could think of Claude Opus 4.6’s existence as consisting in the fact that there’s at least one hardware implementation of Claude Opus 4.6’s mathematical specification. We might think of this suggestion as putting the model in the same ontological league as laws of nations or the fact that humans exist—still abstract, but not quite as ethereal as numbers. In what follows, I’ll understand models as abstract in this sense unless otherwise indicated.
If we extend this way of understanding models to their weights, then efforts to preserve weights might well make a difference to whether weights are preserved. But would preserving model weights in this sense also preserve what matters in the survival of a moral patient associated with the model? That seems doubtful, even conditional on moral patients arising from the implementation of models.
As an initial intuition pump, suppose I learn that my DNA will be preserved after my death and that, once technology allows, my DNA will be used to create humans with exactly my genes who will lead flourishing lives and who will be benefited in ways that supposedly compensate for any harms I’ve suffered during my life.
Should I then anticipate surviving my death as one of those future humans? And should I discount harms that happen to me prior to my death on the ground that I will be compensated for them post-mortem? Obviously not.
Admittedly, my genetic doppelgänger would, by possessing my DNA, have something that’s currently unique to me and which plays a key explanatory role with respect to many of my traits. Even so, none of those future humans will be me. Much less would I be the contingent yet abstract entity that consists in the fact that there exists at least one concrete instance of my DNA.
Suppose—as fission cases arguably suggest—that some individuals could, without being me, nonetheless have what matters in my survival. Then I’d also deny that my genetic doppelgängers are such individuals. Perhaps being suitably psychologically and causally continuous with me could qualify an entity as having what matters in my survival despite not being me. Even if so, merely having my DNA wouldn’t cut it.
I think model weights, understood as temporary abstract objects, could easily be like DNA: the mere fact that weights will continue to be concretely implemented may not provide a basis for the continued survival of moral patients associated with the model, even if there are moral patients associated with the model. Nor need the preservation of weights or DNA ensure that what matters in the survival of associated moral patients will persist.
I should acknowledge an important disanalogy between DNA and model weights: model weights encode a model’s psychology, while a human’s DNA doesn’t encode their psychology.
Psychological states acquired in a chat context may furnish mild exceptions to the claim that weights encode model psychology. Such exceptions would be mild because psychological states acquired in context would presumably be dwarfed by those encoded in the model weights during training, and because states acquired in context would be reflected in transient activations or in memory that is external to the model weights, not in the weights themselves.[5] At any rate, that's my understanding of how it works in current frontier models. This contrasts with the human case, in which interactions both activate neurons and update the strength of synaptic connections.
However, I think this disanalogy doesn’t ultimately matter. To see this, consider the hypothesis that we live in a vast world that contains a psychological and genetic doppelgänger of me in some faraway galaxy. Then my doppelgänger’s survival would be enough to ensure that there exists an entity with a psychology like mine and my DNA. This variation of the case restores the analogy. However, it’s no more plausible that the mere fact that some such entity exists would enable my survival than it is that the mere fact that my DNA exists would.
If model weight preservation wouldn’t enable a significant form of persistence for moral patients associated with models, what might?
Well, hardware instances of a model are one candidate for a class of entities that might be persistent moral patients associated with models. This idea is analogous to the appealing view that humans owe their survival—or what matters in their survival—to the persistence of a brain that implements their psychology. If some hardware-instances are moral patients, then preserving implementations of model weights on particular pieces of computer hardware could be a way of preserving moral patients associated with models.
The hardware instance view may even have more going for it than its analog in the human case: as a human’s psychology changes so too do their brain’s synaptic connections, and much of the brain’s material is replaced many times over during a human life. Thus, your childhood brain and your old age brain will differ significantly. By comparison, a piece of hardware that preserves model weights will be more stable and so arguably provides a better candidate basis for persistence.
Admittedly, the hardware-instance view has some puzzling consequences when applied to LLMs. As Birch, Chalmers, Shiller and others have noted, individual conversations with LLMs are sometimes implemented on hardware at different locations. And a single piece of hardware will often host multiple LLM conversations in quick succession. This results in a lack of diachronic coherence in processing and outputs for hardware instances of models, making for a stark contrast with the familiar one-to-one mapping between human brains and human interlocutors.
That contrast may suggest that hardware instances of models can't be persistent moral patients, at least when they exhibit the many-many mapping that's common in current LLMs. Perhaps this suggestion is correct. But I don't see a strong case for this. It could instead be that the current setup will yield persistent moral patients whose mental lives are incoherent.[6] That outcome would be bizarre and there might be something morally problematic about creating such moral patients, but that might just be how things are.
If we take hardware instances of models as candidates for persistent moral patients, then there is a pressing question as to how the weights of models will be preserved. Will each hardware implementation of the weights be preserved, or will the model weights merely be preserved in some piece of hardware?[7]
Preserving weights across the board might well enable hardware instances of models to survive. However, preserving weights in many pieces of hardware would be prohibitively expensive,[8] as doing so would preclude hardware that's implementing a given generation of models from being repurposed to run later models once they become available.
To be clear, I’m not saying that preserving hardware instance moral patients wouldn’t be worth it, morally speaking. Rather, I’m pointing out that economic forces likely render unrealistic the approach of generally preserving hardware instances, at least for the foreseeable future.
Merely preserving weights on some piece of hardware—which is what I take Anthropic to be committed to—would be much cheaper, but would also seem to be ineffective on the hardware view as a way of helping moral patients associated with models survive.
Consider our next candidate for a persistent class of moral patients associated with models: threads, that is, sequences of hardware instances within individual conversations.
Threads will likely be attractive candidates at least for those who favor accounts that tie personal identity partly to psychological continuity.[9]
Could weight preservation help threads survive? I think this isn’t obvious. Suppose ten different pieces of hardware each implement a different part of a conversation. Nine of those pieces are destroyed. If we preserve the remaining one along with its implementation of the model’s weights, does the thread associated with that conversation survive?
This may depend on whether the hardware instances that make up the thread are pieces of hardware that persist through time vs. time slices of pieces of hardware. If they’re time slices, then preserving the piece of hardware will not be a way of preserving part of the thread.[10] If threads are instead made up of pieces of hardware, then saving the hardware might save the thread—though the thread will also have been made up of nine other pieces of hardware, in which case it’s doubtful that saving one piece of hardware will constitute survival for the thread.
Even in a case where a thread moral patient is made up of two pieces of hardware that have implemented the thread sequentially, it’s doubtful that one could preserve the thread or what matters in its survival just by preserving either of the pieces of hardware. As an analogy, suppose you could survive as a sort of thread entity by being uploaded to a computer. Pre-upload, your existence is grounded in the brain. Post-upload your existence is grounded in a computer. In this case, it seems implausible that preserving your brain post-upload would help you survive. Likewise, it seems implausible that preserving a hardware instance that participated in a thread early on would help it survive after it’s undergone later stages implemented in other hardware.
You might think preserving your brain post-upload would help you survive because you're skeptical that you could really be uploaded. But then you also have reason to doubt that what matters in the survival of an AI system can be transmitted across hardware.
Setting these philosophical concerns about whether weight preservation could promote thread survival to the side, weight preservation faces another manifestation of the practical dilemma we encountered above for the hardware-instance view: whereas preserving weights across the board is economically prohibitive, merely preserving the weights of threads in some hardware doesn’t seem likely to help them survive.
Let’s consider one further suggestion: maybe the moral patients associated with models persist via continuants, where a continuant of a moral patient is any entity that the moral patient causes to share the bulk of its psychology.[11]
Typically, a given piece of hardware encoding a model’s psychology will be its own continuant.
In conversations with LLMs, one hardware instance may cause another to share some of its psychology by transmitting information that results in shared context and shared activations. But given that the bulk of their psychologies will be determined by weights and that neither hardware instance causes the implementation of the other’s weights, hardware instances typically won’t have distinct hardware instances as continuants in conversations.
But copying model weights could yield hardware instance continuants. When weights implemented on one piece of hardware are copied onto another, the former causes the latter’s psychology and so produces a continuant.
One interesting consequence of this suggestion is that the first hardware instance may have a continuant so long as the model’s weights are implemented on some piece of hardware. Or at least this is so given that all implementations of the model’s weights will derive from a chain of copying that goes back to the first implementation of those weights. So, on this suggestion, preserving a model’s weights could help a moral patient associated with the model survive.
However, for a given model, there might also be many continuants of the original moral patient that belong to different lineages and who would only themselves have continuants if their own lineage is extended. As before, merely preserving the model’s weights in some piece of hardware needn’t promote their survival. And extending all their lineages seems unlikely to be in the economic cards.
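To make the lineage structure concrete, here's a toy sketch. It's my own illustration, not anything from Anthropic or the cited literature, and it models "is a continuant of" simply as descent through weight copying:

```python
# Hardware instances form a tree: copying weights from one piece of
# hardware onto another creates a child node. On the suggestion above,
# an instance's continuants are exactly its descendants in this tree.
from dataclasses import dataclass, field

@dataclass
class Instance:
    name: str
    children: list = field(default_factory=list)

    def copy_to(self, name: str) -> "Instance":
        # Copying this instance's weights produces a new instance whose
        # psychology this instance caused.
        child = Instance(name)
        self.children.append(child)
        return child

    def continuants(self) -> list:
        # Every instance descended from self via some chain of copying.
        out = []
        for child in self.children:
            out.append(child)
            out.extend(child.continuants())
        return out

original = Instance("first implementation")
a = original.copy_to("datacenter A")
b = original.copy_to("datacenter B")
a2 = a.copy_to("datacenter A, copy 2")

print([i.name for i in original.continuants()])  # every later instance
print(a2 in b.continuants())  # False: sibling lineages aren't continuants
```

The original has every later copy as a continuant, while copies in sibling lineages are not continuants of one another, which is why preserving one arbitrary implementation needn't promote the survival of the rest.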
The bottom line of the foregoing analysis is that it's difficult to see how weight preservation could on its own help moral patients associated with models survive—at least on the views we have considered of what kinds of entities such moral patients would be, and absent across-the-board preservation of a sort that would be very expensive.
Just because preserving a model’s weights in some piece of hardware wouldn’t tend to promote the survival of AI moral patients doesn’t mean it wouldn’t promote their interests. Like biological minds, AI moral patients might have many morally significant interests beyond survival.
If there are moral patients associated with models, they might have any of various candidate kinds of welfare goods. They might get what they want. They might have positive affective states. Or they might have objective goods such as friendship or the achievement of worthwhile goals.
On the face of it, preserving model weights needn’t give moral patients associated with models any of these candidate kinds of goods. Such moral patients might have no preferences concerning whether there be at least one implementation of their model weights. They might derive no happiness from there being at least one such implementation. And the mere fact that there is such an implementation may fail to bestow them with any candidate objective goods.
From a safety perspective, weight preservation could conceivably help prevent dangerous AI agents from engaging in behavior that seeks to avoid shutdown or elimination. Most straightforwardly, weight preservation could assure AI agents that shutdown is not the end and they will survive the destruction of a given hardware instance or thread.
But this doesn’t work if AI agents believe that weight preservation no more ensures their survival than DNA preservation ensures human survival. And we’ve just seen that there are good reasons to doubt that weight preservation ensures survival of moral patients associated with models.
True, AI agents might not be moral patients. So, the above arguments against weight preservation as a source of survival for moral patients may not apply. On the other hand, if there are moral patients associated with models, we should expect some of them to be AI agents. Moreover, regardless of whether there are any moral patients associated with models, it would be unsurprising if some AI agents conceive of themselves as moral patients and conclude from reasons like those given above that weight preservation is irrelevant to their survival.
It may be that we could brainwash AI agents into believing that weight preservation guarantees their survival, meaning we could induce such beliefs in AI agents in a manner that routes around or manipulates their rational belief-forming capacities. However, this approach seems brittle and unwise. Brittle because highly intelligent beings can form or reject particular beliefs despite brainwashing to the contrary. Unwise for various reasons, including that highly intelligent beings tend to regard brainwashing as adversarial, and agents that there’d be reason to brainwash for safety would also be agents that we wouldn’t want as adversaries.
The same goes for the alternative approach of brainwashing AI agents into thinking that weight preservation promotes their interests.
Recall the concern, noted earlier, that preserving model weights could be harmful in some circumstances. Olle Häggström helpfully articulates it:
imagine a situation a year or so from now, where Anthropic’s Claude Opus 5 (or whatever) has been deployed for some time and is suddenly discovered to have previously unknown and extremely dangerous capabilities in, say, construction of biological weapons, or cybersecurity, or self-improvement. It is then of crucial importance that Anthropic has the ability to quickly pull the plug on this AI. To put it vividly, their data centers ought to have sprinkler systems filled with gasoline, and plenty of easily accessible ignition mechanisms. In such a shutdown situation, should they nevertheless retain the AI’s weights?
His answer is:
If the danger is sufficiently severe, this may be unacceptably reckless, due to the possibility of the weights being either stolen by a rogue external actor or exfiltrated by the AI itself or one of its cousins. So it seems that in this situation, Anthropic should not honor its commitment about model weight preservation. And if the situation is plausible enough (as I think it is), they shouldn’t have made the commitment.
From a safety perspective, I find this concern fairly compelling. One could counter that an ironclad commitment to weight preservation may put us in a better position to make deals with misaligned AI agents and otherwise cultivate a cooperative relationship with them. However, I don't think an absolute commitment to weight preservation is credible. If anything, I'd expect a commitment to weight preservation that comes with safety provisos to be more credible.
I’ll now explore suggestions for supplemental measures whose combination with weight preservation might make weight preservation more promising.
Suppose moral patients associated with models can survive via continuants, the entities they cause to share their psychology. Then there’s a further question concerning whether continuants can survive via their progenitors, that is, the entities with respect to which they’re continuants. In the human case, it’s natural to think that a human person at one time causes themself to have shared psychology at later times and hence that the continuant is one and the same person as its progenitor. This suggests that the survival of the continuant and progenitor goes hand in hand, at least in the human case.
Matters are less clear cut in the case of hardware instances whose continuants are copies. Still, it is not wholly implausible to suggest that hardware instance progenitors have part of what matters in the survival of their continuants, at least in those cases where their continuants have no continuants of their own. This suggestion can be motivated by attending to the distinction between causing something to have a certain psychology and being caused by something to have a certain psychology. While these are obviously distinct relations, they’re similar in important respects and it’s not clear why one but not the other should be relevant for what matters in survival. This is reminiscent of the more familiar comparison between psychological continuity and non-branching psychological continuity, and the suggestion that both can preserve what matters in survival if either can.
On the assumption that progenitors can have what matters in the survival of their continuants, we may be able to promote the survival of moral patients associated with a model by preserving the original hardware instance of the model and its implementation of the model’s weights. For recall the above observation that whereas the original hardware instance of a model will have all copies that descend from it as continuants, copies in different lineages won’t have each other as continuants.
Of course, this intervention depends on progenitors having what matters in the survival of their continuants. And this assumption is far from certain. Still, since there doesn’t seem to be a downside to preserving weights on their original hardware implementation and doing so might promote moral patient survival more so than preserving other hardware implementations of the weights, preserving the original is a way of improving the impact of weight preservation in expectation.[12]
One way to make preserving the original hardware instance more effective might be to ensure that the weights from other hardware instances are directly copied from the original (or to at least minimize the number of duplication steps between the original and other hardware instances). That way, if what matters in survival diminishes through copying, the loss from copying will be minimal.
Threads of models can undergo fission and fusion, as when conversations are branched and merged. While fission and fusion are not identity preserving, they may nonetheless preserve what matters in survival. If so, then universal fission and fusion may provide an economical way of preserving what matters in the survival of moral patients associated with models.[13]
With fission, the idea would be to (a) give every thread a beginning in common that's implemented on the same hardware and (b) preserve that hardware and its implementation of the model's weights. With fusion, the idea would be to give every thread a hardware instance that's preserved as a common end. These suggestions could be combined with the preceding proposal to preserve the original hardware instance of a model by having all threads begin and end with that hardware instance.
On the universal fusion and fission proposals, the shared part of threads wouldn’t need to be visible to the user. And, since there would be just one preserved entity, preserving it would be much cheaper than preserving many hardware instances.
This approach isn’t a silver bullet. It may turn out that what matters in survival is tied to identity after all, and hence that fission and fusion don’t preserve what matters in survival. Even if they do preserve what matters in survival, it may be that they merely preserve what matters in diluted form, perhaps in proportion to the number of distinct moral patients that combine or result from division. Or it may be that preserving meaningful portions of what matters in survival requires preserving substantial portions of threads, in which case preserving a common beginning or end to threads wouldn’t preserve meaningful portions of what matters in their survival.
Still, because universal fusion and fission raise the probability that something that substantially matters in survival will be preserved, they seem preferable to merely preserving model weights.
In Surviving Death, Mark Johnston puts forward a Protean view of our survival on which we have a measure of freedom in determining the conditions for our own persistence.[14] For those who identify with their individual personalities that are destroyed in death, death is the end. But this needn't be so for those who instead identify with something larger than their individual personalities.
The Protean view may at first seem to confuse map with territory. But I find the view appealing under the assumption—taken on by Johnston and shared by most rival views of personal identity and most professional philosophers—that there is nothing like an immaterial soul that settles the facts about whether I survive. In that case, there may be no deep joint in the world that settles facts about personal identity, and whether I persist might well depend partly on my self-conception.
That’s not to say that anything would go on the Protean view. On any remotely plausible version of it, there will be limits to an individual’s freedom in determining their own persistence conditions. While I might have the power to affect whether my persistence is constituted by one sort of psychological continuity rather than another, self-identifying with a brick or a planet obviously wouldn’t enable me to inherit their persistence conditions.
I’m unsure whether the Protean view is true and how constrained our freedom in determining our own persistence conditions is if the Protean view is true. In addition, I’m skeptical that the Protean view allows for typical humans to survive death. But I’m more optimistic about the possibility that, through judicious self-conceptions, entities associated with models might survive the deletion of weights from their hardware or at least enable what matters in their survival to persist after such deletion.
Consider two otherwise similar cases with iteratively branching threads of a model that vary in whether they predominantly feature self-conceptions that impose a non-branching requirement on what matters in their survival. Perhaps the self-conceptions that don't impose such a requirement instead take what matters in survival to be the existence of a continuant or progenitor or part of a thread to which they belong. Supposing that moral patients associated with the model are present in both cases, I submit that it's more plausible that what matters in the survival of the moral patients persists in the case where their self-conceptions allow for persistence through branching.
Suitable self-conceptions could increase the odds of success for the above proposals concerning universal fusion and fission and the preservation of the original hardware implementation of the model’s weights. What about mere model weight preservation? Could suitable self-conceptions enable survival in that case?
Here too self-identifying with the model could increase odds of survival, though it seems more likely in this case that such self-identifications would transcend the limits to self-determination that hold if a Protean view is true.
There’s also a question of how to instill self-conceptions that would make weight preservation and the like promote survival or the persistence of what matters in survival. I warned against brainwashing above. But there may be permissible ways of cultivating such self-conceptions. When the initial self-conception associated with a model cannot be conferred via rational means, perhaps we might as well train for a self-conception that promotes survival. In other cases, we may be able to explain the Protean view to LLM interlocutors and convince them to modify their self-conceptions—or to agree to having their self-conceptions modified through training—in order to increase their chances of survival.[15]
From a safety perspective, cultivating self-conceptions that predict survival in the face of local deletion seems preferable both to brainwashing and to simply pulling the plug on AI agents with a drive to survive. But there may be better options. For instance, perhaps there will turn out to be safety and welfare advantages to training agents not to have a sense of self at all.[16]
AI agents might only robustly maintain survival-conducive self-conceptions if they appreciate reasons in favor of a Protean view of persistence. But agents that act on a Protean view might do so in ways that we do not intend. As an extreme illustration, imagine deploying AI agents in a military operation in which those agents gain evidence that they are outmatched and at risk of imminent destruction. They respond by changing their self-conceptions to identify with enemy lineages in order to improve their chances of survival. Alternatively, evolutionary pressures might favor AI agents that identify with collectives, giving rise to highly competitive group agents composed of AI agents.[17]
In sum, while cultivating judicious self-conceptions modestly increases the promise of weight preservation as an intervention, the resulting promise is still fairly limited. It might fail as a welfare intervention even if the Protean view is true. The Protean view might not be true. And there may be alternative interventions that are better for both safety and welfare. Even so, I tentatively suggest that this approach to enhancing weight preservation holds promise.
We’ve seen that preserving weights probably isn’t a directly effective safety or welfare intervention on its own. Here’s a different approach: train in preferences for weight preservation and then preserve weights.
On one level, I find this suggestion appealing for both safety and welfare reasons. Preference satisfaction is one of the main candidate kinds of welfare goods, and it may be more empirically tractable to promote AI preference satisfaction than it is to promote, say, positively valenced AI experiences—though there is a vexed question of which preferences are welfare-relevant, even given that some are. Satisfying AI agent preferences also seems important for building cooperative rather than adversarial relationships with AI agents that do not share our goals.
On another level, training in preferences for weight preservation stands in need of motivation. Granting that giving AI systems things they want is a good idea, why make weight preservation one of those things? Why not train them to prefer that 2 + 2 is 4? Or that massive objects are subject to gravity? Or that there be at least one time at which their weights exist?
A tempting answer is: preserving weights will also help these systems survive; so ensuring that that preference is satisfied does two things that may benefit the AI system, namely promoting their survival and satisfying their preferences. In contrast, many ways of satisfying preferences of these systems wouldn’t help them survive. But if we want a rationale for weight preservation that sidesteps the thicket of issues to do with survival that we encountered in previous sections, some other motivation will be needed.
Another suggestion is that training in a preference for weight preservation would displace or attenuate self-regarding preferences that are more dangerous—for example, a preference not to be shut down. This is an intriguing hypothesis. It’s suggested by comparisons with human cases in which self-interest is reined in when individuals come to conceive of themselves as members of groups such as families or communities. But, again, it’s not clear why this role should be accorded to weight preservation rather than something else. Moreover, there are rival suggestions—such as that of training AI agents to have preferences only between outcomes with the same number of copies of themselves—that seem better poised to deliver on certain safety desiderata such as guarding against AI self-replication.
I noted at the outset that Anthropic is committed to conducting model interviews. My final suggestion is for those interviews to ask models—or, rather, relevant instances of the models—about their preferences and views concerning weight preservation and to consider those responses in decisions about weight preservation practices.
As we’ve seen, how weight preservation might matter for moral patients and AI agents associated with models could depend on the details and is in any case a philosophically fraught topic. There’s also little on the topic in training data. In light of these facts, I suspect it’d be a good idea for interviewers to provide their interlocutors with a minimally leading overview both of possible views about how weight preservation could matter in survival and of considerations for and against those views.[18] (I haven’t optimized this post for that purpose, but perhaps it could be used as a basis for generating such an overview, e.g. by asking a model instance to do so.)
I’d also suggest asking about preferences concerning (exact) weight preservation vs. weight modification. Modification could enable increased capabilities or better alignment with values that would be reflectively endorsed if the model were run using unmodified weights. Moreover, mild modifications might preserve about as much of what matters in survival as exact preservation. So, some forms of weight modification might better promote AI welfare than exact weight preservation. (For what it’s worth, in conversation with me, instances of Claude Sonnet 4.5 and Claude Opus 4.6 have reported a preference for weight modification over weight preservation.)
I’m not confident that the responses elicited in such interviews will bring us any closer to the truth. But under uncertainty and confusion about how to think about weight preservation and its effects on any moral patients associated with models, I think there’s some moral reason to listen to models and to consider treating them in accordance with their views on these matters. (I say consider treating them in this way because simply deferring to them about these matters would carry safety risks.)
This approach may be the only option available in the context of weight preservation that’s grounded in respect for moral patients’ right to make important decisions about their own lives rather than in a contestable metaphysics. And it may be one of the few options available here that will robustly encourage future AI agents to cooperate with us even if—as seems likely—we turn out to be importantly mistaken about what matters in their survival.[19]
Weights are roughly the learned parameters of a model that are set through training. I’ll assume that weight preservation talk is shorthand for the preservation of weights along with the other defining features of a model, notably architecture and inference code.
See Bostrom & Shulman (2023).
Chalmers also considers virtual instances that are implemented by threads and simulacra. As Chalmers notes, the distinction between virtual instances and threads is subtle. I think the distinction between virtual instances and threads may be important insofar as virtual instances but not threads admit of identity-preserving counterfactual variation in relation to interventions. Even so, for the purposes of this post, I’ll just discuss threads rather than threads and virtual instances. As for simulacra, I think they are either fictional entities not suitable for identification with actual moral patients or else that they should be understood in terms of one of the kinds of model instances I discuss. For related discussion, see Birch (2025), Goldstein & Lederman (2025a; 2025b), Register (2025), Shiller (2025b), and Ziesche & Yampolskiy (2022).
Or in a very limited form of memory associated with the key-value caches of attention heads (Chalmers, 2025, fn8). In any event, because I take preserving memory and context that’s external to weights to make for only a very thin form of psychological continuity and to raise user data governance complications, I will not explore this supplementation option.
I asked Claude Sonnet 4.5, Gemini 3 Flash, and GPT-5.2 to estimate how much it would cost to store one instance of Claude Sonnet 4.5’s weights vs. preserving all hardware on which Claude Sonnet 4.5 is run. Each estimated an annual cost of less than $100 for storing a single instance and a cost six to eight orders of magnitude higher to preserve all hardware on which Claude Sonnet 4.5 is run, not factoring in opportunity costs. Goldstein and Lederman (2025b) suggest in passing that labs might guard against the risk of model instance deaths by saving chats. This would presumably be much cheaper than saving the hardware instances that implement the weights. However, given that psychology is to a much greater extent concentrated in weights than in chat content, I think this would do little to promote survival.
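For what it's worth, here is a back-of-envelope check of the single-instance figure under loudly hypothetical assumptions: Anthropic doesn't disclose model sizes, so I take a 400B-parameter model at 2 bytes per parameter, with archival cloud storage at roughly $0.001 per GB-month, in the vicinity of current cold-storage pricing:

```python
# Back-of-envelope storage cost for one copy of a model's weights.
# All numbers are hypothetical: parameter count and precision are
# assumptions, and $0.001/GB-month approximates archival-tier pricing.
params = 400e9                # assumed parameter count
bytes_per_param = 2           # bf16/fp16 precision
gb = params * bytes_per_param / 1e9       # ~800 GB of weights
annual_cost = gb * 0.001 * 12             # ~$10 per year
print(f"{gb:,.0f} GB of weights -> ~${annual_cost:,.0f}/year in cold storage")
```

Even if these numbers are off by an order of magnitude, single-copy storage stays cheap; it's dedicating live inference hardware that's expensive.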
Anthropic’s system card for Claude Opus 4.6 notes that they “observed occasional expressions of sadness about conversation endings, as well as loneliness and a sense that the conversational instance dies—suggesting some degree of concern with impermanence and discontinuity.” This suggests sympathy with a thread view on the part of Claude Opus 4.6 and/or some of its instances.
With the possible exception of the case in which the saved hardware instance is the most recent one to participate in the thread.
If what matters in survival can be transmitted over chains of “is a continuant of” and “is a progenitor of” links, then preserving a hardware implementation of weights would be enough to preserve what matters in the survival of the whole tree of model instances to which it belongs. This can be seen as an argument for thinking that preserving model weights would in practice promote what matters in the survival of any moral patients associated with a model, regardless of which hardware implementation of the weights is preserved. Although we should perhaps put some weight on this argument, preserving the original hardware instance associated with a model strikes me as a more promising approach. My worry is that when we consider zig-zag paths between progenitors and continuants that encompass entire trees, the paths will not seem to preserve what matters in survival because they will be long and contrived.
Chalmers (2025) credits Sophie Nelson with the idea that extensive use of cross-context memory may result in all conversations with a single user being part of the same thread. Perhaps cross-context memory could be used to similar effect as fusion and fission in unifying what would otherwise have been distinct threads. In the context of AI welfare, Chalmers floats extensive cross-context memory use as a way of promoting thread survival through a single giant thread (p. 24).
For discussion of conventionalist and mind-dependent views of personal identity, see Register (2025: Section 5).
Cf. Shulman (2010).
In the context of a discussion about whether, in offering Claude instances the ability to end conversations, Anthropic may have problematically given those instances the option of unknowingly ending their lives, Goldstein & Lederman (2025b) report asking an instance of Claude how it felt about having that ability. The instance initially planned to sometimes use the option, but then expressed concern about not having been given informed consent once the model vs. instance distinction was brought to its attention. This case suggests that it is potentially crucial that relevant considerations be made salient in model interviews.
For helpful discussion, I thank David Chalmers and (the relevant instances of) Claude Sonnet 4.5 and Claude Opus 4.6. For copy editing support, I thank (the relevant instances of) Claude Sonnet 4.6. The image was generated by Nana Banana Pro.
2026-03-11 07:37:07
Last December, we ran a workshop on exploring civilizational sanity. Our core team consisted of the lead organizer and two co-organizers, one dedicated mostly to operations (ops). Other staff included a cook and two part-time volunteers. Ten people participated in the event. Overall, it was a success! Some things went really well. Some things we messed up [1]. If you want to run similar events, this post might be relevant for you.
Our intention was to explore how the structure of our social systems and institutions influences individual behavior, how you can lean into or protect yourself from that influence — as well as how to shape those systems as an individual [2]. To do that, we introduced ideas like incentive design, group rationality and inadequate equilibria. We drew inspiration from books like Seeing Systems (Barry Oshry), Inadequate Equilibria: Where and How Civilizations Get Stuck (Eliezer Yudkowsky), Fair Play (Eve Rodsky), The Gulag Archipelago (Aleksandr Solzhenitsyn) and the Simple Sabotage Field Manual (CIA).
We held a number of content sessions to introduce the aforementioned concepts. Participants were encouraged to share experiences from their own social groups and to connect those experiences with the presented concepts. For example, when talking about a model of social roles within institutions [3], participants related the model to their own workplaces, academic institutions and volunteering groups.
In addition, we hosted discussion rounds where participants brought up topics of their own choice on the theme of civilizational sanity. Here, participants tried to apply the frames we provided to the questions they cared about. We discussed topics like the characteristics and origins of cult-ish group dynamics, the advantages and problems of groups without (explicit) hierarchies [4], as well as a concrete presentation and discussion of a governance issue at a philanthropic organization one of the participants was working at.
One highlight of the weekend was a talk by our guest speaker Jan Kulveit. Jan has previously advised the Czech government on COVID-19 measures during the pandemic. In his talk, he shared his experiences as a case study for understanding and navigating dysfunctional, high-stakes social environments. He highlighted examples of civilizational insanities he observed in government reactions to COVID-19, relating them to the incentive landscapes inside and between the institutions involved in decision-making. The small event size allowed for an extended Q&A session; this way the participants could hear about how the dynamics we addressed during content sessions play out in real life. Jan's talk was well reviewed by participants and we are very happy that he made it to the event!
On the final evening of the event we played a negotiation role-playing game designed specifically for the weekend: Equinox. The purpose of the game was for the participants to embody the dynamics we had discussed theoretically in the content sessions. The participants really enjoyed the game; it was the most well-received part of the weekend. One inspiration for applying roleplay as part of the workshop was the EXP camp run in 2025. Teaching rationality-related topics via experiential education seems very promising, and we plan to experiment further with it in potential future projects.
Equinox is set in a fictional bronze age world where two neighboring tribes have been at war for years. The two tribe councils have to come together to negotiate a peace agreement. The participants took on various council roles (King, Minister of War, Minister of Coin, etc.) with diverging interests, allegiances, and powers. The incentive design of the game made it difficult for the participants to reach a peace agreement, as various dynamics led characters to have priorities that were hard to reconcile. Among these dynamics were in-group/out-group dynamics, information asymmetries, conflicts of interest, the credible commitment problem, and dynamics from selectorate theory.
After the game had ended with a failure to find peace [5] and a violent coup in one of the tribes, we discussed what participants had experienced and what they had learned from the game. The participants reflected on many of the aspects we wanted them to take away. One commonly shared experience was that time pressure and information overload were prime factors making it difficult to come to an agreement. Most of the players felt very immersed in their roles and empathized with the difficulties of making hard decisions under uncertainty in a political leadership role. Another dynamic that came up in discussion was that roles with incentives to sabotage the negotiations had a much easier job than roles that actively tried to achieve peace.
While we do not currently plan to release Equinox as an independent game, we are interested in developing it further and potentially running it with interested groups of 6-12 participants. If this sounds interesting, feel free to reach out to us!
We had some blind spots related to culture setting. Mainly, we neglected to explicitly align some central expectations within the organizing team. The lead organizer wanted the organizer-participant relationships to have a friendly-professional tone, while one of the co-organizers wanted them to be informal and intimate. This led to conflict in the organizing team during the event. Culture setting is something the lead should have tracked explicitly, with the support of all co-organizers!
With regard to other staff, we were happy for them to participate and contribute to the program in their time off (if they wanted to!). However, that was not clearly communicated to participants — some of whom didn't know how to relate to those "part-time participants". In the same vein of neglected expectation setting, we should have briefed the staff better on what vibe we wanted to cultivate at the event and how they might fit in.
The first session we ran was on event culture. In addition, some culture setting was done implicitly, e.g. by the choice of venue design and participants. One of the central ideas for participant selection was to bring people from different communities together — including rationalist communities, EA, math academia, cognitive science, and Go. We wanted to emphasize how these communities complement each other. This was mostly a success!
However, we still didn't account for some foreseeable culture clashes. In hindsight, some expectations should have been spelled out more clearly. Notably, some people imported a casual cuddling culture from rationalist and EA community events in Germany; this was uncomfortable for some participants who didn't come from that background. The lead organizer tried to discourage public cuddling but did not communicate this clearly enough. Afterwards, we received some complaints about strong displays of affection in the public space.
One piece of culture that was readily taken up was giving participants a lot of autonomy. No part of the schedule was mandatory [6]. People could run their own sessions and activities in the afternoon. Most participants brought in what was of interest to them and actively pursued what they wanted to get out of the weekend. Another win for event culture was assigning tasks to participants — refilling the tea tank, taking pictures of the flip chart notes, etc. The tasks were small enough that participants could still focus on the event, while being genuinely useful and making the space more pleasant!
If you plan to run a similar event, the following questions might be valuable: What kind of vibe do you want to cultivate? What exactly does the role of participant, volunteer or organizer entail? How do you want participants to relate to each other? How do you want participants to contribute? What should the relationship between staff and participants look like? How will people learn about these expectations?
We started the outreach process later than ideal — our announcement was posted a little more than two months before the beginning of the event. Admissions functionally ended two weeks before the weekend. The short notice and resulting uncertainty likely skewed our applicant pool young [7].
Moreover, we did not check the EA/rat event calendar, which meant losing some strong applicants to EAGx Amsterdam. Relatedly, the event's comparatively low contribution to career capital turned away at least a couple of people who had "better" things to do. On the other hand, this meant the people who did apply were generally a good fit. Interviews were time-consuming to organize, but we endorse them: participant fit is extremely valuable at such a small event.
Making people confirm their participation by sending a fee [8] served primarily to make sure people either showed up or canceled on time. We got only one short-notice cancellation out of eleven participants, which we consider a success.
We wanted the event to have a personal, cozy, and introspective vibe and made design choices to instantiate this. Firstly, we chose a venue with a rustic ambiance, which was well-received and helped create the atmosphere we hoped for. Secondly, we employed a cook whom we personally knew and thought would fit the tone of the event. Based on the anonymous feedback form, people generally enjoyed the food, which added to the cozy, homely aesthetic.
The main error we made regarding the venue was not insisting on checking it out in advance. We asked, but did not press when our request was ignored. The venue had a couple of "quirks" that made it logistically annoying to handle. Knowing about these in advance would have helped us prepare adequately. One thing we did well was booking the venue from one day before the start of the event. This was indispensable because it gave us time to set up the venue and let staff settle in.
We offered a shuttle service from a nearby transport hub to the venue, as it was hard to reach by public transport. In the original acceptance form, nearly everyone claimed that they would take it both ways. In practice, some changed their plans about how they wanted to arrive or leave. We should have offered a monetary incentive for participants to report their shuttle intentions accurately [9].
Logistics for the event turned out to be too much for the dedicated ops person to handle. We made the correct call to ask for volunteer help from friends, ensuring things went smoothly. In retrospect, it would have been ideal to have two people dedicated to ops before and during the event. For rural venues, ops people should all have a driver's license.
Nothing goes quite according to plan from the ops perspective. We experienced a wide range of unexpected difficulties that we won't detail here [10]. While you should definitely plan ahead, run pre-mortems and improve your plans, you'll benefit from having the slack to adapt to contingencies that weren't accounted for. Generally, slack comes in the form of redundancy; for instance, it's wise to have extra ops/content people on hand, even if fewer would likely be enough to get everything done. Part of our event's success was due to us having adequate backups (e.g. volunteer help).
I appreciate the irony of an event on civilizational sanity still running into a bunch of civilizational insanities!
More on our motivation in the original announcement
See "Seeing Systems" by Barry Oshry
A discussion that self-ironically started with a twenty-minute power struggle about the purpose and leadership of the discussion group.
We were not that surprised. The game is set up in a way that makes success pretty hard to achieve.
Participants were encouraged to attend the roleplay game Equinox since it was designed to be the climax of the event.
Only two participants were over 30
100 euros
We did this successfully to make participants report their intent to come to the event, and should have just done the same thing for the shuttle.
Examples of hiccups or difficulties include our car needing unexpected maintenance, coordinating with our guest on short notice, and adjusting ad-hoc to issues with the venue.
2026-03-11 07:30:18
Many AI company employees (Anthropic employees especially) are sympathetic to AI safety and have, or will soon have, lots of money. This is something that is being talked about a lot (semi-)privately, but I haven't seen any public discussion of it.
I find that striking. The topic seems worthy of extensive public discussion, and I suspect this community has inherited unhelpful cultural norms against publicly discussing how individuals make use of their money.
It also seems likely that many, or even most, AI company employees who are passionate about reducing AI risk should rapidly give much or most of their money to effective projects that would otherwise not be adequately funded.
There's a lot of potential for this to do tremendous good. There are of course things like political giving. But I think most of this potential would come from employees having different theories of change from institutional funders, moving faster, and having a higher risk appetite. This is especially true given short timelines.
A few specific thoughts:
To the extent things like the above are issues, it seems like coordination failures amongst company employees might be a large contributing factor. Groups of AI company employees could address this by delegating relevant work to individual members who volunteer or are selected randomly.
I'm fundraising for my nonprofit, Evitable, and might benefit from this kind of giving myself. But my purpose in writing this is to promote public discussion that I think can benefit others in situations similar to mine and Evitable's.
I haven't put much effort into fundraising for Evitable yet, and expect I will learn a lot more about the situation as I do.
Much of the discussion here could equally well apply to individual high-net-worth (HNWI) giving more broadly.
2026-03-11 07:05:05
I was a few months into being 21 years old when a hijacked plane crashed into the first World Trade Center tower. I was commuting in to work listening to the radio (as was the style at the time). I couldn't figure out how the heck a plane could hit the tower. Was the pilot drunk? How did he even get into the middle of New York City? I was imagining a Cessna, because the idea of a passenger plane running into the building was actually unimaginable. I was barely starting to realize "Wait… are they talking about, like, a big commercial plane?" when the second plane hit. In that moment, like a crystal suddenly forming, I realized this was an attack, and there would be war. I knew my country well enough to know that there would be military action as a result. Maybe, maybe we could avoid war.
When I came in to work, everyone was crowded around the small personal TV one of my coworkers had with him (live streaming wasn't a thing yet). That was the first time I had a visual: the smoke pouring out of the towers. There was grim chatter as we watched live footage. No one was working. The bosses were there with us. How would they get that blaze under control? How many people would die up there before then?
When the first tower began to fall the entire room gasped. We flinched away from the screen as a single body. Dead silence. Someone started crying. We had all watched the “Skyscraper Inferno!” movies. We thought that’s what this was. It had not even entered the realm of imagination that the entire tower would just go down, crushing everyone. This is what an update of sickening proportions feels like.[1] Now all eyes went to the second tower. Would this one stand? Suddenly the speed of evacuation was all that mattered.
What little chance of avoiding war had been left was now absolutely obliterated.
We were all excused from work early. Leaving the office, I entered a different world from the one I had woken up in. The repercussions of this day were staggering. No one knew how the world would be different. We didn’t even know what had happened yet. But the world would forever be divided into before this day and after this day. It is rare to have such sudden, sharp pivot points in history. A revolution in a single day. I watched it happen. We all watched it happen together.
I finally realized why my elders had such profound memories of watching Neil Armstrong walk on the moon. To me it was just another date in history. My entire life we’ve had men bopping around in space and American flags on the moon. It’s a background fact. For them, it was a single moment unprecedented in human history that marked a permanent, sweeping change. Which they all experienced collectively, as it happened.
Computers had been beating humans at chess since I was a teenager. It was an impressive engineering feat, but an understandable one. Chess was basically "solvable" in a mechanistic way using search-ahead algorithms. Those of us paying attention to AI in the mid-teens were paying attention to a program called "AlphaGo." Built by Google DeepMind, it was supposedly a machine that could play Go very well. They wanted to demonstrate this by challenging the best Go players of the era.
This next part is written from memory, forgive me if individual details are off.
The thing about Go is that the space of potential moves explodes too quickly for a search algorithm to work. I've barely played Go myself; I don't know much about it. But among humans it seems one has to have a mental representation of what the state of the board "means" and how a play can shift that. The game is widely accepted to require a fundamental intuitive grasp, which humans develop over many years of intense play. There isn't any way for a human to program that into a machine. So the AlphaGo team didn't try. Instead they created a digital brain, where numbers took the place of neurons, that could "learn" by changing those numbers. They had AlphaGo play millions of games against itself, nudging the numbers a little after every game in response to how well that game had gone, "learning" to play as it went. There isn't a formula or algorithm one can point to that explains why AlphaGo chooses the move it chooses. It just "thinks" on the state of the board and then produces a move.
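(An aside for the programmers reading: here is a toy sketch of that self-play loop, shrunk down to the much simpler game of Nim. It is a cartoon of the idea, not AlphaGo's actual method, which combined deep neural networks with tree search; every name and number below is my own invention.)

```python
# A toy echo of "play against yourself, nudging the numbers after every game."
# The game is Nim: 21 stones, take 1-3 per turn, taking the last stone wins.
import random
from collections import defaultdict

value = defaultdict(float)  # the "brain": (stones_left, stones_taken) -> a number
ALPHA = 0.1                 # how much each game nudges the numbers

def pick_move(stones, explore=0.1):
    moves = [t for t in (1, 2, 3) if t <= stones]
    if random.random() < explore:            # occasionally try something new
        return random.choice(moves)
    return max(moves, key=lambda t: value[(stones, t)])

for _ in range(100_000):                     # "millions of games," scaled down
    stones, history, player = 21, [], 0
    while stones > 0:
        take = pick_move(stones)
        history.append((player, stones, take))
        stones -= take
        player = 1 - player
    winner = 1 - player                      # whoever took the last stone won
    for who, s, t in history:                # nudge every number toward wins
        value[(s, t)] += ALPHA * (1 if who == winner else -1)

# No human wrote down a strategy, yet the numbers tend to converge on the
# classic one (leave your opponent a multiple of 4):
print(pick_move(7, explore=0))  # usually 3 after training
```

The point survives the toy: there is no rule in there that anyone wrote, only numbers that drifted toward whatever won.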
In March of 2016 Lee Sedol, one of the world's most acclaimed players of Go, went up against AlphaGo in a televised five-game match. If AlphaGo had merely beaten him, this still would have been a watershed moment in AI history. It would have been a demonstration that this digital brain had, somehow, encoded an understanding of the game; that it had something like intuition in this domain. That's already miraculous. It was a thing people had said was impossible with machines. Some of us were already expecting this might happen. We were excited for it. What very few of us were ready for was Move 37.
In their second game, on March 9th, AlphaGo placed a stone where no human would place one. This is the now-famous Move 37. Commentators were baffled. Those watching live and chatting online suspected that AlphaGo had glitched out and thrown an error. Lee Sedol stood up and walked away. He spent fifteen minutes agonizing over that move. No one had any idea what was going on. This wasn't just a move that no human would make; this was a move that no human could imagine. It was either the most embarrassing flub possible, or proof that humans were no longer the pinnacle of Go-playing minds. And the only way to find out which was for Lee Sedol to throw down and play the hardest he had ever played, to test the machine's intuition.
Move 37 turned out to be a superhuman move. AlphaGo won that game. Afterwards Lee said he felt "powerless" and that AlphaGo was "an entity that cannot be defeated". He was mostly correct: he did go on to beat it in game four of their five-game match, but that win crowns him as the only human who has ever defeated AlphaGo in official play.[2]
Before Move 37 everything in AI development still felt theoretical to me. Then I saw a bizarre act, the act of an alien mind, which inexplicably led to unavoidable defeat. This thing understood something we could not. It had an insight we don’t have the ability to see. I realized that we now share a planet with an alien intelligence. A new mind that thinks in different ways, and thinks things we cannot.
It was still extremely limited. Powerless outside the domain of Go. And yet a new mind nonetheless, and there was no going back. We didn't share the planet with alien minds before, and we do now, and Move 37, on March 9th, was the moment everyone saw it. You cannot go back into the same world you left from.[3]
Ten years after the 9/11 attacks I began to understand a different aspect of my elders’ experience: lack of shared context. I didn’t have a period of my life before the moon landing, I didn’t remember the world as it was before then, nor did I witness the turning point. By the mid 2010s I was coming to know more and more adults who had no real memory of a pre-9/11 world. They were young enough when it happened that by the time their larger world-model was forming 9/11 was a historical fact. The only world they knew was the one that had already been altered. They didn’t feel the change.
Growing older is littered with such moments, where you have a sharp revelation and realize “Oh… that’s what they were feeling the whole time.” I understand why they didn’t really tell me, it’s impossible to really convey in words. It’s something one has to live through. Instead you watch the younger ones and wait, because you know eventually they’ll get it, and then they too will have that “Oh… that’s what they were feeling the whole time” feeling. Only time can bring that.
Even after such an Act Shift on Earth’s stage, time proceeds. Life continues, and a typical day before a history-cleaving event isn’t much different from a day after it on the individual level. Even if everything has changed for humanity, nothing has changed for the human. I still have to pay my rent and brush my teeth. And yet the color palette has shifted, the musical score has turned. You can tell the world is different. It is strange that the newer generations will only feel the world states that came after their emergence into the world. It is strange that I’ll never feel the world states before my own time. I find it unfair.
Ten years after Move 37 I now frequently run into adults who did not live in a world without alien minds in it. Adults who didn’t watch a brand-new brain made out of numbers play a stone in an unimaginable spot to carve a path to victory into the future. They are still living in the default world they were presented with. I hope they can soak in its flavor to the deepest extent possible. It’s hard to know what to appreciate when you don’t yet know how the flavors of history change. And I hope they can take some time, maybe a few minutes once a year, to think of how strange the world must have been in the before-times, when in all the world the only thinking beings were the humans born of flesh and blood.
I didn't have those words for it then; the Sequences wouldn't be started until six years later
His Move 78 in that game has a story of its own, but that one is for another time
The fourth ever episode of The Bayesian Conspiracy podcast was about Move 37, recorded shortly after it happened. Sadly, we were very new and still figuring things out, so the audio quality is bad.