2026-03-20 12:09:00
What work in training models today could be reasonably automated by current or near-future AIs? What else could current or near-future AIs do to accelerate AI development? Where have we hit the point of diminishing returns?
This article is a list of ideas that answer the above questions, and a bit of a framework for thinking about them. I'm very curious to hear others' ideas.
We can think of the self-improvement by AIs as an S-curve -- the x-axis is the amount of intelligence you put in, and the y-axis is the amount of intelligence you get out. [1]
At the start the AIs are too dumb to help much, and at the end finding more improvements becomes very difficult [2], but in the middle the performance is super-critical -- for every unit of intelligence put into the problem of building a better AI, the performance that you get out goes up by more than one unit. [3]
We can break this down into a series of overlapping S curves, where each S curve represents a specific category of things that help to build the next AI. Early categories are things dumb AIs can help with, while later ones require more intelligence.
Then, we want to know: are there many categories of work that AIs are only very recently able to do? [4]
We might be able to predict imminent jumps in capability by looking at what AIs can do today, but which they couldn't do one pre-training generation ago, because those AIs are the ones helping build the next, yet-to-be released model. Put another way -- if the current capability is in the steep part of the S-curve for a given task, we'll see rapid gains. If it's in the steep part for many different tasks at the same time, we will see very rapid gains.
So, I'm going to name some task categories, then guess where AI capabilities are as of the end of 2024, 2025, and 2026 in that category by marking it on an S curve [5]. The categories aren't perfectly independent by any means.
This obviously doesn't give us anything objective -- and it doesn't help at all with thinking past the next few years -- but that's okay, it is a tool to help think through whether or not we expect algorithmic AI progress to speed up or slow down in the near future.
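As a minimal sketch of the model above: we can treat the S curve as a logistic function mapping intelligence in to intelligence out, and iterate it so each generation's output becomes the next generation's input. All parameter values here are illustrative assumptions, not estimates of anything real.

```python
import math

def s_curve(x, ceiling=10.0, steepness=1.2, midpoint=4.0):
    """Toy S-curve: intelligence out as a logistic function of intelligence in.

    ceiling, steepness, and midpoint are illustrative assumptions,
    not empirical estimates.
    """
    return ceiling / (1.0 + math.exp(-steepness * (x - midpoint)))

def self_improvement_steps(x0=3.5, steps=8):
    """Iterate the curve: each generation's output is the next input.

    x0 is chosen (arbitrarily) above the 'too dumb to help' region,
    so the sequence climbs through the steep middle and then flattens.
    """
    xs = [x0]
    for _ in range(steps):
        xs.append(s_curve(xs[-1]))
    return xs

trajectory = self_improvement_steps()
# Gains per step grow through the steep middle of the curve, then flatten
# as the ceiling is approached.
```

If you start below the steep region instead (say `x0=1.0`), the iteration stalls near a low fixed point, which is the "too dumb to help much" regime from the text.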
A little javascript tool to plug in your own estimates is available here. Give it a go!
Bug Fixes
Reading training-related code and fixing unintentional mistakes.
Models often succeed in training despite mistakes throughout the software stack that trains them. ("Look. The models, they just want to learn." [6]). AIs which can automatically find and fix these mistakes (of which many may exist) can have a non-trivial improvement on training speed.
Supporting Structures
Developing secondary tools, like programs which better visualise what is going on in a training run, or which surface more relevant info about the experiments, or which give clearer ideas of what data is being used, and so on [7].
Tools like this let you more easily figure out what changes would be beneficial, which is especially useful in coming up with ideas for improvements if you have 'research taste' worse than the best human researchers, as you would if you were, for example, a 2026 AI model.
Automated Experiments
If a researcher has an idea they'd like to try -- maybe a new architecture, or a specific ordering of data, or an RL reward structure -- how quickly can they run a minimal experiment and get indicative results? How much can AIs help with writing and running these experiments?
Taking a spec and generating software to implement it is a normal software engineering task, and this is arguably a subcategory of 'supporting structures', but it's a major one, and end-to-end automation of experiments could matter far more than 'helps build a visualisation tool one time'. Building environments for RL belongs in this category.
Closed Loop Software Optimisation Tasks
How significantly can the AI autonomously optimise closed-loop, easy-to-verify tasks?
Stuff like kernel and compiler optimisations, or possibly hyperparameter search. The sort of thing evolutionary algorithms and pure RL agents excel at, where we can see the results quickly, and which doesn't require the same kind of deep conceptual, broad understanding as theoretical research.
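The closed-loop pattern can be sketched as a (1+1) evolutionary hill-climb: propose a mutation, measure, keep it iff it's better. The `runtime_ms` objective below is a toy stand-in; in a real loop it would compile and benchmark an actual kernel, and the parameter names are made up for illustration.

```python
import random

def runtime_ms(params):
    """Toy stand-in for benchmarking a kernel: block_size=128 and
    unroll=4 happen to be optimal by construction. A real closed loop
    would compile and time actual code here."""
    return (abs(params["block_size"] - 128) * 0.01
            + abs(params["unroll"] - 4) * 0.5
            + 1.0)

def mutate(params, rng):
    """Perturb one randomly chosen parameter by a small or large step."""
    child = dict(params)
    key = rng.choice(list(child))
    child[key] = max(1, child[key] + rng.choice([-16, -1, 1, 16]))
    return child

def hill_climb(steps=500, seed=0):
    """(1+1) evolutionary loop: accept a mutation only if it is faster."""
    rng = random.Random(seed)
    best = {"block_size": 32, "unroll": 1}
    best_time = runtime_ms(best)
    for _ in range(steps):
        cand = mutate(best, rng)
        t = runtime_ms(cand)
        if t < best_time:
            best, best_time = cand, t
    return best, best_time
```

The point is that the loop needs only a fast, trustworthy measurement, not any conceptual understanding, which is exactly why this category saturates with relatively dumb AI.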
Chip Design
How much better can AI designed chips get?
Projects like AlphaChip are already used to design floorplans for TPUs at Google, though there are lots of other parts of chip design as yet untouched, as far as I am aware. There are many layers, some requiring significant context regarding the manufacturing process, or what exact compute loads they will run, or how they interact with the power, network, cooling systems in datacentres, and so on.
Supercomputer Layout / Infrastructure
How much can AI optimise the physical infrastructure for supercomputers?
Can it help design the layout of the sites? Help optimise the designs for the networking? Power distribution? Cooling? Scheduling? I am very unsure here.
Data Cleaning & Ordering
How much can AI improve the quality, structure, or use of existing data?
A lot of data is really crap. It's difficult to clean because the scale is so far beyond what humans can accurately review, but a significant proportion of algorithmic improvements over the last few years may have been from better data, rather than actually creating better algorithms [8].
AI models are extremely well positioned to improve data quality, because a lot of the work is simple enough for small models to help with, while not being valuable enough to have already had many human hours spent on it [9]. Things like ordering data for curriculum learning also fall into this category.
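The curriculum idea can be sketched in a few lines: a small model scores every document for difficulty, and pre-training consumes documents easiest-first. `score_difficulty` is a hypothetical stand-in for a small-model call, faked here with average word length.

```python
def score_difficulty(doc: str) -> float:
    """Hypothetical stand-in for a small model's 0-100 rating of how
    intelligent a reader must be to understand the document. Faked here
    with average word length as a crude proxy."""
    words = doc.split()
    if not words:
        return 0.0
    return min(100.0, 10.0 * sum(len(w) for w in words) / len(words))

def curriculum_order(corpus):
    """Return documents easiest-first, as a pre-training curriculum."""
    return sorted(corpus, key=score_difficulty)
```

The same scoring pass could instead rate subjective quality and drop the junk, or flag alignment-relevant content for careful placement in the curriculum.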
Synthetic Data
How good is the AI at generating and assessing new data to train on?
This is done a lot right now, and it works because in lots of domains it is easier to (even approximately) verify quality than it is to generate it, so you can generate many outputs with current AIs (or have them try at tasks for a very long time) and then keep only the best results to train the next generation on, moving the average up.
It's trivial to see how this is done with maths and programming, or game playing, or other agentic tasks with some completion condition. Debate remains on how well we can score things like art and writing (while keeping those scores accurate to diverse human preferences), but I suspect there are many ways to automate this.
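The generate-then-verify asymmetry described above reduces to best-of-N sampling with a verifier. Both functions below are hypothetical stand-ins: in practice `generate_attempt` would sample a model and `verify_quality` would be unit tests, a proof checker, or a reward model.

```python
import random

def generate_attempt(task, rng):
    """Stand-in for sampling a model's attempt at a task; here just a
    random number whose quality we can score exactly."""
    return rng.random()

def verify_quality(task, attempt):
    """Stand-in for a verifier. Verification only needs to *score* an
    attempt, which is often far easier than producing a good one."""
    return attempt

def best_of_n(task, n=32, seed=0):
    """Generate n attempts and keep the best: the kept sample's expected
    quality rises with n even though the generator never improves."""
    rng = random.Random(seed)
    return max(generate_attempt(task, rng)
               for _ in range(n))
```

Training the next generation on only these kept samples is what moves the average up.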
Surfacing The Best Existing Ideas
How well can AIs find and apply relevant ideas from existing literature?
A lot of things have been tried, but human researchers don't automatically know what work was already done because they haven't read every paper or blog post, or archives of experiments from inside labs. But, they can be helped by an AI which can do literature reviews (or which you can ask 'I had this idea, has it been done before?').
Qualitative Research
How good is the AI at generating new research ideas on its own?
These don't need to be complex -- a slightly different way to connect layers in a transformer would qualify -- but they must be original, and, to distinguish this category from simpler optimisation tasks, costly to verify.
As long as AI's research taste is worse than humans', and as long as experiments are bottlenecked on compute rather than a dearth of ideas to try, AI contributions here will be small. However, as soon as AI research taste surpasses the best humans', we might see a very sudden spike here. [10]
After writing this up and putting my best guesses for where we are on the S curve in each category, I think I am even more bullish on rapid very near-term AI capabilities gains than before, but have no strong opinions on the shape of the curve after that.
It seems like there are lots of things which only the most recent generation of AIs have started being able to do usefully, and which we are still far from saturating performance on. This means the current generation of AIs were (mostly) trained without this uplift, but the models in training right now will see some benefit from it, and the generation after that would be trained in a mostly automated way.
I haven't covered hardware bottlenecks at all. Since those are less easily automatable than these information-level tasks, progress may slow in the years after the low-hanging algorithmic improvements are picked -- depending on how capable near-term AIs get, and on whether all then-possible information tasks face diminishing returns (i.e. are all tasks where AIs have a comparative advantage in the second half of their respective S-curves?). That said, I suspect progress will continue rapidly for quite some time.
This appendix is just a list of times various frontier labs, or employees therein, have made statements about their AI models helping build the next generation of AI models.
MiniMax - M2.7, 19/03/26 - "M2.7 is our first model deeply participating in its own evolution." "...we let the model update its own memory and build dozens of complex skills in its harness to help with reinforcement learning experiments. We further let the model improve its learning process and harness based on the experiment results."
Anthropic - Claude, 11/03/26 - "...70% to 90% of the code used in developing future models is now written by Claude" and "...one researcher described a colleague running six versions of Claude, each managing 28 more Claudes, all simultaneously running experiments in parallel." [Reported by Time, unclear if these experiments are directly related to training new models].
Karpathy - Autoresearch, 10/03/26 - "Three days ago I left autoresearch tuning nanochat for ~2 days on depth=12 model. It found ~20 changes that improved the validation loss. I tested these changes yesterday and all of them were additive and transferred to larger (depth=24) models. Stacking up all of these changes, today I measured that the leaderboard's "Time to GPT-2" drops from 2.02 hours to 1.80 hours (~11% improvement)"
OpenAI - GPT-5.4, 05/03/26 - "GPT-5.4 Thinking did not meet our thresholds for High capability in AI Self-Improvement. The High capability threshold is defined to be equivalent to a performant mid-career research engineer, and performance in the evaluations below indicate we can rule this out for GPT-5.4 Thinking." [Note this is a quote against GPT-5.4's ability to self-improve.]
OpenAI - GPT-5.3-Codex, 06/02/26 - "A researcher will do a training run, and they'll be using Codex to 'babysit' or monitor that training run, or they'll use Codex to analyze some data about the training run, or they'll use Codex to clean up a data set or something like that." [Reported by NBC].
OpenAI - GPT-5.3-Codex, 05/02/26 - "GPT‑5.3‑Codex is our first model that was instrumental in creating itself."
Anthropic - Claude Opus 4.6, 05/02/26 - "...the ASL determination for autonomous AI R&D risks required careful judgment. Opus 4.6 has roughly reached the pre-defined thresholds we set for straightforward ASL-4 rule-out based on benchmark tasks. Thus the rule-out in this case is primarily informed by qualitative impressions of model capabilities for complex, long-horizon tasks and the results of a survey of Anthropic employees."
Anthropic - Claude Code, 24/07/25 - "...the team uses Claude Code to build entire React applications for visualizing RL model performance and training data." and regarding RL, "The team lets Claude Code write most of the code for small to medium features while providing oversight, such as implementing authentication mechanisms for weight transfer components."
Google DeepMind - AlphaChip, 26/09/24 - "Our AI method has accelerated and optimized chip design, and its superhuman chip layouts are used in hardware around the world."
This is obviously a very simple model. It's just a little tool to help think about the problem from a different angle. ↩︎
The end could be after the entire planet is disassembled and turned into compute, of course. We're not assuming that the S curve stops early. ↩︎
This graph makes things appear slower than they would be, because we don't spend equal time at each level of intelligence -- we take a step up equal to the output. If we graphed time on the x-axis instead of input intelligence, the S would look less curvy. ↩︎
How many categories are currently in the steep part of their S curve, especially early in the steep part of their S curve? Obviously our selection of categories is biased -- we naturally think about stuff that either we or the AIs currently do. I think that means there would be a lot of missing categories that come after AGI, where the AIs can do things that current researchers don't do. ↩︎
For clarity, the x-axis is the AI capability within that category ("How good is the model at doing X?"), and the y-axis is the improvement in AI capability that you expect to result from having the AI model help with training. The axes do not have scales, because the useful thing to get from this exercise is whether we are nearing or past the peak rate of change within each category, but it might be helpful to think of the axes as being log-log scaled. ↩︎
Ilya Sutskever, quoted via Dario Amodei on Dwarkesh. ↩︎
To give an example from 2019, I trained a vision model to detect apple ripeness, and built a tool to look for and categorise the classifications which were most wrong (i.e. said the apple was ripe when it wasn't) in order to figure out where my data set was weak, and figure out patterns like "it's most likely to fail when the lighting is over-saturating in the middle of fruit, which is reflective". Tools like this could be built much faster today, and could be built trivially with next generation AI models. ↩︎
Epoch mentions here, in their 'op-ed' style section, that "...most software progress might actually be due to data quality improvements (hence "algorithmic progress" may be a misnomer).". I haven't independently verified the claim, but it matches my impressions from other sources and from training CNNs back in the day. ↩︎
We do of course spend many thousands of human hours on data quality, but the scale is such that most of this work is at best done in a single pass, and is usually done programmatically with fairly blunt tools.
To give an example of data improvement from AIs: what if we had some (very small) model read all of the pre-training data and label it in a way that lets you curriculum learn? In the simplest case, we could have that model output a score from 0 to 100 for how intelligent you need to be to understand the material, and then we could provide the training data in that order during pre-training. You could also do similar work with alignment-relevant concepts, teaching the model how to assess good and evil before it sees too much 'possibly evil' data in training.
You could also have it score the data on its subjective "quality" (where 100 would look like a published book, and 0 would look like a random string of encrypted text), and throw out the junk.
Another interesting thing (more intelligent) models could do is try to 'straighten out' conflicting information -- either information they already know, or information they are reading -- and resolve the inconsistencies one way or another. This output could then be used to train future models more efficiently. ↩︎
How likely is the idea to be good? AIs might supersede humans here without necessarily being qualitatively smarter because they have the ability to read huge amounts of logs to find patterns, and the ability to remember what experiments have already been tried, and will at some point get better at reasoning over this huge amount of data than humans are. ↩︎
2026-03-20 11:19:31
Independent verification by the Brain Preservation Foundation and the Survival and Flourishing Fund — the results so far
Extraordinary claims require extraordinary evidence. In my previous post, "Less Dead", I said that my company, Nectome, has
created a new method for whole-body, whole-brain, human end-of-life preservation for the purpose of future revival. Our protocol is capable of preserving every synapse and every cell in the body with enough detail that current neuroscience says long-term memories are preserved. It's compatible with traditional funerals at room temperature and stable for hundreds of years at cold temperatures.
In this post, we’ll dive into the evidence for these claims, as well as Nectome’s overall approach to cultivating rigorous, independent validation of our methods—a cornerstone of the kind of preservation enterprise I want to be a part of.
To get to the current state-of-the-art required two major developmental milestones:
The rest of the post is dedicated to unpacking these results.
Five quick notes as we begin:
Ken Hayworth is a neuroscientist currently working at the Janelia Research Campus (part of HHMI, the Howard Hughes Medical Institute). In 2010, Ken started the Brain Preservation Foundation and launched the Brain Preservation Prize as a challenge to the neuroscience and cryonics communities. He wanted to see researchers provide evidence that their preservation could work according to neuroscientifically reasonable standards.
As a connectomicist, Ken is used to looking at 3D models of brain tissue created with electron microscopy. These models are scanned from brains preserved with the kind of high-quality fixation that's been standard in neuroscience for many years. After much serious thought about neuroscience, Ken has come to the conclusion that this level of physical preservation is overwhelmingly likely to capture the information necessary to restore a person in the future, and I'm inclined to agree. Again, I'll get to this in an upcoming post.
But the electron micrographs coming from the cryonics community didn't look like what he normally saw in the lab. There was no 3D analysis, just single frames. Worse, the tissue was severely dehydrated, making it difficult or impossible to tell whether the tissue was traceable, that is, whether each synapse could be traced back to its originating neurons.
The images above are taken from the BPF's Accreditation page. The left image is what "typical" brain tissue looks like -- the kind that Ken and other neuroscientists are used to studying. The right image is a cryoprotected animal brain[1]. It looks more "swirly" because it's been dehydrated by cryoprotectants. Ken started the Brain Preservation Prize, in part, to challenge the cryonics community to produce images more like the one on the left, so they could better evaluate whether their preservation techniques worked.
To Ken and to me, this is an enormous issue. There are many ways a brain can be rendered untraceable, and comparatively few that preserve its structure. In the absence of evidence to the contrary, we have to default to the assumption that a brain is not traceable. That, in turn, calls into question whether the information preserved in the brain is adequate.
In addition to challenging the cryonics community, Ken wanted to extend a challenge to the neuroscience community. He hoped that, making use of their advanced protocols for preparing and analyzing brain tissue, they could design a technique to preserve people for later revival.
Ken was inspired by the successful Ansari X Prize to issue his challenge in the form of a prize. He raised $100,000 from a secret donor[2], and set out the prize rules: brains had to be preserved in a way that rendered them connectomically traceable, and had to be preserved so that they would very likely last for at least 100 years. There was a small version of the prize for preserving a "small" mammal brain (think rabbit, mouse, or rat), and preserving a "large" mammal brain (pig, sheep, etc.) would win the whole thing.
I can't overstate how influential the Brain Preservation Prize has been in advancing the field of preservation research. That $100,000 inspired me to build my protocol and led to millions of dollars of investment in better preservation. I'd love to see more scientific prizes; I think they help young people in research labs justify spending resources on important projects they're passionate about. A young researcher, like me back in 2014, can go to her superior and say "it's not just a personal project, it's for this prize."
When I started seriously looking into preservation techniques, it seemed to me that cryonics and neuroscience had opposite problems. Neuroscientists could almost instantly preserve a brain using aldehydes[3], but didn't have a long-term strategy to keep that brain intact for a hundred years or more. Cryonicists, meanwhile, struggled to avoid damaging a brain when they perfused it with cryoprotectants, but knew how to cool a perfused brain to vitrification temperature and keep it there indefinitely.
The obvious solution was to combine the two methods. I could use fixation's remarkable ability to stabilize biological tissue, buying time to introduce cryoprotectants into the brain slowly enough to avoid the crushing damage caused by rapid dehydration. Then, it would be safe to vitrify the brain for long-term preservation.
It took me about nine months to iron out all the details. The most difficult part was figuring out how to get cryoprotectants past the blood-brain barrier: it turned out that even very extended perfusion times, on their own, are not adequate to prevent dehydration. Eventually, though, I got the technique to work on rabbits (the "small mammal" model I was using). Modifying the protocol to work for pigs took me a single day and worked on the first try. I published the results of that research, Aldehyde-Stabilized Cryopreservation, in Cryobiology, the first step towards winning the Brain Preservation Prize.
The next step towards the prize required direct verification by the BPF. If you're interested, you can read their full methodology here.
At this time, I was working at 21st Century Medicine. Ken Hayworth flew out to my location and joined me for a marathon three-day, dawn-to-dark session, during which I preserved, vitrified, rewarmed, and processed a rabbit and a pig. Whenever Ken wasn't personally observing the brain samples, he secured them with tamper-proof stickers to preserve the chain of custody. When I had finished preparing the samples for electron microscopy, Ken personally performed the cutting and imaging of the samples back at Janelia.
This was a level of rigor I'd never observed before, certainly far beyond the peer review for the Cryobiology paper. This is something I admire about Ken, and I was grateful for it here. Preservation is worth being rigorous about!
The BPF prepared images using high-resolution focused ion beam milling and scanning electron microscopy (FIB-SEM). This technology produces resolutions of up to 4 nanometers; Ken scanned the prize submissions at 8 nm and 16 nm isotropic resolution. Together with the 3D nature of the images, this is sufficient to examine a brain sample and determine whether the synapses (typically about 100 nm wide) are traceable.
Of course, imaging a whole brain is well beyond our current capabilities. Ken compensated for this by analyzing many samples, randomly chosen from different regions of the brains. The BPF released all of the images and the original 3D data files, and they're still available today. I've included the pig brains below – click through on the images to see youtube videos showing the 3D imaging in full. Each sample is from a brain that was preserved, vitrified, and rewarmed.
Ken Hayworth was joined on the BPF's judging panel by Sebastian Seung, a Princeton/MIT neuroscientist, author of the book Connectome, and a major contributor to the FlyWire project. Together, they reviewed the 3D images, judged their quality, and traced neurons through the image stacks. In the end, they agreed that I had won the prize.
Relevant links:
I present this as evidence that it's possible to preserve large mammal brains in a traceable state, every synapse intact, and keep them stable for more than a hundred years (the 'hundred years' part we will address in a future post on the thermodynamics of preservation).[4]
But ASC is not the whole story, because it must be done pre-mortem. End-of-life laws throughout the world weren't designed with preservation of terminally ill clients in mind, and don't allow ASC as an option. In order to create something workable, I had to either find a way to do preservation post-mortem, or work to incorporate ASC into end-of-life laws. I chose to make preservation work post-mortem.
Making preservation work in the real world turned out to be conceptually easy. The original protocol needs three modifications to work post-mortem.
My dad used to tell me a story of a biology professor he had in college. The first day of class, the professor had everyone open their textbook and read the first paragraph in one of the last chapters. The professor then told everyone that it had taken him 30 years to write that paragraph. I now better understand how that professor must have felt. It took me nine months to create ASC. It took me nine years to modify it to work in our current legal context and write those three modifications above.
I won't get into those nine years in this post. I do want to share an image, though, that I'm publishing here for the first time. As far as I know this is the best preserved whole human brain in the world, and it belongs to a 46-year-old man who died of ALS and chose to donate his body for scientific research. I perfused his body just 90 minutes post-mortem—much faster than typical emergency cryopreservation services, but well outside the twelve-minute ischemic window.
Electron micrograph from the best human preservation I've done to-date. ~90 minutes post-mortem time from a MAiD donation case. The large white space in the middle is a capillary. Here you can find substantial perivascular edema (the white area around the capillary), as well as neuropil that's concerningly indistinct. I asked Ken Hayworth to review these images; he does not think they're traceable. Additionally, some regions of this brain failed to perfuse entirely; this is from a well-perfused region.
It is the best-preserved whole human brain I’ve ever seen. It is also—like every other human brain I preserved with any appreciable post-mortem delay—not traceable. It's not a quality I (or the BPF) can accept. Looking at the degree of damage scares me.
I originally thought that humans might have a two-hour post-mortem preservation window. If that had been true, I would have probably worked to integrate preservation into hospices across the country. After reviewing the electron micrographs from animals and humans under various preservation conditions, it became clear that the hospice model was nonviable. We couldn't wait for a person to die on their own timeline and only then begin our procedure. We'd need them to undergo a full process involving Medical Aid in Dying (MAiD)—and before we could promise any benefits from such a process, we needed to perfect it on animals.
It took a lot of refinement and expert consultation, but eventually we pinned down the twelve-minute window and blood thinner through a series of experiments on rats. We then streamlined the procedure so it could be done in less than ten minutes on pig carcasses, and finally demonstrated excellent post-mortem preservation in a pig model. We've just recently published the results:
A 3D FIBSEM image of a pig brain preserved post-mortem. We were able to complete surgery in 4 minutes and 30 seconds, well within the critical twelve-minute window, and attained results that appear traceable. Additional results available as supplemental materials. Video linked below:
An H&E stained light microscopy image of a pig cerebellum preserved post-mortem. While the FIBSEM shows good nanostructural preservation, this much lower resolution image shows that a large area of brain is preserved well.
Figure from our preprint. H&E stained light microscopy images from a poorly-preserved brain and a well-preserved brain (E & F, respectively). Note the substantial white regions present in the poorly-preserved tissue on the left. This is strong evidence of inadequate perfusion and compromised preservation. The difference between these two images is only a few minutes delay in starting preservation.
About this time, I was chatting with Andrew Critch, cofounder of the Survival and Flourishing Fund (SFF). Born from Jaan Tallinn's philanthropic efforts, the SFF is dedicated to the long-term survival and flourishing of sentient life. They recommended $34MM of grants in 2025, including support for the AI Futures Project, Lightcone Infrastructure, and MIRI, among many others.
Andrew was interested in evaluating Nectome for an SFF grant. We talked it over and agreed on a third-party evaluation with real stakes: he'd travel to our lab in Vancouver, Washington to witness and evaluate a preservation first-hand, then bring the samples himself to an EM lab to scan them, and then ask a neuroscientist of his choice to review the sample quality. If he liked what he saw, he'd support our application to SFF's grants team. If we didn't live up to the quality we promised, he'd inform the team accordingly. (SFF uses a distributed grant-making process where each team member has a separate budget for making grant recommendations with substantial discretion.)
When Andrew arrived at our lab, we introduced him to our test rat[5], and he observed as I gave the test rat an injection of heparin (our blood thinner of choice), followed promptly by simulated medical aid-in-dying. He then timed us as I waited five minutes after the rat’s heart stopped, mimicking the time I would have spent performing surgery on a pig or a human.[6]
From there, we proceeded with the tedious 9-hour process: blood washout, fixation, and the slow ramp of cryoprotectants. Andrew watched from start to finish. It was late at night before the preservation was complete, and Andrew watched us remove the rat’s brain and perform a visual check for gross failures of perfusion. There were none.
At this point we could have simply placed the brain in cold storage and then handed off the tissue for further evaluation, but instead I wanted to demonstrate just how robust our current method is. I cut the brain into two hemispheres, put one in cold storage at -32°C (-26°F) as a demonstration of the effectiveness of the cryoprotectant at preventing ice formation, and put the other hemisphere in a laboratory oven at 60°C (140°F) overnight. Just as cold storage slows chemical processes, warmth accelerates them; twelve hours at 60°C is equivalent to, conservatively, a week at room temperature.
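The "twelve hours at 60°C is equivalent to a week at room temperature" equivalence follows from the Q10 rule of thumb, under which reaction rates roughly double per 10°C. The Q10 value of 2 and the 21°C reference temperature below are assumptions for the sake of the arithmetic, not measured properties of these samples.

```python
def accelerated_aging_equivalent(hours_hot, t_hot_c=60.0, t_ref_c=21.0, q10=2.0):
    """Q10 rule of thumb: reaction rates multiply by q10 per 10 degrees C.

    Returns the equivalent exposure time at the reference temperature.
    q10=2 is a common conservative assumption, not a measured value.
    """
    factor = q10 ** ((t_hot_c - t_ref_c) / 10.0)
    return hours_hot * factor

equivalent_hours = accelerated_aging_equivalent(12.0)
# 2**3.9 is roughly a 15x speedup, so 12 hours at 60 C comes out to about
# 180 hours, i.e. about 7.5 days at 21 C -- "a week, conservatively".
```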
When we returned the next day, we sliced each hemisphere into paper-thin slices and Andrew spun up his quantum random number generator.[7] He used it to randomly select four slices from each hemisphere for analysis. We sent him home with an introduction to Berkeley's electron microscopy core facilities, which immediately started the week-long process of prepping the tissue for imaging, including staining, resin embedding, and slicing into 90-nanometer sections.
After examining the electron micrographs and consulting with several neuroscientists, Andrew determined that our preservation was excellent, that the brain was connectomically traceable, and that both the "cold" and the "hot" slices were of near-identical preservation quality. He recommended us for a $550,000 investment, which we've since received.
We'd like to present this data to you as well. The overall dataset obtained from Berkeley was massive; a single image from one of our samples is around 5 GB and requires special software to view. I've prepared two representative images using deepzoom, here:
Sample from a rat brain preserved using Nectome’s methods, then stored at 60°C for 12 hours ("hot" storage). Electron microscopy performed at the Berkeley EM Core. Click here to see the complete dataset.
Sample from a rat brain preserved using Nectome’s methods, then stored at -32°C for 12 hours ("cold" storage). Electron microscopy performed at the Berkeley EM Core. Click here to see the complete dataset.
We'll be in the comments again for a few hours, ready to answer your questions. Our sale is still available. The next post, by popular demand, will be about how we can know whether preservation is good enough prior to actually restoring someone. I'll see you in the comments!
A single synapse from our rat brain demo, preserved after 5 minutes of ischemia and stored at 60°C for 12 hours. The dark curve is the junction between the two neurons. Those tiny grains at the bottom of the synapse are individual vesicles, still filled with neurotransmitter, suspended in place by fixation. The larger gray sphere near the vesicles is a mitochondrion that helps power the synapse. You can see individual cytoskeletal details. The individual proteins are also still there, though they're not distinguishable at this level of resolution. This is what I mean by "subsynaptic" preservation.
Greg Fahy has recently released a preprint discussing cryoprotectant dehydration and some ways to reverse it in rabbit brains; check it out too!
This donor has since been revealed to be Saar Wilf.
Common choices are formaldehyde or glutaraldehyde.
ASC actually does better than preserving every synapse – it also retains virtually all proteins, nucleic acids, and lipids. I'll get into the evidence for that in a later post.
We nicknamed the rat Chandra. Andrew was sad about us experimenting on animals, and asked us if we'd try to help preserve and reanimate non-human animals in the future, and of course we said yes!
I've actually recorded a time of 4 minutes 30 seconds in pigs. But I like to leave myself a little wiggle room.
I've never met someone else who routinely uses QRNGs for their decisions :)
2026-03-20 11:04:47
Spinoza's Compendium of Hebrew Grammar (1677, posthumous, unfinished) contains a claim that scholars have been misreading for centuries. He says that all Hebrew words, except a few particles, are nouns. The standard scholarly reaction is that this is either a metaphysical imposition (projecting his monistic ontology onto grammar) or a terminological trick (defining "noun" so broadly it's vacuous). Both reactions wrongly import Greek and Latin grammatical categories and then treat those categories as the neutral baseline.
From Chapter 5 of the Compendium (Bloom translation, 1962):
"By a noun I understand a word by which we signify or indicate something that is understood. However, among things that are understood there can be either things and attributes of things, modes and relationships, or actions, and modes and relationships of actions."
And:
"For all Hebrew words, except for a few interjections and conjunctions and one or two particles, have the force and properties of nouns. Because the grammarians did not understand this they considered many words to be irregular which according to the usage of language are most regular."
The word "noun" here is nomen. It means "name." Spinoza is saying: almost every Hebrew word is a name for something understood. This includes names for actions, names for relationships, names for attributes. His taxonomy of intelligible content explicitly includes actions and modes of actions alongside things and attributes.
The obvious objection is: if "noun" covers actions as well as things, then the claim that "all words are nouns" is trivially true and does no work. Any content word names something intelligible; so what?
But this objection assumes that a useful grammar must draw a hard categorical line between nouns and verbs, and that Spinoza's refusal to draw it is therefore vacuous. That assumption is embedded in the Greek grammatical tradition; it is not a fact about Hebrew.
In Hebrew (and Arabic, Akkadian, and other Semitic languages), words are generated from consonantal roots—typically triliteral—by applying vowel patterns and affixes. The root כ-ת-ב generates katav (he wrote), kotev (one who writes), ktav (writing/script), mikhtav (letter), katvan (scribbler). The morphological operation is the same in every case: take the root, apply a pattern that describes the relation of the concept to the thing you are describing. For example, mikhtav is something that is made-written, a letter, much like the Arabic mameluke is someone who is made-owned, a slave. Whether the output functions as what a Greek grammarian would call a "noun" or a "verb" depends on which pattern you applied, not on some fundamentally different generative process.
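The "same generative operation" point can be made concrete with a toy sketch. The templates and transliterations below are rough illustrations of the pattern system, not a real morphological model; among other things, it ignores the consonant spirantization that turns miktav into mikhtav.

```python
# Toy sketch of root-and-pattern morphology: one consonantal
# root, many templates, a single generative operation.
def apply_pattern(root, template):
    """Fill the template's numbered slots with the root consonants."""
    return template.format(*root)

root = ("k", "t", "v")  # the root kaf-tav-vet, "write"
patterns = {
    "{0}a{1}a{2}":  "he wrote (katav)",
    "{0}o{1}e{2}":  "one who writes (kotev)",
    "{0}{1}a{2}":   "writing/script (ktav)",
    "mi{0}{1}a{2}": "letter (mikhtav; spirantization ignored)",
}
for template, gloss in patterns.items():
    print(apply_pattern(root, template), "->", gloss)
```

The point is that katav and miktav come off the same assembly line; only the template differs, and whether the output counts as a Greek-style "noun" or "verb" is a fact about the template, not about the root.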
This is not how Greek or Latin works. In those languages, nouns and verbs belong to largely separate inflectional systems (though they do have participles). Nouns decline for case and number; verbs conjugate for tense, aspect, mood, and person. A Greek speaker can usually tell from a word's form alone which category it belongs to. The noun/verb distinction corresponds to a real difference in morphological machinery.
In Hebrew, it doesn't. The grammarians who insisted on the distinction—both the rabbinical grammarians working in the Arabic tradition and the Christian Hebraists working from Latin—were forcing Hebrew into a framework designed for languages with a different structure. The result, as Spinoza observed, was that regular Hebrew forms got classified as irregular, because they didn't respect a boundary the language doesn't draw.
The Arabic grammatical tradition, which the medieval rabbinical Hebrew grammarians adopted wholesale, classifies words into three categories: ism (noun/name), fi'l (verb/action), and ḥarf (particle). Scholars have long noted the parallel between this trichotomy and Aristotle's division of speech into onoma (name), rhema (verb/predicate), and sundesmos (connective); Syriac scholars were important intermediaries in transmitting Greek linguistic thought to Arabic, though the degree of direct dependence remains debated. [1] The classification reached Hebrew grammar through two independent routes: Greek → Latin → Christian Hebraists, and Greek → Arabic → rabbinical grammarians. Both paths originate in Greek philosophy.
Judah ben David Hayyuj (c. 945–1000), the founder of scientific Hebrew grammar, applied Arabic grammatical theory to Hebrew, including the ism/fi'l/ḥarf trichotomy and the principle that all Hebrew roots are triliteral. [2] His technical terms were translations of Arabic grammatical terms. Jonah ibn Janah (c. 990–1055) extended this work, producing the first complete Hebrew grammar and drawing explicitly from the Arabic grammatical works of Sibawayh and al-Mubarrad. [3] When Spinoza complained that "the grammarians" misunderstood Hebrew, this is the tradition he was arguing against.
Aristotle's noun/verb distinction is not just a grammatical observation. It reflects his substance/predication ontology. The world consists of substances (things that exist independently) and predicates (things said about substances). A noun names a substance; a verb predicates something of it. The sentence "Socrates runs" has the structure: substance + predication. The grammar encodes the metaphysics.
Greek and similar languages have different pools of words for filling the grammatical roles of noun and verb. Hebrew has one pool of roots that supplies words for both roles, depending on the pattern applied. These aren't just two different ways of doing the same thing. They reflect different structural priorities.
The Indo-European system is built around assembling a scene: placing distinct actors into relationships with distinct actions. You need different building blocks for the actors and the actions because they play different structural roles in the scene. Who did what to whom, when, in what manner. Case endings on nouns tell you the role; verb conjugation tells you the temporal and modal frame. The grammar presupposes that the actor/action distinction is primitive.
The Semitic system works differently. Each root is a node in a flat graph of intelligibles. The graph doesn't recurse; roots refer to intelligible things, not to relations between other roots. And it doesn't privilege any type of node over any other, which is why the morphological system treats them all with the same machinery. It does not start by assigning one word the role of "the thing" and another the role of "what the thing does."
A sentence picks out some nodes from this graph, and casts them into some definite relation to each other. Their arrangement and patterns of modification describe the way in which these intelligibles are related: process, agent, result, instrument, quality, location.
When you take the Greek-derived framework and impose it on Hebrew, you're asking a flat graph of intelligibles to behave like a scene-assembly system. The spurious irregularities Spinoza complained about are projections of the friction from this mismatch.
The standard scholarly line is that Spinoza projected his philosophical commitments onto his grammar; that his monism (one substance, everything else is modes) motivated his claim that Hebrew has one part of speech with subcategories rather than two fundamentally different parts of speech. Harvey (2002) argues that the Compendium's linguistic categories parallel the conceptual categories of the Ethics. [4] Rozenberg (2025) goes further, claiming Spinoza "project[ed] the characteristics of Latin onto Hebrew" and thereby "neglected the dynamism of Hebrew." [5] Stracenski provides a more sympathetic reading but still frames the question as whether the Compendium serves the Ethics' metaphysics or the Tractatus' hermeneutics. [6]
This gets the direction of explanation backwards, or at least sideways. Spinoza was reading a Semitic language and describing how it actually generates words. The fact that his description aligns with his metaphysics may reflect a common cause: both the grammar and the metaphysics are what you get when you don't take the Aristotelian actor/action distinction as a primitive. Spinoza rejected Aristotle's substance/predicate ontology in the Ethics; he also noticed that Aristotle's noun/verb grammar didn't fit Hebrew.
Aristotle divides lexis into onoma, rhema, and sundesmos in Poetics 1456b–1457a. Farina documents how this tripartite scheme reached Arabic grammar via Syriac translations of Aristotle's Organon, with Syriac Christians serving as intermediaries between Greek and Arabic linguistic thought. See Margherita Farina, "The interactions between the Syriac, Arabic and Greek traditions," International Encyclopedia of Language and Linguistics, Third Edition, 2025. The question of whether Sibawayh's ism/fi'l/ḥarf directly derives from Aristotle or represents independent development remains actively debated; the structural parallels are clear even if the exact transmission pathway is contested. ↩︎
On Hayyuj's application of Arabic grammar to Hebrew and his establishment of the trilateral root principle, see the Jewish Encyclopedia entry on "Root". His Wikipedia biography notes that "the technical terms still employed in current Hebrew grammars are most of them simply translations of the Arabic terms employed by Hayyuj." ↩︎
Ibn Janah's Kitab al-Luma was the first complete Hebrew grammar. It drew from Arabic grammatical works including those of Sibawayh and al-Mubarrad. See also the Jewish Virtual Library entry on Hebrew linguistic literature. ↩︎
Warren Zev Harvey, "Spinoza's Metaphysical Hebraism," in Heidi M. Ravven and Lenn E. Goodman, eds., Jewish Themes in Spinoza's Philosophy (Albany: SUNY Press, 2002), 107–114. ↩︎
Jacques J. Rozenberg, "Spinoza's Compendium: Between Hebrew and Latin Grammars of the Middle Ages and the Renaissance, Verbs versus Nouns," International Philosophical Quarterly, online first, October 26, 2025, DOI: 10.5840/ipq20251024258. ↩︎
Inja Stracenski, "Spinoza's Compendium of the Grammar of the Hebrew Language," Parrhesia 32. Stracenski notes the divide between historical approaches (Klijnsmit, placing Spinoza within Jewish grammatical tradition) and philosophical approaches (Harvey, connecting the Compendium to the Ethics). See also Guadalupe González Diéguez's companion chapter in A Companion to Spinoza (Wiley, 2021) and Steven Nadler, "Aliquid remanet: What Are We to Do with Spinoza's Compendium of Hebrew Grammar?" Journal of the History of Philosophy 56, no. 1 (2018): 155–167. ↩︎
2026-03-20 08:42:57
Sometimes people say things like "If the humans and AIs have linear utility in resources, then their interactions are zero-sum". Here, "linear utility in resources" typically means something like: "Supposing AIs already control all the galaxies, then they'd accept a bet with a 60% chance of gaining one more galaxy and a 40% chance of losing one."
I think this is too hasty.
I'll list several reasons why interactions can be positive-sum even when both humans and AIs have such preferences. I've ordered these from most important to least important.
1. Epistemic public goods. Humans and AIs both benefit from learning true things about the universe — knowledge will have value to almost any agent regardless of its terminal goals. Hence, if both parties need to invest resources in acquiring knowledge, they can share the costs and both come out ahead. This includes basic mathematics and science. But I also expect this will include very expensive simulations of counterfactual worlds.
2. Security public goods. Some expenditures protect everyone, e.g., both humans and AIs might need to spend resources ensuring their galaxies don't trigger false vacuum decay in neighbouring regions. Or defending against external threats (hostile aliens, natural disasters, exogenous risks).
3. Common values. It's possible that human utility is something like hedonium+a⋅X and AI utility is something like paperclips+b⋅X, where X is some component of value that both parties share. If producing hedonium, paperclips, and X is linear in resources, then both humans and AIs have linear utility in resources. However, the game isn't zero-sum, because the X-component means total welfare increases when both parties spend more resources on the common value.
4. Different marginal rates of substitution across resource types. "Linear returns to resources" elides the fact that there are many kinds of resources. Humans and AIs might differ in how much they value proximal galaxies versus distal ones (e.g. because of time discounting), even though each party's utility function u(proximal, distal) is linear. If humans value proximal galaxies relatively more and AIs value distal galaxies relatively more, then trading proximal-for-distal makes both parties better off.
5. Complementarities in production. If humans love hedonium and AIs love paperclips, maybe we can build paperclips out of hedonium (or vice versa). I expect this kind of direct complementarity to be rare in practice, because of "tails come apart" reasoning — removing the looks-like-a-paperclip constraint probably more than doubles the hedonium you can extract from the same resources. But less extreme versions of production complementarities might exist.
6. Comparative advantage. Even when both parties are linear in the same single resource, if they differ in their relative productivity across tasks, there are gains from specialisation and trade. This is why most transactions between humans are positive-sum despite each transaction being so small that both parties' utility functions are approximately linear over the relevant range.
I'm unsure whether comparative advantage persists at cosmological timescales, where both parties are optimising with mature technology. But it plausibly matters during the transition period.
7. Gains from trade under uncertainty. If humans and AIs have different beliefs, they can make bets that both parties expect to gain from. These bets are ex post zero-sum but not ex ante zero-sum. As long as the parties disagree about probabilities, both can expect to gain from the same wager.
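Point 4 above can be made concrete with a toy calculation. The weights and endowments below are arbitrary illustrations, not claims about actual human or AI preferences:

```python
# A toy numeric instance of point 4: both utility functions are
# exactly linear in each resource, yet swapping proximal for
# distal galaxies is positive-sum.
def u_human(prox, dist):
    return 2 * prox + 1 * dist   # humans discount distal galaxies

def u_ai(prox, dist):
    return 1 * prox + 2 * dist   # AIs value distal galaxies more

# Each party starts with 10 proximal and 10 distal galaxies.
before = (u_human(10, 10), u_ai(10, 10))
# Humans trade away their distal galaxies for the AIs' proximal ones.
after = (u_human(20, 0), u_ai(0, 20))
print(before, after)  # → (30, 30) (40, 40): both strictly better off
```

Each party's utility is perfectly linear in each resource, yet the swap raises both utilities from 30 to 40; linearity rules out risk aversion, not gains from trade.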
Thanks to Alexa Pan for discussion, inspired by Lukas Finnveden's Notes on cooperating with unaligned AIs.
2026-03-20 07:43:23
In the last two weeks, social media was set abuzz by claims that scientists had succeeded in uploading a fruit fly. It started with a video released by the startup Eon Systems, a company that wants to create “Brain emulation so humans can flourish in a world with superintelligence.”
On the left of the video, a virtual fly walks around in a sandpit looking for pieces of banana to eat, occasionally pausing to groom itself along the way. On the right is a dancing constellation of dots resembling the fruit fly brain, set above the caption ‘simultaneous brain emulation’.
At first glance, this appears astounding - a digitally recreated animal living its life inside a computer. And indeed, this impression was seemingly confirmed when, a couple of days after the video’s initial release on X by cofounder Alex Wissner-Gross, Eon’s CEO Michael Andregg explicitly posted “We’ve uploaded a fruit fly”.
Yet “extraordinary claims require extraordinary evidence, not just cool visuals”, as one neuroscientist put it in response to Andregg’s post. If Eon had indeed succeeded in uploading a fly - a goal previously thought to be likely decades away according to much of the fly neuroscience community - they’d need more than a video to prove it.
Did the upload show evidence of known neurophysiological markers of working memory, such as the head-direction ring attractor bump? How did their brain model actually control the virtual fly body, given it seemed to lack a model of the ventral nerve cord, the fly's equivalent of a spinal cord? Where were the data and the write-up?
Because if Eon couldn't back up what their video seemed to show, at least some neuroscientists were going to be markedly less than impressed.
Eon did follow up with a blog post - How the Eon Team Produced A Virtual Embodied Fly - detailing how they combined pre-existing models of the fly brain and body into a system that could respond to virtual environmental cues. But for the neuroscientists scrutinising the uploading claim, these details only sharpened their objections - so much so that some are accusing Eon of misleading conduct and gross misrepresentation.
To understand just why these scientists are so upset, you need a bit of context.
The fruit fly Drosophila melanogaster has been a workhorse of neuroscience for decades: its brain is small enough to be tractable but complex enough to produce genuinely interesting behaviour such as learning, navigation, decision-making, and courtship. A long-running ambition within the community has been to map the complete wiring diagram - a ‘connectome’ - of that brain, and in October 2024, after years of incremental progress, the FlyWire Consortium achieved it: a complete connectome of the adult fly brain, documenting all 139,255 neurons and over 50 million synaptic connections.
These increasingly complete connectomes have enabled the creation of increasingly elaborate computational models. In 2024, Shiu et al. published a model of the entire adult fly brain in which every neuron and neural connection was represented, albeit in highly simplified form (ignoring differences in cell shape, neurotransmitter dynamics, and much else). Despite these simplifications, the model could predict which neurons activate in response to sensory stimuli and identify pathways underlying behaviors like feeding and grooming, a striking demonstration that wiring alone carries substantial information about function. Separately, Lappalainen et al. built a ‘connectome-constrained’ model of the fly’s visual system, whose predictions matched real neural recordings across dozens of experiments.
Meanwhile, other researchers had built NeuroMechFly, a biomechanical simulation of the adult fly body based on micro-CT scans of real anatomy. Updated to a second version in late 2024, the new virtual fly body could walk, groom, or be trained via reinforcement learning to navigate through virtual environments. Crucially, it could also be reprogrammed to be driven by any other kind of external controller.
One of the videos in the NeuroMechFly v2 publication, demonstrating a ‘hierarchical sensorimotor task in [a] closed loop’. There’s no connectome involved here, yet it is still remarkably similar behavior to the Eon demo.
By early 2025, the pieces Eon needed for their demo were largely in place: a complete brain connectome, computational models of both the central brain and the visual system, and a detailed biomechanical body model. All that remained was to wire them together.
Eon took the pre-existing components we just described - the Shiu et al. brain model and the NeuroMechFly v2 body - and connected them together into a closed loop: sensory events in a virtual world feed into the brain model, and selected outputs from the brain model direct the virtual body.
The loop has four steps. First, something happens in the virtual environment - the fly’s leg contacts a sugar source, or dust accumulates on its antennae - and these events activate specific sensory neurons in the brain model. Second, the brain model runs for a 15-millisecond time step, propagating activity through the connectome’s ~140,000 simplified digital neurons. Third, Eon reads out the activity of a small, hand-picked set of descending neurons and translates it into high-level commands - turn left, walk forward, groom, feed - that are passed to pre-trained motor controllers in the body model. Fourth, the body moves, changing what the fly senses, and the loop repeats.
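The four steps can be sketched as a runnable schematic. Every class, neuron mapping, and threshold below is a hypothetical stand-in chosen for illustration, not Eon's actual code or interfaces:

```python
# Step 3's hand-picked readout: descending-neuron activity is
# translated into high-level commands for pre-trained controllers.
# The mapping here is a simplified illustration.
DESCENDING_MAP = {
    "oDN1": "walk_forward",
    "DNa01": "turn_left",
    "DNa02": "turn_right",
}

class StubBrain:
    """Stands in for the connectome-based brain model."""
    def __init__(self):
        self.active = []

    def stimulate(self, sensory_events):
        # Step 1: environment events activate sensory neurons
        # (stubbed: sugar contact ends up driving forward walking).
        self.active = ["oDN1"] if "sugar" in sensory_events else []

    def step(self, dt_ms=15):
        # Step 2: propagate activity through the ~140k simplified
        # neurons for one 15 ms tick (a no-op in this stub).
        pass

    def read_descending(self):
        # Step 3: read out only the hand-picked descending neurons.
        return self.active

def loop_once(brain, sensory_events):
    brain.stimulate(sensory_events)
    brain.step(dt_ms=15)
    commands = [DESCENDING_MAP[n] for n in brain.read_descending()]
    # Step 4: the body model would execute `commands`, changing
    # what the fly senses on the next iteration of the loop.
    return commands

print(loop_once(StubBrain(), {"sugar"}))  # → ['walk_forward']
```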
The result is the video that went viral. But the behaviors on screen are less impressive than they appear, because the brain model is doing far less of the work than a viewer would naturally assume.
Take the walking. The brain model does not orchestrate the fly’s legs. It doesn’t compute the gait cycle, coordinate the six limbs, or position the joints. It activates a few descending neurons - oDN1 for forward velocity, DNa01/DNa02 for steering - and hands that signal off to a locomotion controller within NeuroMechFly that already knows how to walk. The brain is issuing something like a “go forward” or “turn left” instruction; the body model handles everything else. In a biological fly, the detailed work of translating such commands into coordinated leg movements is performed by ~15,000 neurons in the ventral nerve cord (the fly’s equivalent of a spinal cord), none of which are simulated here. The same applies to grooming: the connectome selects the behavior, but NeuroMechFly’s controllers execute it.
In their blog post, Eon are open about this. They compare the descending neurons to a car’s steering wheel, accelerator, and brake - you can predict what the car will do from these controls “without explicitly simulating every combustion event inside the engine.” They also acknowledge that the visual system activity displayed so prominently in the video - derived from the Lappalainen model - is “somewhat decorative” and does not substantially drive behavior. They do note that the brain-body mappings are in some cases “somewhat arbitrarily chosen by hand.” And they explicitly state the work “should not yet be interpreted as a proof that structure alone is sufficient to recover the entire behavioral repertoire of the fly.”
This is fair enough, and their efforts to connect brain and body models are genuinely useful engineering. If Eon had described this as “the first integration of connectome-constrained brain and body models into a closed sensorimotor loop”, nobody in the fly neuroscience community would have objected.
But they didn’t say that. They said “We’ve uploaded a fruit fly.” Transparency in a blog post that few will read doesn’t undo a headline that millions saw. The typical person who encounters a claim on X, watches the video, and sees a fly walking, grooming, and feeding while a digital brain flickers alongside it is probably not going to think “a simplified brain model is selecting from a small menu of pre-programmed behaviors via a hand-tuned interface.” They’re likely to think the fly has been faithfully recreated inside a computer.
It hasn’t. Eon’s virtual fly implements only a handful of behaviors, and those rely heavily on NeuroMechFly’s pre-trained controllers rather than on the connectome. This is the most fundamental problem with the demo as evidence of an upload: because the body model already knows how to walk, groom, and feed, almost any signal that triggers the right controller at the right time will produce fly-like behavior on screen. You could replace the connectome with a simple rule-based script - if dust, groom; if sugar, feed; otherwise, walk forward - and the resulting video would look much the same. The fly-like behavior the viewer sees is a product of the body model, not the brain. The digitized connectome may be producing meaningful internal dynamics, but this demo cannot tell us whether it is.
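That hypothetical rule-based script really is short enough to write out in full (behavior names are illustrative):

```python
# A controller this simple, with no connectome at all, would drive
# the same pre-trained body behaviors on screen.
def rule_based_controller(senses):
    if senses.get("dust"):
        return "groom"
    if senses.get("sugar"):
        return "feed"
    return "walk_forward"

print(rule_based_controller({"dust": True}))   # → groom
print(rule_based_controller({"sugar": True}))  # → feed
print(rule_based_controller({}))               # → walk_forward
```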
So if what Eon built isn’t an upload, what would be?
The word ‘upload’ carries a claim that ‘model’ and ‘simulation’ do not. When one says they’ve modeled or simulated a fly, they’re saying they’ve captured some elements of the original insect’s behaviour, but with significant simplifications and assumptions. If instead they say they’ve uploaded a fly, they’re making a claim about the fly itself: that its identity has been faithfully transferred into a new medium, that the thing in the computer in some sense is the fly, just running on a different substrate. When you upload a photo, the file on your computer is the photo. Nobody says “I’ve partially uploaded this photo” to mean “I’ve made a rough sketch inspired by it.”
An uploaded fly, then, should be able to do everything the original fly could do. It should be playable forward in time indefinitely, responding to novel situations as the original would have. It should serve as a faithful proxy for the real thing; so much so that a neuroscientist could peer inside, observe realistic equivalents of neurophysiology, and run experiments that would be impractical or impossible on a biological fly, with confidence that the results would generalise back.
The leading proposal for how to actually achieve this is whole brain emulation: faithfully recreating the brain’s causal mechanisms at whatever level of detail turns out to be necessary so that the digital system behaves identically to the original. This is what distinguishes emulation from simulation. A weather simulation is useful - it can predict next week’s temperature with reasonable accuracy - but it breaks down when pushed further out, because its approximations are coarser than the actual atmospheric processes of real weather. In contrast, one can run an emulation of the Nintendo 64 game Banjo-Kazooie on a laptop, and because the emulator faithfully recreates the logic of the N64’s hardware - the processor, the memory, the graphics pipeline - the game will never fail to behave as it would have on the original console.
It’s currently an open scientific question what level of biological detail an emulation needs to capture. It’s unlikely we’d need to simulate every ion channel, and perhaps much of the brain’s physiology could be simplified with no consequence. But the key feature of the emulation approach is the guarantee: if you’ve faithfully recreated the causal mechanisms down to the necessary level, the resulting behaviour is trustworthy by construction. Low-fidelity approaches might produce correct-looking behavior in some cases, but it’s hard to tell to what degree this will generalise to novel situations.
In response to this line of criticism, Michael Andregg has argued that uploading shouldn’t be considered so binary. “I don’t think of uploading as a binary concept” he told The Verge, outlining “different levels” of upload. By this logic then, Eon’s system - containing connectome-derived elements driving behavior in a virtual body - might qualify as a ‘partial upload’.
But if a connectome-constrained model can count as a ‘partial upload’, then the Shiu et al. brain model was already a partial upload before Eon touched it. So was the Lappalainen visual model. So, for that matter, is any computational neuroscience model that incorporates anatomical connectivity data. The word ‘upload’ loses its distinctive meaning, and the field loses its ability to communicate what it is actually trying to achieve and how far away a true fly upload still is.
When the vocabulary of breakthroughs is spent on incremental demos, the actual breakthroughs are cheapened when they arrive. Funders and the public lose the ability to distinguish genuine milestones from slick demos, and investment flows towards groups making the boldest claims rather than those doing the most foundational work. Worse, for a field that is struggling to graduate from science fiction to serious research, premature claims risk triggering the cycle of hype and disillusionment that has set back other ambitious programs before.
To be fair, we’re not unsympathetic to why Eon used the language they did. Their careful blog post on ‘How the Eon Team Produced a Virtual Embodied Fly’ would likely have only been read by a few hundred neuroscientists, while “We’ve uploaded a fruit fly” reached millions. Startup survival requires investment, funding follows excitement, and excitement follows headlines - not careful caveats. This bold approach may even feel obligatory when an organisation’s stated mission is “solving brain emulation as an engineering sprint, not a decades-long research program.”
But the history of science - and the gap between what Eon demonstrated and what uploading actually requires - suggests that there is likely no shortcut through the long slog ahead.
Because in all probability, before anyone can truthfully claim to have uploaded a fly, there will still need to be years more of tedious work. Countless painstaking patch clamping experiments of carefully guiding a glass electrode into a single neuron while keeping it alive, just to learn how that one cell type, out of the fly brain’s thousands, transforms its inputs into outputs. Endless sessions of pinning flies under two-photon microscopes, collecting calcium imaging data while the animals walk or groom or navigate an odor plume, slowly building up ground-truth measurements of what real brain activity actually looks like during real behavior. Thousands of hours still to come of building computational models, testing them against that data, failing, and refining them again.
Then, and very likely only then, will there come a day when someone will hit ‘run’, and a fly - disoriented in whatever way a fly can be, having been sitting in a vial a moment ago - will find itself somewhere unfamiliar. It won’t know that in the intervening time, it had been anesthetised, embedded in resin, and its brain sliced into thousands of thin sections. It won’t know that those sections were painstakingly imaged, or that its neural architecture was reconstructed from those images, or that thousands of its fellow flies were studied and sacrificed to fill in what images alone couldn’t tell us. It won’t know of the billions of dollars and thousands of careers that it took to reach this point, or the millions of hours spent staring down microscopes, handling vials, and debugging code. It will certainly never know that it was once made of proteins and cells, and is now made of silicon and mathematics.
It will just beat its wings, lift off, and search for fruit.
2026-03-20 07:10:46
I think the community underinvests in the exploration of extremely-low-competence AGI/ASI failure modes, and I'll explain why.
There is a sufficient level of civilizational insanity overall, and the field of AI itself has an empirical track record that speaks eloquently about its safety culture. For example:
All these things sound extremely dumb, and yet, they are, to my best knowledge, true.
Eliezer has been pointing at this general cluster of failures for years, though from a different angle. His Death with Dignity post and of course AGI Ruin paint some parts of the picture in which AGI alignment is going to be addressed in a very undignified manner. So, the idea is definitely not new, and yet.
Many existing scenarios are high-quality, interesting, and may well be more likely and realistic than extremely-low-competence scenarios. In particular, I am talking about famous pieces like AI 2027, It Looks Like You're Trying To Take Over The World, How AI Takeover Might Happen in 2 Years, Scale Was All We Needed, At First, and How an AI company CEO could quietly take over the world.
It's just that we seem to have no extremely-low-competence scenarios at all, even though they are not negligibly improbable.
The scenarios that do start to focus on the low-competence area to some extent are What failure looks like by Christiano and What Multipolar Failure Looks Like by Critch, yet even they don't treat it as a big explicit domain.
Across these otherwise very different vibes (hard-takeoff Clippy horror, bureaucratic AI 2027 doom, multipolar economic drift, CEO-as-shogun power capture), the stories repeatedly converge on a small set of motifs: stealth through normality, exploitation of real-world bottlenecks by routing around them socially, replication and parallelization as the decisive advantage, bio or nanotech as a late-game cleanup tool.
These stories serve a legitimate educational and modelling purpose, and it may indeed be the case that significantly superhuman competence is needed to successfully execute a full takeover against humanity. But many of them, in my view, read more like attempts to persuade a reader who is skeptical that an AI takeover is possible if humans act competently, rather than attempts to deliver a realistic scenario in which humans are not that smart, because in reality, they are not.
As a result, the implicit adversary in most of these stories has to be very capable because the implicit defender is assumed to be at least somewhat functional. The scenarios are answering the question "could a sufficiently intelligent AI beat a reasonably competent civilization?" rather than the question "could a moderately intelligent AI cause catastrophic harm in a civilization that is demonstrably bad at responding to novel technological threats?"
John Wentworth, in his post The Case Against AI Control Research, argues that the median doom path goes through slop rather than scheming. In his framing, the big failure mode of early transformative AGI is that it does not actually solve the alignment problems of stronger AI, and if early AGI makes us think we can handle stronger AI, that is a central path by which we die.
Wentworth's argument maps two main failure channels: (1) intentional scheming by a deceptive AGI, and (2) slop where the problem is simply too hard to verify and we convince ourselves we have solved it when we have not. I want to point at a third channel: moderately superhuman AIs that are not particularly capable of doing anything singularity-level but are still capable of defeating humanity because of humanity's incompetence.
These AIs are not producing slop. "It ain't much, but it's honest work," they say, as they cooperate with human sympathizers on the development of a supervirus. The research goes slowly and requires extensive experimentation; to some extent the process is even being documented in public blog posts or on forums. But no one particularly cares, or rather, the people who care lack the institutional power to do anything about it, and the people who have institutional power are busy with other things, or have been convinced by interested parties that the concern is overblown, or are themselves collaborating.
This is, to some degree, what Andrew Critch describes in "What Multipolar Failure Looks Like, and Robust Agent-Agnostic Processes (RAAPs)": a world where no single system does a theatrical betrayal, but competitive automation yields an interlocking production web where each subsystem is locally "acceptable" to deploy, governance falls behind the speed and opacity of machine-mediated commerce, and the system's implicit objective gradually becomes alien to human survival. The difference in my framing is that the AIs in question do not need to be particularly alien or incomprehensible in their goals. They may have straightforwardly bad goals that are recognizable as bad, and they may be pursuing those goals through channels that are recognizable as dangerous, and the response may still be inadequate.
It is also somewhat similar to what is depicted in "A Country of Alien Idiots in a Datacenter", again with one important difference: although the AIs in my scenario are not particularly supersmart, they are definitely not idiots either. They are, let us say, slightly-above-human-level in relevant domains, capable of doing cool novel scientific work but not capable of the kind of rapid recursive self-improvement or decisive strategic advantage that most takeover scenarios assume. They are the kind of system that, in a competent civilization, would be caught and contained. In the actual civilization we live in, they may not be.
In other words: we do not need to posit 4D chess when ordinary chess is sufficient against an opponent who keeps forgetting the rules.
As examples, I mean things like the following:
I do agree that this kind of work looks a bit unserious, but that is precisely why I am pointing at this. It would be a shame, and a historically very recognizable kind of shame, if this threat model turned out to be real and no one had worked on it because it seemed ridiculous.
Or, to frame it more playfully: imagine a timeline like the one in the "Survival without Dignity", where humanity lurches through the AI transition via a series of absurd compromises, implausible cultural shifts, and situations that no serious forecaster would have put in their model because they would have seemed too silly. Except imagine that timeline without the extreme luck that happens to keep everyone alive. Survival without Dignity is a comedy in which everything goes wrong in unexpected ways and people muddle through regardless. My concern is that the realistic scenario is the same comedy, minus the happy ending.
My goal in this post is to discuss the state of reality rather than what to do about that reality. That said, I see at least a few immediate implications:
I welcome thinking about implications in more detail, as well as developing specific scenarios.
Note: none of this is an argument against singularity-style galaxy-brained ASI threats. I believe they are very real, and that they are going to kill us if we survive until then.