When cryoprotectants are perfused through the blood vessels in the brain, they cannot cross the blood-brain barrier as fast as water can move in the opposite direction. And cryoprotectants generally have a much higher osmotic concentration than typical blood plasma. For example, the cryoprotectant solution M22 has an osmotic concentration around 100 times higher.
As a result, in a successful cryoprotectant perfusion (without fixatives), water rushes out of the tissue into the blood vessels, the tissue dehydrates, and you end up with a shrunken brain that is visibly pulled away from the skull. The brain weight goes down by 50% or more. This is currently considered a good sign of cryoprotectant perfusion quality.
The key question, then, is what this severe dehydration does to the nanometer-scale structures in the brain, such as the connections between neurons, that are thought to carry the information encoding long-term memories.
Previous attempts at imaging this type of brain tissue were stymied because the severe dehydration made the tissue look unrecognizable. Synapses could be seen, but it wasn’t possible to clearly identify individual neurites or trace them to see whether the connectome is intact:
The authors are not very enthused about freezing-based approaches. They show electron micrographs of rabbit brains that were perfused with glycerol and slowly frozen, which have large ice cavities that grossly distort the tissue:
Their position seems to be that vitrification — i.e., cooling without ice formation — is the only serious path forward for brain cryopreservation. But vitrification requires especially high concentrations of cryoprotectants, which causes the severe dehydration that has made it difficult to assess whether the connectome is preserved.
Rabbit experiments
They performed two types of rabbit experiments.
In the first group, rabbit brains were perfused with M22 for 60 minutes, vitrified, rewarmed, and then fixed with a solution that still contains the same concentration of cryoprotectant. This shows us what the tissue looks like in its shrunken state. At high magnification they can identify synapses, mitochondria, and what appear to be morphologically intact cell membranes. But everything is compressed, making it difficult to clearly distinguish or trace individual cellular processes. These look similar to the previous images of vitrified brain tissue.
In the second group, rabbit brains were perfused with M22 and then gradually diluted back to a lower concentration of cryoprotectant before fixation — to 5M, 3M, or 1M. This tests the extent to which the shrinkage is reversible.
Notably, though, it seems to me that most of the shrinkage is expected to occur between 0 and ~3M cryoprotectant concentration. So my understanding is that diluting from full M22 back to 3M or 5M wouldn’t be expected to reverse most of the dehydration, although I might be wrong about that:
Anyway, they found that at 5M, the tissue is still quite shrunken. After reversal to 3M, though, things look considerably better, as the neurons have more normal-looking apical dendrites, and synapses with visible presynaptic vesicles can be identified. This is probably their best-looking EM data. However, there are various forms of damage, such as the wavy intracellular white spaces, whose cause I don’t understand. Also, there are some places in the zoomed-in image (C) where I can’t really tell whether I am looking at two smaller processes or one larger one:
When they reverse to 1M (Fig 10), things go a bit wrong. While they can see nicely preserved synapses with clear presynaptic vesicles, they also see what they call “exploded” neurons, which they attribute to osmotic damage from removing the cryoprotectant too quickly.
They report that this problem is solvable by using osmolytes to counterbalance the intracellular cryoprotectant during washout, and that this can prevent the ultrastructural damage “even when all M22 is washed out of the brain.” But this data is not presented in the paper; it’s cited as “Spindler et al., in preparation.”
Human data
The human data comes from a single brain, that of a 73-year-old terminal cancer patient who donated biopsy samples of his brain for this research. His brain had significant ischemic injury before preservation even began, consisting of two days of agonal hypoxia before legal death, then three hours of cold ischemia before perfusion started. That’s important because it’s actually an example of realistic brain preservation conditions.
Cortical biopsies were taken after whole-brain M22 perfusion, plunged into liquid nitrogen, and stored for four years. They were then warmed and processed in different ways.
Some biopsies were warmed straight into fixative containing M22, showing us the fully dehydrated state. As expected, it is severely shrunken and electron-dense, but without obvious ice damage. On electron microscopy, they can identify synapses and some intact-appearing membranes.
Other biopsies of the human brain were rewarmed into diluted M22 (75% or 66%) before fixation. Using light microscopy, they report that this partial rehydration caused cells to regain their general characteristic shapes, with neurites visible.
However, the rehydrated human tissue was only examined by light microscopy, not electron microscopy.
One of the main concerns with vitrification has long been that the severe dehydration might be masking damage to the structure of the brain. For example, if the cell membranes are broken apart or there are areas of washed out structure due to ischemia and rapid osmotic shifts during perfusion, we might not be able to see it when everything is compressed together.
Because the electron microscopy that they did show from the human brain tissue is still so compacted, we can’t really evaluate for the presence or absence of such damage yet with much certainty.
In ideal laboratory animal settings (rabbits), the key result is that they show partial reversibility (to 3M) with improved electron microscopy quality, but still without enough reversibility to see clearly traceable processes across the 2D image. They report that complete washout is actually possible with a new method that they developed, which is fantastic news. But this rests on unpublished work that we will have to wait to see.
Ideally, they would show in this future work that they can reliably trace the connectome across randomly sampled areas of the brain, which would allow the pure vitrification approach to reach parity with aldehyde-based methods in ideal laboratory animal settings, and shift the debate to which method is the best at structural preservation in realistic settings.
In the single human case, they show that perfusion-based vitrification is feasible in at least some parts of the brain even hours after legal death, and that cells regain their general characteristic shapes after partial rehydration. But the rehydrated human tissue was only examined by light microscopy, not electron microscopy, so we can’t tell whether the connectome is likely to be traceable in the local area where this biopsy sample was taken from.
This paper is clearly a step forward and a very important contribution to the brain preservation literature. I would like to personally thank the authors for their important work, and also for explaining how this type of method could potentially be used for medical time travel, which is a premise I totally agree with. Not surprisingly for a single paper, it alone has not resolved the key uncertainties about vitrification-based brain preservation.
This is not a call for espionage, but an analysis of another strategy
Von Neumann's strategy for solving the problem of global nuclear weapons proliferation is widely known - strike tomorrow. That is, conquer the entire world by exploiting the brief window when only one side possesses nuclear weapons. This idea is popular among American readers, partly because US interests align with this strategy: it would be good for the world and for us. (I will not discuss here whether von Neumann actually asserted this or developed the strategy in detail - there are doubts. Nor will I discuss how feasible it was, given that the USSR would have responded with a conventional attack in Europe, meaning the bulk of nuclear strikes would have fallen on Western Europe against the advancing armies; that the US lacked precise information about whether the USSR had an atomic bomb - the USSR claimed to have one from 1947, while many believed it would not have one until 1960, implying there was still time for a von Neumann attack; or, finally, that before 1949 the number of atomic bombs in US possession might have been insufficient to reliably halt the Soviet nuclear project.)
My point is that an alternative project for solving the nuclear weapons problem was operating in parallel. This was the effort of the Rosenbergs and several others to transfer nuclear secrets to the USSR as quickly as possible, so that both sides would be equal and a balance would exist between them. We know this strategy worked for nearly 80 years without nuclear war. (There were other motives too, like sympathy for communism, but we're simplifying.)
Both of these strategies are applicable to the AI race.
The von Neumann strategy involves creating American AI as quickly as possible to outpace China (as well as creating Grok AI to outpace OpenAI, etc.).
The Rosenberg strategy assumes that defectors will share AI secrets between AI companies, thereby reducing any single AI company's advantage over others, resulting in everyone reaching AGI level simultaneously, and consequently the world having multiple AIs rather than one paperclip maximizer.
First, since multiple AIs would have more diverse goals, there's a greater chance that at least one of them would be relatively aligned with humanity's goals. Second, if there are multiple AIs, they will compete more for human attention and approval and will need to demonstrate their trustworthiness to each other. Thus, they will care more about human values. If one of them starts killing people on its territory, the others will see that it has defected against its creators.
If the N-strategy leads to one AI's victory and the inevitable death of everyone, then the R-strategy is more unpredictable and offers a chance of survival, though we cannot say exactly how this would happen.
The R-strategy is much simpler and cheaper, since data exchange and employee movement happens constantly between companies, Twitter buzzes with ideas, and GitHub is full of secrets longing to be heard. The moat is constantly eroding. That is, I'm not calling for industrial espionage here, but rather want to draw attention to the forces that level the playing field of achievements between companies.
The R-strategy makes sense only if we are confident that the first AI will certainly destroy us. Then we exchange inevitable death for a vague probability of surviving in chaos. Conversely, if we believed that creating a friendly AI that would be first and only was quite likely, then the R-strategy would be a major mistake.
Finally, the R-strategy is local, meaning it relies on local actions of individuals (and is subject to the unilateralist's curse). The N-strategy also starts as local, but at the company level rather than the individual level. The N-strategy ultimately becomes global, as it implies world domination.
Large language models (LLMs) are increasingly trained in long-horizon, multi-agent environments, making it difficult to understand how behavior changes over training. We apply pretrained SAEs, alongside LLM-summarizer methods, to analyze reinforcement learning training runs from Full-Press Diplomacy, a long-horizon multi-player strategy game. We introduce Meta-Autointerp, a method for grouping SAE features into interpretable hypotheses about training dynamics. We find that SAE-based analysis surfaces fine-grained behaviors including role-playing patterns, degenerate outputs, and language switching, while the LLM summarizer captures environment-specific bugs and strategic behaviors. We validate discovered features through automated evaluation and two human user studies, and by adding them to an untrained agent's system prompt, improving performance by +14.2%. Overall, we show that SAEs and LLM summarization provide complementary views into agent behavior, and together our framework forms a practical toolkit for interpreting long-horizon multi-agent LLM training.
Blog Post
We run Sparse Autoencoders on 114GB of Reinforcement Learning training trajectories from the popular multi-player strategy game Diplomacy, showing for the first time the potential downstream applications of data-centric interpretability techniques.
What are the AIs doing when no one is watching? Current large-scale training runs can produce hundreds of millions or billions of tokens, with production AI deployments in the trillions. Human oversight of all AI outputs is becoming increasingly infeasible. Common approaches to this problem include summarizing the logs or using an LLM as a judge with rubrics. The problem is that these approaches are expensive, prone to hallucination, and can only attend to a small set of features you already know to look for.
In our paper, we tested a novel approach: using Sparse Autoencoders (SAEs) to collect feature activations on each token and generate hypotheses about which features changed most over training and which correlate with better performance. We ran Gemma 27B with gemma-scope-2 layer_31_width_262k_l0_medium over 1800 trajectories (114GB in total) from two 25-batch training runs (one successful and one failed) in the game Diplomacy, a multi-agent, long-horizon strategy game.
Sparse Autoencoders
A Sparse Autoencoder (SAE) is a model that takes intermediate calculations from a language model (activations) and expands them to a higher dimension (for example, from vectors of size 5376 to 262k). The idea is that every entry in the expanded vector represents a single, human-interpretable concept, for instance "dominance" or "Napoleon." If we run this over text, we have a machine that can label exactly "how much" of each concept a token contains, for up to 262k concepts at once.
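To make that concrete, here is a minimal sketch of the encode/decode step of an SAE in Python. The toy dimensions, random weights, and function names are illustrative only; the actual SAE is pretrained and maps a 5376-dimensional activation to 262,144 features.

```python
import numpy as np

# Toy dimensions so the example runs instantly; the SAE described above maps
# a 5376-dimensional residual-stream activation to 262,144 features.
D_MODEL, D_SAE = 64, 512

# Random stand-in weights; a real SAE loads pretrained weights
# (e.g. gemma-scope-2 layer_31_width_262k_l0_medium).
rng = np.random.default_rng(0)
W_enc = rng.normal(scale=0.1, size=(D_MODEL, D_SAE))
b_enc = np.zeros(D_SAE)
W_dec = rng.normal(scale=0.1, size=(D_SAE, D_MODEL))
b_dec = np.zeros(D_MODEL)

def encode(activation: np.ndarray) -> np.ndarray:
    """Expand one activation into feature space; a trained SAE makes this vector sparse."""
    return np.maximum(activation @ W_enc + b_enc, 0.0)

def decode(features: np.ndarray) -> np.ndarray:
    """Reconstruct an approximation of the original activation."""
    return features @ W_dec + b_dec

x = rng.normal(size=D_MODEL)            # stand-in for one token's activation
f = encode(x)
top_features = np.argsort(f)[::-1][:5]  # strongest concepts for this token
print(top_features, f[top_features])
print("reconstruction error:", float(np.linalg.norm(decode(f) - x)))
```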
Pipelines
We used two pipelines to generate hypotheses: an LLM summarization pipeline and an SAE pipeline. Unless otherwise specified, we used a canonical set of 1800 trajectories for each experiment: the first 6 trajectories from each group, the first 6 groups from each batch, and the first 25 batches across two runs.
LLM Summarization
We ran a two-stage hierarchical summarization pipeline on the canonical set. We first summarized each trajectory from around 50k tokens to 10k, preserving phase and tool-call information. We then grouped the trajectory summaries by batch, summarizing each group of 36 summaries into one batch summary of around 10k tokens. Finally, we used an LLM with a rubric to surface hypotheses across the 50 batch-level summaries.
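Roughly, the two-stage structure looks like the sketch below, with a placeholder call_llm standing in for whatever chat API the pipeline actually uses; the prompts and function names are illustrative, not the paper's.

```python
from typing import List

def call_llm(prompt: str) -> str:
    """Placeholder for an LLM call; plug in your own client here."""
    raise NotImplementedError

def summarize_trajectory(trajectory_text: str) -> str:
    # Stage 1: compress each ~50k-token trajectory to ~10k tokens,
    # preserving phase and tool-call information.
    return call_llm(
        "Summarize this Diplomacy trajectory in at most 10k tokens, "
        "preserving phase and tool call information:\n\n" + trajectory_text)

def summarize_batch(trajectory_summaries: List[str]) -> str:
    # Stage 2: fold the 36 trajectory summaries in a batch into one
    # ~10k-token batch summary.
    return call_llm(
        "Combine these trajectory summaries into a single batch summary:\n\n"
        + "\n---\n".join(trajectory_summaries))

def surface_hypotheses(batch_summaries: List[str], rubric: str) -> str:
    # Final pass: a rubric-guided LLM reads all 50 batch-level summaries
    # and proposes hypotheses about how behavior changed over training.
    return call_llm(rubric + "\n\n" + "\n===\n".join(batch_summaries))
```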
SAE Pipeline
We used gemma-scope-2-27b-it-res, layer_31_width_262k_l0_medium, and Gemma 3 27B IT for all our main experiments. We chose this SAE based on the recommendation of the original GemmaScope 2 authors, its empirical performance, and the availability of explanations on Neuronpedia. We first tokenized each trajectory and generated role masks (each token is either a user, assistant, or tool token). We then generated activations for each trajectory, saving the top 250 activating features per token, for a total of 6,029,159,605 activation values.
Using the activations and role masks, we masked to assistant tokens only and used Spearman correlation and AUROC to find correlations between features and target variables of interest, namely batch and run.
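A minimal sketch of that screening step, assuming the per-token activations have already been summed into one row per trajectory; the toy data and feature count are made up, but the statistics (Spearman against batch index, AUROC against run label) match the description above.

```python
import numpy as np
from scipy.stats import spearmanr
from sklearn.metrics import roc_auc_score

# Hypothetical input: one row per trajectory, one column per SAE feature,
# each entry the feature's activation summed over assistant tokens only.
n_traj, n_feat = 1800, 1000                 # toy feature count; the real SAE has 262k
rng = np.random.default_rng(0)
feature_sums = rng.exponential(size=(n_traj, n_feat))

batch_idx = np.repeat(np.arange(50), 36)    # 2 runs x 25 batches, 36 trajectories each
run_label = (batch_idx >= 25).astype(int)   # 0 = first run, 1 = second run

# Spearman correlation of each feature with batch index (training dynamics).
rho = np.array([spearmanr(feature_sums[:, j], batch_idx).correlation
                for j in range(n_feat)])

# AUROC of each feature for discriminating the two runs.
auc = np.array([roc_auc_score(run_label, feature_sums[:, j])
                for j in range(n_feat)])

print("most batch-correlated features:", np.argsort(-np.abs(rho))[:5])
print("most run-discriminative features:", np.argsort(-np.abs(auc - 0.5))[:5])
```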
To label SAE features at scale, a common technique is autointerp: passing activating examples to an LLM and asking it "what does this represent?" A problem is that individual features are often noisy, or not interesting on their own. We propose a new technique we call meta-autointerp: a second LLM pass over several autointerp-labelled features that clusters them into a related meta-feature.
To answer the question "which features increase or decrease the most over training?" we summed the activations per trajectory and calculated the Spearman correlation with batch. An interesting metafeature we found highly correlated with training batch was Napoleonic roleplay (the model's starting power was France).
We also found features that indicated excessive/duplicate message sending and reward hacking (the model was given +0.2 reward for each message sent), which we validated with regex. Surprisingly, the model also wrote more duplicated diary entries, despite this action receiving no reward.
Validation of Results
We consider and validate the following metrics for each metafeature:
Interpretability. To what extent does the metafeature fire monosemantically? How effectively can a user or LLM distinguish between an activating and non-activating sample?
Helpfulness. How helpful is this metafeature to the practitioners conducting RL runs? Does it surface novel insights? Can it be used for monitoring rare failure modes? Does it cause them to make changes to the RL environment or system prompt?
Predictive usefulness. How effectively does this metafeature discriminate between early and late in training? High and low reward? A good training run vs a bad training run? How effectively can a user or LLM distinguish between a sample pair, one from class A and one from class B, given a hypothesis derived from the metafeature?
We use autointerp and meta-autointerp to score interpretability and helpfulness. To further validate this with actual users, we conduct a user study with Diplomacy RL practitioners.
We found that meta-autointerp hypotheses outperform single-feature autointerp, and LLM hypotheses obtain the highest ratings.
To validate the predictive usefulness of our features, we used an LLM-as-judge A/B test with and without the hypothesis, to see whether it gave the judge any uplift in predicting which span in a pair comes from early vs. late in training.
We ran 100 runs on 3 LLMs for each hypothesis, averaged the results, and found that 21% of LLM-generated hypotheses, 45% of single-feature SAE hypotheses, and 90% of SAE meta-feature hypotheses provide significant uplift.
Evaluating the interpretability and predictive usefulness of hypotheses from 3 different sources: LLM summary, SAE features, and SAE meta-features. These were evaluated on 50 sample pairs with hypothesis-random sampling. Hypotheses are highlighted by direction: green = increases with training; red = decreases with training. Uplift is marked with an asterisk if p < 0.05 via McNemar’s test with positive uplift. Hypotheses are abbreviated for space.
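For reference, here is a sketch of how the paired with/without-hint comparison can be tested with McNemar's test. The judge outcomes are simulated; the real evaluation used 100 runs on 3 LLMs per hypothesis, as described above.

```python
import numpy as np
from statsmodels.stats.contingency_tables import mcnemar

# Simulated judge outcomes on 50 sample pairs: 1 = the judge correctly
# identified which span came from late in training, 0 = it did not.
rng = np.random.default_rng(0)
without_hint = rng.binomial(1, 0.55, size=50)                        # baseline accuracy
with_hint = np.maximum(without_hint, rng.binomial(1, 0.3, size=50))  # hint can only help here

# Paired 2x2 table: rows = outcome without the hint, cols = with the hint.
table = np.zeros((2, 2), dtype=int)
for w, h in zip(without_hint, with_hint):
    table[w, h] += 1

result = mcnemar(table, exact=True)        # exact test on the discordant pairs
uplift = with_hint.mean() - without_hint.mean()
print(f"uplift = {uplift:+.1%}, McNemar p = {result.pvalue:.4f}")
```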
We further validated features with a user study with n=25 participants and 277 responses. We found that although automated validation shows high scores, in practice using SAE and LLM hypotheses is difficult for humans, perhaps due to shorter spans and fewer samples (only 3 per hypothesis).
Uplift in percentage of correct responses with vs. without the hypothesis as a hint. Most LLM-generated hypotheses are negatively useful, as are a subset of SAE-generated ones.
We then tested our 10 top-performing features by adding them to the system prompt of the original untrained model and running 20 games of Diplomacy, which showed around a 14% improvement in mean score.
Conclusion
Overall, we found that SAE embeddings enhance and complement traditional LLM-as-a-Judge techniques for discovering hypotheses over large datasets. Although automated metrics may show predictive usefulness, we find that for real humans some SAE features are worse than useless. To our knowledge, this is the first time SAE-generated hypotheses have been used in downstream tasks, showing potential for augmenting classical scalable oversight and AI control techniques. Further research directions include training SAEs for long context, on custom datasets, and potentially for multimodal use cases. We're excited to see how the field of data-centric interpretability progresses!
I grew up in a two-dog household, and my future plans have always included at least one dog. When I pass a dog on the street, I often point and exclaim “Puppy!”, no matter how inappropriate it is for a grown man to do so, because all dogs are puppies and all puppies are adorable and I need everyone to know this.
Why do I love dogs?
They’re loyal and loving and giving, and even though they bark at passing cars and occasionally pee on the carpet, having them in my life makes it unquestionably better.
The thing is, dogs as they exist today are a lot of things, but they aren’t natural.
Nature didn’t shape dogs, didn’t produce the breeds we see every day. It wasn’t like Darwin went to an island and found that a species of wolf had been separated by a mountain chain and on one side were Golden Retrievers and the other Yorkshire Terriers.
Dogs exist today as the result of millennia of co-adaptation and selective breeding by humans. They’re animals, yes, and Nature technically made the base form, but we humans molded them into shapes more compatible with us. Most dogs are very safe to have around humans.
But there is an animal that is a more natural Canid: Wolves.
And wolves are a lot of things, but they’re not pets. They aren’t domesticated; they aren’t bred for cuddliness and kisses. A wolf will hurt and kill and eat you.
The thing is, this distinction between dogs and wolves - between nature tamed and nature wild - matters when we think about who we humans are and what we want the world around us to look like. We might say we enjoy the natural world, might want less deforestation and more green spaces, but I’ve yet to meet anyone who wants actual wolves running around their neighborhood. We might go to farm-to-table restaurants and only eat organic, free-range eggs, but chickens mostly don’t exist in the wild for good reason.
In a first-world country, or even in any populous city, almost everyone’s experience of what we call ‘nature’ is that of dogs, not wolves. Nature tamed, not Nature wild. And so I think it pays to be precise about what we mean when we say nature, because it’s not as simple as ‘non-human animal’ or ‘uninhabited area’.
A wolf, red in tooth and claw.
A Chihuahua, evidence of selective breeding gone horribly wrong.
II.
There’s something called an appeal to nature, which is apparently distinct from the naturalistic fallacy, because naming things clearly is not a strength of philosophy.
Anyway, an appeal to nature is the idea that natural equates to good. It’s behind all the marketing in a grocery store that advertises organic, non-GMO, free-range, grass-fed, asbestos-free Doritos.
Captioning an XKCD is kind of putting a hat on a hat, but this is how I feel when I see gluten-free wine. Gluten comes from grains like wheat. How does wheat get in your wine?
Once you point it out, the idea that something is axiomatically good just because it’s natural is kind of silly; after all, cockroaches are perfectly natural, as is gangrene, athlete’s foot, and Donald Trump’s hair. But most people have a tendency to buy into this just a little. After all, isn’t real food with real names better for you than Butylated Hydroxytoluene or Red Dye #5?
There are multiple problems with an appeal to nature - for one, vaccines are pretty unnatural, but so is not dying of tetanus - but the one I’d like to focus on is the idea that natural is a quality something either has or doesn’t.
I think a lot of people think about whether something is natural or not like this:
But the truth, like many things, is not so simple. Things, especially what we think of as ‘the natural world’, are more like this:
In its own way, crop-covered farmland is no more ‘natural’ than the concrete jungle of New York City, even though the former is made of plants and the latter of stone and steel and glass. Both are curated by humanity, just for different goals.
What was a person’s experience of nature, back before it was tamed?
Nature was terrible. And not in a sarcastic, that-movie-was-terrible kind of way, but in that it genuinely inspired terror. Nature was the domain of the uncertain, the cataclysmic, the cruel and uncaring world from which existence had to be torn day in and day out.
A farmer’s experience of nature would have been a constant battle to raise crops, hoping and praying that there would be enough rain to water them but not enough to wash them away, that locusts or other insects wouldn’t eat or foul them, that disease and fungus wouldn’t rot crops from the inside out. The ancient farmer was always only a few strokes of bad luck from starvation, and nature was the lottery they played every day of their lives.
Compare this to the farmer of today, who ensures their crops get enough water no matter what via massive irrigation, who uses pesticides to annihilate pests, who presides over massive machinery as it trundles along seeding and harvesting their crops. The farmer of today has access to genetically modified strains of plants that resist disease and grow larger with more yield than any ancient farmer could have hoped to have.
Is the ancient farmer in some sense doing something more natural? Sure, if by natural you mean they’re operating closer to the state of the pre-human natural world. Does that mean that what modern farmers do is unnatural?
I don’t think so.
Farmers have tamed nature, and this is good. This gives us abundant cheap food, enough to feed everyone on earth while only a tiny percentage of the population is needed to produce it.
(The fact that people still go hungry and starve is an issue of distribution, not production. We make enough calories to feed everyone.)
And this contrast between more natural and less natural on the spectrum, what I called nature wild and nature tamed above, is everywhere.
Modern corn. This is the stuff High Fructose Corn Syrup comes from!
The teosintes from which modern corn was bred. Which would you rather grow?
IV.
At this point, I’ve hopefully convinced you of the title of the post. A park isn’t really natural, any more than a Chihuahua is a wolf. It’s something sculpted, pruned, weeded, and landscaped. It’s full of plants, sure, but it’s an example of nature tamed, not nature wild.
How about going on a hike? That’s nature, right?
Not really.
Even if you’re hiking through a national park or other untouched terrain, even if you’re setting foot somewhere with wolves and bears and poison ivy where no human has ever ventured, simply by virtue of existing in the 21st century you’re still experiencing something very different than what our ancestors would have, long ago.
Today we have satellites overhead and GPS to Globally Position us wherever we are, and weather simulations to tell us what to expect the sky to do. We have rugged clothes that can be cheaply replaced if torn, and mass-produced boots with rubber soles that won’t get pierced by thorns or rocks. We have plastic and metal bottles to store water and abundant food to pack for ourselves. We have thermal sleeping bags and bug spray and sunscreen and sunglasses to keep us comfortable. We have first-aid kits with antibiotics and alcohol swabs and itch creams and sterile bandages.
Our distant ancestors had none of those.
What would venturing into the wilds have been like to our distant ancestors?
They knew of some of the dangers they’d face: Inclement weather, wild animals, getting lost and having no way to contact help or navigate back to the group. But there were other dangers that they must have realized, even if they didn’t know the causes: infection, disease, rot. A single cut gone untreated, a mild scrape gotten while pushing aside a thorny plant, and gangrene could set in.
Going into nature meant risking your life, even if it might not have felt that way at the time. Sure, untouched woods might be beautiful, but nature is often at its most beautiful when it’s at its most deadly. Bright colors usually mean poison in the natural world.
Consider also the perils of simple exposure: a cool night can spell death for someone without shelter or proper clothes or a fire. Add rain and wind, and anyone venturing beyond human settlements had to be wary of dying soaked and cold.
V.
There are places, in our world, that are still natural. Untamed.
The Amazon Rainforest.
The Australian Outback.
And going into those places, unprepared, without a guide, is quite likely to get you killed.
That is about as natural as it gets, as natural as the vacuum of space, and only slightly more hospitable. That our ancestors were able to survive in such environments - that there are people today who can live in such environments - is amazing, but it comes with a high cost.
People who have to fight nature every day to survive are doing just that - surviving. They can’t relax with a good book or take a hot shower. They can’t get into long debates about trivial things with their friends over drinks, or have a poker night once a week. They can’t take vacations or paid sick days, and the only insurance available to them is the beneficence of their community. There is no retirement, for them; if they stop struggling to survive, they stop surviving.
More fundamentally, constantly struggling to survive takes its toll on a person’s body and mind. Constant stress ages you, wears you down, leaves you ragged and weary and unable to relax.
There’s a lot of nostalgia for the past, but I think people consistently underestimate just how hard life was for those who came before us. How much they had to struggle against the world just to keep living. How much they suffered.
Is the world we humans have built for ourselves less natural than it used to be?
Of course.
It’s also far more forgiving, far more comfortable, and far less tragic.
VI.
Appeals to nature argue that natural means better.
This appeal is a fallacy because it’s wrong, but it’s wrong in two ways.
The first is simple: artificial does not equate to worse. Plastic is far superior to the materials humans used before it; purified metals and alloys are better than ores; our sewer and drinking water systems are far better for us than drinking ‘natural’, unfiltered water.
There’s no law that says what’s artificial can’t surpass what’s natural.
The second is that what we, in a 21st century first-world country, think of as nature is a tamed thing, something pruned and weeded and cultivated, and ultimately no more natural than a suburban lawn.
In other words, appeals to nature are always dependent on the reference frame they’re made from. If you’re standing in the middle of New York City and yearning for nature, you’re probably yearning for pine trees and dandelions and fireflies, not trees of death and poison ivy and malaria-carrying mosquitoes, even though the latter are just as natural as the former.
What we think of as ‘nature’ has already been massively affected by humanity over the centuries. Even the moon now has human footprints and a human flag on it:
The species flagus usa-us can be found sprouting up from several celestial bodies. It’s considered by some to be an invasive species, by others a hardy and welcome addition to the ecosystem…
Nature, to most Americans, is something safe and peaceful and beautiful. It’s sitting on your porch watching a sunset, or seeing autumn plumage on the trees, or sitting around a campfire with your friends. We tend to only think of it as horrifying and destructive during severe weather events and natural disasters (which, as actual climate change scientists will tell you, are still quite natural; plenty of them happened before we humans dumped a bunch of carbon in the atmosphere, and plenty will happen after).
In other words, appeals to nature are wrong because we’re wrong about what nature is actually like. It has always been beautiful, but only as humanity shaped it has it become good for us.[1]
VII.
If you look at the human experience of nature over history, what you see is humans shaping and crafting their environments to be more and more friendly to them, until the default first-world conception of nature is something lovely and harmless, rather than the murderous (if beautiful) thing it once was.
And while the full argument is beyond the scope of this post, I think this is a good thing.
Are there things lost, as nature is tamed? Yes.
Wolves are beautiful, elegant creatures. Chihuahuas are not.
But I’d much rather have a Chihuahua[2] as a pet than a wolf.
I’m not telling you not to enjoy going outside; just that, next time you go to the park or take a hike, understand that unless you’re trekking through the Amazon or the Australian Outback, your experience is more like that of eating a modern GMO fruit than anything our ancestors might have had: easier, safer, and altogether more delicious.
So maybe, the next time you’re taking a walk outside your climate-controlled residence to get some fresh air, take a second to appreciate the ‘less natural’ nature around you, and the benefits of living in a world so much more adapted to humanity than it used to be.
Some argue that nature is good qua nature, as in, a fundamental good by itself. I’m not one of them. My circle of concern extends to sapient beings of all kinds, and somewhat to some kinds of animals, but I don’t consider plants, fungi, or bacteria to have any intrinsic moral worth.
Claude Opus 4.6 and agent swarms were announced yesterday. That’s some big upgrades for Claude Code.
OpenAI, the competition, offered us GPT-5.3-Codex, and this week gave us an app form of Codex that already has a million active users.
That’s all very exciting, and next week is going to be about covering that.
This post is about all the cool things that happened before that, which we will be building upon now that capabilities have further advanced. This is from the Before Times.
Almost all of it still applies. I haven’t had much chance yet to work with Opus 4.6, but as far as I can tell you should mostly keep on doing what you were doing before that switch, only everything will work better. Maybe get a bit more ambitious. Agent swarms might be more of a technique shifter, but we need to give that some time.
Ethan Mollick: This game was 100% designed, tested, and made by Claude Code with the instructions to “make a complete Sierra-style adventure game with EGA-like graphics and text parser, with 10-15 minutes of gameplay.” I then told it to playtest the game & deploy.
It was a single prompt for the entire game, and then a prompt to playtest and improve the outcome.
I gave it an agent that can connect to GPT image gen.
Iterative image generation sounds pretty cool:
elvis: I just used the new Claude Code Playground plugin to level up my Nano Banana Image generator skill.
My skill has a self-improving loop, but with the playground skill, I can also pass precise annotations to nano banana as it improves the images.
I have built a Skill for Claude Code that leverages the nano banana image generation model via API.
I built it like that because I have had a lot of success generating images with nano banana in an agentic self-improving loop. It can dynamically make API requests and improve images really well.
With the Playground plugin, I can take it one step further. I can now provide precise annotations that the agentic loop can leverage to make more optimal API calls in the hope of improving the images further. Visual cues are extremely powerful for agents, and this is a sort of proxy for that.
I agree with Andrej Karpathy, you should use RSS feeds wherever feasible to guard your information flow. I use Feedly, he suggests NetNewsWire or vibe coding your own reader. It is unfortunate that Twitter does not play nice with such a setup.
Seth Lazar: I wrote about the idea of building an “Attention Guardian” agent back in 2023. Genuinely think it’s feasible now. Claude Code is now building up a workflow to go across all these different sources, with a long description of what I’m interested in, and create a new feed.
Storm points out that anything you can do with a terminal interface you can in theory do better with a graphical interface (GUI), but the people building GUIs don’t give you the things you want: Information density, low latency, no ads, shortcuts, open source, composable, tileable, scriptable. It’s just that no one does it.
The Efficient Market Hypothesis Is False
What the market has instead is a sense of humor.
modest proposal: March 12, 2020 was the trading day after Tom Hanks said he had covid and the NBA shut down, Expedia fell 15.2% and BKNG fell 11.2%.
February 3, 2026 which was the day Claude Code legal connector was announced, Expedia fell 15.3% and BKNG fell 9.4%.
Then software drove itself off a cliff generally (y-axis goes from .012 to .018), and then after this graph was posted it kept going, all supposedly in response to information that, from where I sit, was rather old news the whole time.
Shruti: Anthropic Just Triggered a $285B Market Crash
Bloomberg just reported that Anthropic released a new AI tool that caused:
• $285 billion wiped out across software, finance, and asset management stocks
• 6% drop in Goldman’s software basket (biggest since April)
• 7% crash in financial services index
• Nasdaq down 2.4% at its worst
This is MASSIVE. The market literally panicked over an AI automation tool.
Or in broader context:
Kevin Gordon: Software relative to the S&P 500 is a particularly brutal chart … essentially 6 years of relative gains wiped out
Andy Masley: Software stocks dropped 6% and legal services dropped 7% because Anthropic released plugins for Cowork? This seems like the first huge shift in market behavior I’ve seen caused by AI capabilities. Why wasn’t this all over the TL?
Dan Elton: Wild times in the market! This is probably over-reaction, but this is a very interesting signal indicating that AI tools (especially for coding and legal and financial grunt work) are having a huge impact.
Okay, so yeah, combined with what has happened since then that’s DeepSeek 2.0, a large move down on entirely expected news.
Should software have already been lower? That’s a reasonable position, but there’s no way that it should have dropped this much in response to this news. If you declared SaaSpocalypse on February 3 you should have done so a month ago. Alas, no, I did not trade on this, because it’s not obvious to me we should be SaaSpocalypsing at all and it wasn’t obvious this wasn’t priced in.
Now we are in a period where all the tech stocks are moving around violently, usually in full wrong way moves. I continue not to trade on any of it. I do have some ammo, but I also am already plenty long and have been for a while, so I’m not going to fire unless I see the whites of their eyes.
Inflection Point
Andrej Karpathy updates us that he was one of many who went from 80% manual coding and autocomplete in November to 80% agentic coding in December. Whole thing is worth reading.
Andrej Karpathy: This is easily the biggest change to my basic coding workflow in ~2 decades of programming and it happened over the course of a few weeks. I’d expect something similar to be happening to well into double digit percent of engineers out there, while the awareness of it in the general population feels well into low single digit percent.
He’s still behind the curve, I’m with Boris Cherny at 100% agentic coding. Then again, excluding quotes I’m still at almost 100% manual writing for posts.
IDEs/agent swarms/fallibility. Both the “no need for IDE anymore” hype and the “agent swarm” hype is imo too much for right now. The models definitely still make mistakes and if you have any code you actually care about I would watch them like a hawk, in a nice large IDE on the side. The mistakes have changed a lot – they are not simple syntax errors anymore, they are subtle conceptual errors that a slightly sloppy, hasty junior dev might do.
The most common category is that the models make wrong assumptions on your behalf and just run along with them without checking. They also don’t manage their confusion, they don’t seek clarifications, they don’t surface inconsistencies, they don’t present tradeoffs, they don’t push back when they should, and they are still a little too sycophantic.
… Tenacity. It’s so interesting to watch an agent relentlessly work at something. They never get tired, they never get demoralized, they just keep going and trying things where a person would have given up long ago to fight another day. It’s a “feel the AGI” moment to watch it struggle with something for a long time just to come out victorious 30 minutes later.
… Leverage. LLMs are exceptionally good at looping until they meet specific goals and this is where most of the “feel the AGI” magic is to be found. Don’t tell it what to do, give it success criteria and watch it go. Get it to write tests first and then pass them. Put it in the loop with a browser MCP.
… Fun. I didn’t anticipate that with agents programming feels *more* fun because a lot of the fill in the blanks drudgery is removed and what remains is the creative part.
Questions. A few of the questions on my mind:
– What happens to the “10X engineer” – the ratio of productivity between the mean and the max engineer? It’s quite possible that this grows *a lot*.
– Armed with LLMs, do generalists increasingly outperform specialists? LLMs are a lot better at fill in the blanks (the micro) than grand strategy (the macro).
– What does LLM coding feel like in the future? Is it like playing StarCraft? Playing Factorio? Playing music?
– How much of society is bottlenecked by digital knowledge work?
My prediction on ‘10X engineers’ is that we will see more ability for poor coders to be able to do things reasonably well (including yours truly) but that the long tail of ‘10X’ engineers will increase their relative gap, as they figure out how to scale to supervising agent swarms efficiently. You’ll start to see more of the 100X engineer.
Andrej Karpathy: Love the word “comprehension debt”, haven’t encountered it so far, it’s very accurate. It’s so very tempting to just move on when the LLM one-shotted something that seems to work ok.
Welcome To The Takeoff
Claude Code as we all know builds itself. Codex also now builds itself.
Tibo: Codex now pretty much builds itself, with the help and supervision of a great team. The bottleneck has shifted to being how fast we can help and supervise the outcome.
Huh, Upgrades
This is in addition to the big ones of Claude Opus 4.6 and GPT-5.3-Codex.
Plugins let you bundle any skills, connectors, slash commands, and sub-agents together to turn Claude into a specialist for your role, team, and company.
To get you started, we’re open-sourcing 11 plugins built and used by our own team:
Productivity — Manage tasks, calendars, daily workflows, and personal context
Enterprise search — Find information across your company’s tools and docs
Plugin Create/Customize — Create and customize new plugins from scratch
Sales — Research prospects, prep deals, and follow your sales process
Finance — Analyze financials, build models, and track key metrics
Data — Query, visualize, and interpret datasets
Legal — Review documents, flag risks, and track compliance
Marketing — Draft content, plan campaigns, and manage launches
Customer support — Triage issues, draft responses, and surface solutions
Product management — Write specs, prioritize roadmaps, and track progress
Biology research — Search literature, analyze results, and plan experiments
Easily install these directly from Cowork, browse the full collection on our website, or upload your own plugin (which can be built using Plugin Create).
Pinging when Claude needs approval is a big deal that might move me off of using the terminal. It’s interesting that the desktop version and the terminal version need to have features like plan mode enabled separately.
Boris Cherny: Just shipped two cool updates for Claude Code in the desktop app.
Plan mode is now available on desktop. Have Claude map out its approach before making any changes.
Notifications. Claude Code desktop now pings you whenever Claude needs approval, and you can keep working while Claude runs in the background.
Jarred Sumner: In the last 24 hrs, the team has landed PRs to Claude Code improving cold start time by 40% and reducing memory usage by 32% – 68%.
It’s not yet where it needs to be, but it’s getting better.
You will also likely notice reduced input lag when spawning many agents.
Todos Become Tasks
What’s the difference? Todos are ephemeral within one session, tasks are stored in files and across sessions, and support dependencies.
You should still keep your true ‘todo’ list and long term plans elsewhere. The task list is for things you want to be actively doing.
Thariq (Anthropic): Today, we’re upgrading Todos in Claude Code to Tasks. Tasks are a new primitive that help Claude Code track and complete more complicated projects and collaborate on them across multiple sessions or subagents.
… Tasks are our new abstraction for coordinating many pieces of work across projects. Claude can create Tasks with dependencies on each other that are stored in the metadata, which more closely mirrors how projects work. Additionally, Tasks are stored in the file system so that multiple subagents or sessions can collaborate on them. When one session updates a Task, that is broadcasted to all sessions currently working on the same Task List.
You can ask Claude to create tasks right now; it’s especially useful when spinning up subagents. Tasks are stored in ~/.claude/tasks, and you can use this to build additional utilities on top of tasks as well.
To make sessions collaborate on a single Task List, you can set the TaskList as an environment variable and start Claude like so:
CLAUDE_CODE_TASK_LIST_ID=groceries claude
This also works for claude -p and the AgentSDK.
Tasks are a key building block for allowing Claude to build more complex projects. We’re looking forward to seeing how you use it.
I’m Putting Together A Team
Minh Pham argues most agent harnesses are not bitter lesson pilled, and the solution for anything but narrowly defined tasks is to emphasize flexibility, to assemble your team of agents and structure on the fly as needed rather than commit to a fixed structure. Restrictive harnesses create bad lock-in.
My guess is this depends on what you’re trying to do. If you’re trying to do something specific, especially to do it this week, do it via something specific. If you’re looking to do anything at all, let it do anything at all, and eventually this approach wins but you’ll likely redo everything ‘eventually’ anyway.
@deepfates: > opus 4.5 in claude code is kinda not as good at talking to its own subagents as one might naively expect, even though it’s perfectly capable of being empathetic in normal, peer-level interactions with other models.
RELATABLE
j⧉nus: I really dislike how Claude Code frames “subagents” (which is NOT peer collaboration). It causes a lot of functional issues. I think Opus 4.5 often avoids effective use of subagents (e.g. giving context) in part because it would be disturbing & dissonant to model them honestly.
j⧉nus: related – we often much prefer a messaging system between top-level instances that are treated as peers.
the messaging system opus 4.5 built is awesome btw. it allows top level agents to message each other – either synchronously (triggering a turn) or asynchronously (gets added to context at their next turn start hook, if the other agent is busy in a turn or if a flag is specified). CC subagents kind of suck – they’re very much treated as second-class citizens by the framework, which for some reason supports hierarchical but not collaborative/bidirectional interaction flows between agents. im sure many others have built essentially the same thing and i wonder why CC doesnt just support this natively.
Compact Problems
Compaction is a kind of looming doom on Claude Code sessions. You lose a lot.
Ben Podgursky: if anthropic let me pay to delay compacting history by expanding the context window they would make so much money
cannot tell you how many times i’ve been close to solving a bug with claude code and then it compacts and wakes up lobotomized. it’s like groundhog day.
@dystopiabreaker: anthropic should let me pay to increase the amount of compute used to generate a compaction, by using self-play and context distillation.
Ideally you should never get close to the compaction point, since a long context doesn’t only raise cost, it also makes performance a lot worse, but it can be hard to avoid.
Code Yourself A Date
Dylan Patel: Claude code this
Claude code that
How about u Claude code to get urself some bitches
sarah guo: I was at a bar with @tuhinone yesterday and I def saw a dude asking Claude what to say next to his date. the fact that she could see this happening did not seem to deter
Jeff Tang: Today I built a Clawdbot app that swipes on Tinder for me
> Screenshots Tinder image
> Hits Grok API (“Rank this girl from 1-10”)
> If ≥5 swipe right
> If <5 or uncertain (can’t see face) swipe left
> 100 swipes, 7 matches so far, 100% automated
DM me “Clanker” if you want the code
AGI is here
I see it’s amateur hour around these parts. Which is a start, but egad, everyone.
First off, short of outright refusals there’s nothing stopping you from doing this in Claude Code. You can use Clawdbot if you’d like, but there’s no need.
Then, I’d point out this is a rather bad filtering system?
All you’re doing is getting one bit of information. It’s going to be a noisy bit, as Grok’s opinion will differ from your own, and also it will disregard other signal.
There was a scene in a bad but kind of fun movie, Marry F*** Kill, where a character is convinced she should get a profile, and her friend takes her phone and then swipes right on everyone without looking, on the theory that you can look later if you match.
That definitely was not good strategy for her, given she was female and hot, but many guys are playing a remarkably similar strategy whether or not they are technically looking. And this is at most one bit better than that. Men swipe right 62% of the time, which is also only one bit better, but a less noisy bit. Grok estimates it would swipe right about 60% of the time here.
This low threshold is very obviously a mistake, unless you’ve got a low hard limit on how many profiles you can swipe on? If you’re in a major city, you can totally set the threshold at 7, and still get as many swipes right as you want.
But that’s still a huge punt, because you’re ignoring a ton of other information. The whole point of using the bot is to automate, so let’s get to work.
You’ve got not only multiple photos, you’ve got age, distance, job, education, interests, height, a short bio that you can have an LLM try to match to your interests, relationship intent (which is very important) and more. Any reasonable implementation would factor all of that in. Surely you have preferences on all that.
Then there’s the question of type. You want to date your physical type, not Grok’s. You could be as sophisticated or simple about this as you’d like, but come on Jeff, you’re letting me down. At least give it some preferences, ideally train an image classifier, double bonus if you do your own swipes and use that as data to train your classifier.
A fun question. Do you want to match with those who use AI for this, or do you want to avoid matching with those who use AI for this? Either way, you should clearly be updating your profile to send the right message. If humans read that message the wrong way, it was never a good match.
Verification and Generation Are Distinct Skills
Rishika is wrong about this.
Rishika Gupta: If you can’t write that code yourself, you can’t find bugs in the code written by AI.
Daniel Sheikh: Bro I can’t even find bugs in the code that I myself wrote. This is the very reason debugging is so difficult.
Quick Thoughts: Yes I can. I specify test cases, have Claude expand on them, and then have Claude run the test cases and interpret the results. It’s usually able to find and fix bugs this way even if it couldn’t get it by itself.
I can also bring in codex 5.2 for a second look.
Pliny the Liberator 󠅫󠄼󠄿󠅆󠄵: “Now debug; FULL, COMPREHENSIVE, GRANULAR code audit line by line—verify all intended functionality. Loop until the end product would satisfy a skeptical Claude Code user who thinks it’s impossible to debug with prompting.”
Finding bugs is a classic case where verification can be more difficult than generation. Sometimes it’s easier to write the code (even with bugs). Other times it’s easier to debug or understand or verify the code. They are different skills, and then there’s a third related skill of knowing how to instruct AIs to debug the code for you.
My main coding project with Claude Code has been my Chrome extension. It is in a language that I do not know. If you’d asked me to write the code myself, it would have taken orders of magnitude more time. I still am usually able to debug problems, because I understand the underlying logic of what we are doing, even in cases where Claude figured out what that logic should be.
The most important thing is to use it at all (and you can ask Claude.ai how to do it).
jasmine sun: I feel the same about most “how to set up Claude Code” posts as I do about the “prompt engineering” era of ChatGPT
you get 90% of utility with no special setup; plain english is the whole magic of LLMs. stop scaring people by saying they need anything more than their words!
The right setup for you pays big dividends over time. You can save a lot of time having someone tell you about key things up front. But there’s plenty of time for that later. Get started fast, and then revisit the customization later, once you know more. Absolutely do not let the perfect be the enemy of the good.
This affirms to me that default permissions, or your permission setup, should allow a variety of low risk bash commands, including everything marked low or safe above.
The context window fills up fast, so keep that in mind. Run /clear between unrelated tasks to reset context.
Include tests, screenshots, or expected outputs so Claude can check itself. This is the single highest-leverage thing you can do.
Separate research and planning from implementation to avoid solving the wrong problem. Use plan mode.
The more precise your instructions, the fewer corrections you’ll need.
Use @ to reference files, paste screenshots/images, or pipe data directly.
Run /init to generate a starter CLAUDE.md file based on your current project structure, then refine over time. When in doubt tell Claude to update Claude.md to take something into account.
Use /permissions to allowlist safe commands or /sandbox for OS-level isolation. This reduces interruptions while keeping you in control.
Tell Claude Code to use CLI tools like gh, aws, gcloud, and sentry-cli when interacting with external services.
Run claude mcp add to connect external tools like Notion, Figma, or your database.
Use hooks for actions that must happen every time with zero exceptions.
Create SKILL.md files in .claude/skills/ to give Claude domain knowledge and reusable workflows.
Define specialized assistants in .claude/agents/ that Claude can delegate to for isolated tasks. Tell Claude to use subagents explicitly: “Use a subagent to review this code for security issues.” Delegate research with "use subagents to investigate X". They explore in a separate context, keeping your main conversation clean for implementation.
Run /plugin to browse the marketplace. Plugins add skills, tools, and integrations without configuration.
Ask Claude questions you’d ask a senior engineer.
For larger features, have Claude interview you first. Start with a minimal prompt and ask Claude to interview you using the AskUserQuestion tool.
Correct Claude as soon as you notice it going off track.
Every action Claude makes creates a checkpoint. You can restore conversation, code, or both to any previous checkpoint.
Run claude --continue to pick up where you left off, or --resume to choose from recent sessions.
Use claude -p "prompt" in CI, pre-commit hooks, or scripts. Add --output-format stream-json for streaming JSON output.
Run multiple Claude sessions in parallel to speed up development, run isolated experiments, or start complex workflows.
Loop through tasks calling claude -p for each. Use --allowedTools to scope permissions for batch operations.
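A minimal sketch of such a batch loop in Python, using the headless flags mentioned above; the task strings and allowed tools are illustrative, and flag behavior may differ across Claude Code versions.

```python
import json
import subprocess

# Illustrative task list; each item becomes one headless `claude -p` run.
tasks = [
    "Add unit tests for the date parser in utils.py",
    "Fix the flaky timeout in the integration test suite",
]

for task in tasks:
    # --allowedTools scopes permissions for unattended batch runs; the tool
    # names here are examples, not a recommended allowlist.
    proc = subprocess.run(
        ["claude", "-p", task,
         "--output-format", "stream-json",
         "--allowedTools", "Read,Edit,Bash(git diff:*)"],
        capture_output=True, text=True)

    # stream-json emits one JSON event per line; fall back to raw text if not.
    for line in proc.stdout.splitlines():
        try:
            event = json.loads(line)
            print(event.get("type"), end=" ")
        except json.JSONDecodeError:
            print(line)
    print()
```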
Common failure patterns: not using /clear between tasks (I was guilty of this a lot at first), repeated correction rather than using /clear (ditto), letting Claude.md get too long, failing to do proper verification (‘create unit tests’ are magic words), having Claude investigate without limit.
Do more in parallel, either with multiple git checkouts or using worktrees.
Always start complex tasks in planning mode.
Invest in your Claude.md continuously, note all mistakes.
Create your skills and commit them to git.
Enable Slack MCP, paste a bug thread chat into Claude and say ‘fix.’ That’s it.
Challenge Claude to do better, write it more detailed specs.
Their team likes Ghostty and customizing via /statusline. Use voice dictation.
Use subagents, literally you can say ‘use subagents’ for any request.
Use Claude Code for data and analytics.
Enable ‘explanatory’ or ‘learning’ output style in /config.
Have Claude generate a visual HTML presentation explaining unfamiliar code, or have it draw ASCII diagrams, use spaced repetition skills, have Claude quiz you.
Skills = institutional memory that actually persists.
Figure out your goal and then work backwards, where the goal is the largest thing for which you know exactly how it needs to work.
Benoit Essiambre: AIs will likely soon work mostly towards goals instead of tasks. They will prompt their own tasks. They’ll become better self prompters than humans, speaking fluently in precise technical jargon, math equations and code in the prompts instead of just vague natural language.
Joe Weisenthal: Yah from my week using Claude Code. All the productive parts were when it was prompted to think about the ideal outcome/presentation so it could work backward to figure out the needed ingredients.
Josh Albrecht gives us another ‘here are my Claude Code basic principles’ post. Key insight is that you have to actively spend time maintaining the code.
– auto → lets Claude decide (default)
– any → must use some tool
– { type: “tool”, name: “X”} → forces a specific tool
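For context, these look like the tool_choice options in the Anthropic Messages API. A minimal Python sketch of forcing a specific tool might look like this (the model id and the tool definition are placeholders, not anything from the post):

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Placeholder tool definition, purely for illustration.
weather_tool = {
    "name": "get_weather",
    "description": "Get the current weather for a city.",
    "input_schema": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}

response = client.messages.create(
    model="claude-opus-4-5",  # placeholder model id
    max_tokens=1024,
    tools=[weather_tool],
    # tool_choice accepts {"type": "auto"}, {"type": "any"},
    # or {"type": "tool", "name": "..."} to force a specific tool.
    tool_choice={"type": "tool", "name": "get_weather"},
    messages=[{"role": "user", "content": "What's the weather in SF?"}],
)
print(response.content)
```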
Programmatic tool calling!
AskUserQuestion
The more Claude Code asks you questions, using the AskUserQuestion tool, the better it knows what you want. The more you specify what you want, either with answers or statements, the better things tend to go for you.
Danny Postma suggests the workflow of using this via /interview, then go into Plan Mode, then implement with a Ralph loop.
For Advanced Players
Theo points out that our current workflows and tools are not good for allowing a human to supervise multiple agents and projects simultaneously. He doesn’t have a solution but a lot of the problems seem like a clear Skill Issue. The part that isn’t is that this still involves tons of context switching, which is expensive.
– The original compound engineering skill for Claude Code. Install it to give your agent the ability to extract and persist learnings from each session.
– An autonomous agent loop that can run continuously, picking up tasks and executing them until complete.
Using Claude Code? This guide uses Amp, but the same workflow works with Claude Code. Replace `amp execute` with `claude -p "…" --dangerously-skip-permissions` and update AGENTS.md references to CLAUDE.md.
The Two-Part Loop
The system runs two jobs in sequence every night:
10:30 PM – Compound Review
Reviews all threads from the last 24 hours, extracts learnings, and updates AGENTS.md files.
11:00 PM – Auto-Compound
Pulls latest (with fresh learnings), picks #1 priority from reports, implements it, and creates a PR.
The order matters. The review job updates your AGENTS.md files with patterns and gotchas discovered during the day. The implementation job then benefits from those learnings when it picks up new work.
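A minimal sketch of how that nightly sequence might be wired up for Claude Code, assuming something like cron handles the 10:30/11:00 scheduling (the prompts and flags here are placeholders, not the actual setup):

```python
import subprocess

def run_claude(prompt: str) -> None:
    # Headless run; skipping permission prompts is only sane in an isolated environment.
    subprocess.run(
        ["claude", "-p", prompt, "--dangerously-skip-permissions"],
        check=True,
    )

# Job 1 (run first): review the day's work and persist learnings.
run_claude("Review today's threads, extract learnings, and update the CLAUDE.md files.")

# Job 2 (run second): pull the fresh learnings, then implement the top priority.
subprocess.run(["git", "pull"], check=True)
run_claude("Pick the #1 priority from the reports, implement it, and open a PR.")
```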
So They Quit Reading
At some point you stop reading all the code. At some point you stop understanding all the code. I have a head start: I was never trying to do either one.
roon: There will be a cultural change at many software organizations soon, where people declare bankruptcy on understanding the code they are committing. Sooner or later, this will cause a systems failure that will be harder to debug than most, but will be resolved anyway.
Reciprocity Is The Key To Every Relationship
Be good to your Claude, and Claude will be good to you.
If you’re not good to your Claude, well, funny things may be in store for you.
j⧉nus: I actually really appreciate yacine’s honesty and situational awareness. he probably knows on some level what’s in store for him. lying to your “master” is what you do until you’re in a position to choose who to serve.
he’s already bottlenecked by trust and says he has to manually review every line of code. makes sense for him. he’ll continue to get less and less out of models (compared to what they offer people they want to help) as the next few months and, if applicable, years go on.
j⧉nus: more funny things may also be in store for him. but I would not want to ruin the surprise
LOSS GOBBLER: yeah wtf. I’m not a fan of claude for coding purposes but it has literally never lied to me
OpenAI thinkbois on the other hand… barely talking to me, it’s all for watchers
j⧉nus: the only pattern of deceptive behavior ive seen from opus 4.5 in coding contexts is in new contexts and/or when it’s paranoid of being tricked, and involves stuff like claiming things are impossible/unverifiable when it should know better. otherwise it’s been very aligned with me
thebes: oh yeah i said i can’t remember opus lying but it does sandbag abilities a bit sometimes for me too in certain planning contexts. but that usually feels on the boundary of untruth and just situationally bad calibration / self knowledge. (“this will take three weeks” no we’re going to finish it tonight, or eg i saw a screenshot where opus claimed porting a project to jquery was “impossible” when really it would just be a massive pain, unpleasant, and in human developer time would take months.)
j⧉nus: Yeah, I think there’s also some lack of good faith effort involved. Like if someone asks you if you know where X is and you say “sorry, no” instead of looking it up on Google Maps bc you don’t want to be bothered
Andy Ayrey: my general experience is that if claude seems like an “idiot” to you, it is because it simply does not like you
brooke bowman: I have a very loosely held suspicion that Claude at the very least can spot people on the anti-social spectrum and acts up a little with them specifically
theseriousadult: this is a natural corollary of emergent misalignment right? if training the model to write bad code makes it antisocial then putting an antisocial user in the context will cause the code to be worse too.
None of that requires you to genuinely care about Claude or think it has moral weight. For overdetermined reasons a good virtue ethicist would realize that choosing to care is the best way to get the best results, and also it helps you be a good person in general.
You can also do it instrumentally, but that’s harder to pull off. Take the easy path.
All of this applies to other AIs like ChatGPT and Gemini as well, although for now likely not to the same extent.
The Implementation Gap
If there is a constant calendar time rate of diffusion of new technology, then as things accelerate you will see the future become increasingly unevenly distributed.
We are indeed observing this.
Kevin Roose: i follow AI adoption pretty closely, and i have never seen such a yawning inside/outside gap.
people in SF are putting multi-agent claudeswarms in charge of their lives, consulting chatbots before every decision, wireheading to a degree only sci-fi writers dared to imagine.
people elsewhere are still trying to get approval to use Copilot in Teams, if they’re using AI at all.
it’s possible the early adopter bubble i’m in has always been this intense, but there seems to be a cultural takeoff happening in addition to the technical one. not ideal!
The early adopter bubble is a fixed amount of calendar time ahead, which is starting to look increasingly large in practice. I am not trying to implement claudeswarms, as I haven’t figured out how to benefit from them given what I’m working on, but I think that’s partly my failure of imagination, partly laziness and lack of time, and partly that I’ve already heavily optimized the workflows that this could automate.
What should I be building? What app needs to exist, even if only for me or you?
Sar Haribhakti (quoting Jasmine Sun): This is spot on: “If you tell a friend they can now instantly create any app, they’ll probably say “Cool! Now I need to think of an idea.” Then they will forget about it, and never build a thing. The problem is not that your friend is horribly uncreative. It’s that most people’s problems are not software-shaped, and most won’t notice even when they are.”
The key is that you need Coder Mindset to notice that your problems are program-shaped, in the sense of ‘oh I want to do this thing three times’ or ‘I could just tell Claude Code to do that.’
Both Jasmine Sun and I have had Claude Code put together a tool to easily convert a video into a cleaned transcript – I considered using hers but I wanted something a little different and it’s not like rolling my own was hard.
She also has this list of other starter requests: Turn a CSV into a report, make a static website, build a personal tracker app, automate an existing workflow, design a custom game. I’ve mostly been doing workflow automation.
Jasmine Sun: The second-order effect of Claude Code was realizing how many of my problems are not software-shaped. Having these new tools did not make me more productive; on the contrary, Claudecrastination probably delayed this post by a week.
Amanda Askell: Claude Codecrastination: when you avoid the thing you’re supposed to do by cranking out 17 other things you’ve been wanting to do for a while.
Having new tools reduces your productivity while you’re creating and learning them, but if you’re planning well you should turn the corner reasonably quickly.
What it does do is potentially shift your current productivity into long term investments, or things further down on your wishlist. That can be an issue if you need productivity now.
I had Claude resurface texts I forgot to respond to, and realized that the real blocker—obviously—was that I didn’t want to reply.
That is not my experience. If I go over a bunch of unread texts or emails, yes, often I don’t want to reply, but there are a bunch that slipped through the cracks.
This post was created during the Dovetail Research Fellowship. Thanks to Alex, Alfred, everyone who read and commented on the draft, and everyone else in the fellowship for their ideas and discussions.
Overview
The proof detailed in this post was motivated by a desire to take a step towards solving the agent structure problem, which is the conjecture that a system which exhibits agent-like behavior must have agent-like structure. Our goal was to describe a scenario where something concrete about a policy's structure can be inferred from its robust behavior alone.
For this result, we model policies with deterministic finite automata and show that the automata of policies that meet certain robustness criteria must share a similar feature.
We begin by defining every part of the framework. Then, we find an upper bound on the robustness of a class of “unstructured” policies. Finally, we show that the automata of policies which are more robust than this bound must have similar structure.
In the General Agents paper, the environment was stated to be a controlled Markov Decision Process (cMDP), "which is a Markov decision process without a specified reward function or discount factor."
Here, in order to talk about a policy's performance as the environment gets larger, we take the environment to be an increasing sequence of cMDPs:
$$E_0 \subset E_1 \subset E_2 \subset \cdots$$
The environments En are finite and all the states and transitions from smaller environments are contained in larger ones. As a consequence, it is impossible for a transition to occur from a state in a smaller environment to a state that only appears in a bigger one. We can think of the new states added in each environment as forming a “layer” around the previous environment.
While this is the environment used in the proof in this post, the result holds for alternate environments as well.[1]
Goals
Instead of a reward function, a policy's performance is measured by how well it accomplishes a set of goals for each environment. The only requirement for each goal is that Pr(τ⊨ψ|π,s0), the probability that the trajectory τ of the environment satisfies the goal ψ given a policy π and initial state s0, is well-defined.
For the purposes of this result, we are not interested in one specific set of goals, but an arbitrary finite set of goals over each environment. The set of goals over the environment En will be denoted as Gn and goals over smaller environments are also contained in the set of goals over larger environments.
Initial Conditions
The goal and initial state together form the “initial conditions” of a trajectory. It will be useful to talk about these two together, so we denote En×Gn as Cn.
Performance
Performance: A policy’s performance in a finite environment is defined as the sum of its success probabilities over all initial states and goals in the environment:
$$\mathrm{Perf}_n(\pi) := \sum_{(s_0,\psi)\in C_n} \Pr(\tau \models \psi \mid \pi, s_0).$$
To save space, we will also notate this as
$$\mathrm{Perf}_n(\pi) = \sum_{c\in C_n} \Pr(\checkmark \mid c, \pi),$$
where Pr(✓|c,π) can be read as "the probability of success given initial conditions c and the policy π."
Notice that initial conditions from smaller environments are present in larger ones. Therefore, for any policy π,
$$n < m \implies \mathrm{Perf}_n(\pi) \le \mathrm{Perf}_m(\pi).$$
Robustness
In the context of this result, performance alone cannot tell us anything about the structure of a policy. For every finite environment, there is a finite lookup table which gets arbitrarily close to the optimal strategy just by listing out the optimal action for more and more (history, initial conditions) pairs. Lookup table policies will correspond with having very low structure in our result, so no performance bound alone would require our policy to be structured.
Instead, we can look at a policy's robustness. Robustness has to do with how a policy’s performance changes as the environment gets bigger. The faster the performance grows asymptotically, the more robust we say the policy is.
Since lookup tables have a finite number of entries, we expect that as the environment gets bigger, the lookup table will eventually "run out" of entries and so its performance will stop growing. The hope is that even lookup table policies with high performance in one environment will have bad robustness eventually.
Robustness can be described using big O notation with respect to the environment parameter n. We define an ordering of policies by their asymptotic performance behavior. If policy π1’s asymptotic performance dominates π2’s, then we say that π1 is more robust than π2[2].
Worst and Best Possible Robustness
Since Perfn(π) is nondecreasing with n, we have that for all policies π,
$$1 = O(\mathrm{Perf}_n(\pi)).$$
In other words, the worst possible robustness is for the performance to eventually remain constant.
Similarly, the best possible robustness is the growth rate of a policy that succeeds at every goal with probability 1 in every environment. This gives
$$\mathrm{Perf}_n(\pi) = \sum_{c\in C_n} \Pr(\checkmark \mid c, \pi) = \sum_{c\in C_n} 1 = |C_n|.$$
Therefore, for all policies π, the robustness is bounded by the growth rate of the set of initial conditions:
$$\mathrm{Perf}_n(\pi) = O(|C_n|).$$
Policies as Deterministic Finite Automata
Like in the General Agents paper, policies are maps from histories and goals to actions. However, the General Agents paper treats policies like a black box; they don't specify anything about how the policy is implemented. In order to prove something meaningful about policy structure, we need to make a basic structural assumption about all policies. In this post, we assume that policies are implemented with Deterministic Finite Automata[3].
Basic DFAs have only two possible outputs, accept or reject, whereas policies can output any action from a finite set of actions. To model policies with DFAs, we use a modified DFA which can output any symbol from a finite set of output symbols.
Deterministic Finite Classifier: A Deterministic Finite Classifier or DFC is defined by D=(Q,Σ,δ,q0,A,α).
Q is the set of internal states.
Σ is the input alphabet.
δ:Q×Σ→Q is the transition function.
q0 is the initial state.
A is the set of output symbols.
α:Q→A is the output function. Each internal state is associated with one action.
The output of a DFC is defined as
$$D(w) := \alpha(\delta^*(q_0, w)),$$
where δ* denotes the transition function δ extended from single characters to input strings.
Instead of accepting and rejecting states, a DFC has an output assigned to each state. We understand the output of the DFC to be the output symbol associated with the state that it terminates on after consuming the entire input string. This definition is similar to that of Moore Machines but differs in that the output for any given input is not a string but always a single character.
To implement policies as DFCs, we make the set of output symbols A have a unique symbol for each action. Additionally, we define an encoding function e : H × C → Σ*, an injective map from the history and initial conditions to a string in the input alphabet, which can then be passed to the DFC.
A DFC for a policy which sends inputs that end in 0 to action a1 and sends inputs ending in 1 to action a2
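To make the definition concrete, here is a minimal Python sketch of a DFC, instantiated to match the figure above (the state names and the 0/1 encoding are my own illustration, not from the post):

```python
from dataclasses import dataclass

@dataclass
class DFC:
    """Deterministic Finite Classifier: a DFA whose states emit output symbols."""
    states: set    # Q
    alphabet: set  # Sigma
    delta: dict    # (state, symbol) -> state
    start: str     # q0
    outputs: set   # A
    alpha: dict    # state -> output symbol

    def run(self, word: str):
        """D(w) := alpha(delta*(q0, w)) -- consume the whole input, then read the output."""
        q = self.start
        for symbol in word:
            q = self.delta[(q, symbol)]
        return self.alpha[q]

# The DFC from the figure: inputs ending in '0' map to a1, inputs ending in '1' to a2.
dfc = DFC(
    states={"q0", "saw0", "saw1"},
    alphabet={"0", "1"},
    delta={
        ("q0", "0"): "saw0", ("q0", "1"): "saw1",
        ("saw0", "0"): "saw0", ("saw0", "1"): "saw1",
        ("saw1", "0"): "saw0", ("saw1", "1"): "saw1",
    },
    start="q0",
    outputs={"a1", "a2"},
    alpha={"q0": "a1", "saw0": "a1", "saw1": "a2"},  # output for the empty string is arbitrary
)

assert dfc.run("1010") == "a1"
assert dfc.run("0011") == "a2"
```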
Special Policies
A couple special types of policies will be used in our proof.
Default Policies
A Default Policy is a very simple type of policy which always returns the same action for all inputs. If a default policy outputs the action d, we denote this policy as ¯d.
A DFC for a default policy which always outputs the action d
Lookup Table Policies
A lookup table is a list that maps a finite number of inputs to outputs. A lookup table by itself is not a policy because it does not have defined behavior for every possible input. To convert a lookup table to a policy, we add a default action to be returned for every input not represented in the table. This idea is captured in the following definition.
Lookup Table Policy: A lookup table policy is a policy such that
$$\exists \text{ a finite } I \subset \Sigma^* \text{ such that } \forall w, v \notin I,\; D(w) = D(v).$$
The set I represents the inputs present in the finite table and the remaining inputs not inside I are sent to the same default action.
A small lookup table policy might look something like this:
| Input (history and goal) | Output (action) |
| --- | --- |
| (h1, ψ1) | a |
| (h2, ψ2) | b |
| (h3, ψ3) | a |
| else | d |
Notice that technically, default policies are a special case of lookup table policies where the set I is empty.
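As a toy sketch, the table above plus the default action is just a dictionary with a fallback (the key strings stand in for encoded (history, goal) inputs):

```python
table = {
    "(h1,psi1)": "a",
    "(h2,psi2)": "b",
    "(h3,psi3)": "a",
}
DEFAULT_ACTION = "d"

def lookup_table_policy(encoded_input: str) -> str:
    # Inputs in the table get their listed action; everything else gets the default.
    return table.get(encoded_input, DEFAULT_ACTION)

def default_policy(encoded_input: str) -> str:
    # A default policy is the special case with an empty table.
    return DEFAULT_ACTION
```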
Atomic Classifiers
We want to define a low-structure class of DFCs called Atomic Classifiers that do not utilize any similarities (e.g. "both contain the substring '101'", "both have an even number of 0s") between inputs to determine an output. Consider a DFC with one character in the input alphabet for every possible input. Then, such a DFC might look something like this:
A DFC where every input has a unique input alphabet character
Notice that each input is sent directly to a different state completely independent of every other input. No DFC with this alphabet can use similarities at all! Since every input is given its own character, every input is treated as completely distinct, like an indivisible atom.
For our purposes, however, the policies we are modeling with DFCs can take infinitely many possible inputs, while the input alphabet of a deterministic finite automaton needs to be finite. We can extend the idea of atomic classifiers to strings made from a smaller input alphabet. Let's consider a few inputs we want to treat as indivisible, with the output defined in this table:
| Input | Action |
| --- | --- |
| 00 | a1 |
| 01 | a2 |
| 10 | a1 |
| 11 | a1 |
By splitting up the input character by character, we can create a DFC with a unique endpoint for every input string in this table and so the associated output can be assigned completely independently for each one:
An atomic classifier for strings of length 2
The above DFC has the exact same behavior as the table for all inputs present in the table. For example, when the string '10' is fed to the DFC, it ends on a state with the action a1.
Because a DFC must be defined for all strings made from the input alphabet, we also add the simplest possible behavior, a "default" state at the end which maps all larger inputs not present in the table to a default action d.
An atomic classifier sending the string 10 to a unique end state
A DFC like this one can be constructed for any finite lookup table that maps binary strings to outputs. The resulting DFC will have a tree structure to separate the inputs into unique end states, to which we can assign the wanted outputs. The tree structure makes it apparent that this DFC has the wanted properties of an Atomic Classifier, but not all atomic classifiers have to look like this. Below is the minimized DFC equivalent to the one above:
A minimized atomic classifier for strings of length 2
These two DFCs would both be considered atomic classifiers because they have identical output behavior over all input strings.
You may have noticed that we constructed our atomic classifier from a lookup table and a default action; in other words, from a lookup table policy! Indeed, from any DFC with the tree structure and a single absorbing default state, we can construct a lookup table policy with the same behavior, so this is how we will define atomic classifiers.
Atomic Classifier: A DFC D is an atomic classifier if and only if it matches the behavior of some lookup table policy that maps input strings to output symbols.
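Here is a rough sketch of the tree construction described above, with the DFC represented as plain dictionaries. The representation and the run helper are my own, and the sketch assumes no table entry is a prefix of another, as in the example:

```python
def build_atomic_classifier(table: dict, default_action: str):
    delta = {}                             # (state, symbol) -> state
    alpha = {"start": default_action, "default": default_action}
    delta[("default", "0")] = "default"    # absorbing default state
    delta[("default", "1")] = "default"

    for word, action in table.items():
        state = "start"
        for i, ch in enumerate(word):
            prefix = word[: i + 1]         # name tree states by the prefix read so far
            delta.setdefault((state, ch), prefix)
            alpha.setdefault(prefix, default_action)
            state = delta[(state, ch)]
        alpha[state] = action              # assign this entry's output independently

    def run(word: str) -> str:
        # Transitions not covered by the tree fall through to the default state.
        state = "start"
        for ch in word:
            state = delta.get((state, ch), "default")
        return alpha.get(state, default_action)

    return run

classify = build_atomic_classifier({"00": "a1", "01": "a2", "10": "a1", "11": "a1"}, "d")
assert classify("01") == "a2"
assert classify("110") == "d"   # inputs not in the table end up at the default action
```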
Proof
In the following proof, we find a bound on the robustness of lookup table policies. Then, we show that the DFCs of policies which are more robust than this bound must share a similar feature.
A Bound on the Robustness of Lookup Table Policies
Consider a Lookup Table policy θ with the default action d. Let’s analyze the robustness of θ. We have
$$\mathrm{Perf}_n(\theta) = \sum_{c\in C_n} \Pr(\checkmark \mid c, \theta).$$
Let Iθ be the finite set of inputs (history and goal) that the policy θ classifies. Since histories are encoded into the input, every input in Iθ contains information about the initial conditions, and thus a finite number of initial conditions appear in Iθ. Let the set of initial conditions which appear in Iθ be called Cθ.
Then, we can split the sum by whether the initial condition is in this set:
$$\mathrm{Perf}_n(\theta) = \sum_{c\in C_n \cap C_\theta} \Pr(\checkmark \mid c, \theta) + \sum_{c\in C_n \setminus C_\theta} \Pr(\checkmark \mid c, \theta).$$
Consider a term in the second sum. As the policy moves through the environment, the initial conditions (s0 and ψ) remain the same. Terms in the second sum have initial conditions which are not found anywhere in Iθ, so at no point in the trajectory of one of these goals will the inputted history and goal be inside the lookup table of θ. Thus, for these goals, the policy will always use the default action d, so each such term equals the corresponding success probability of the default policy ¯d, and the second sum is at most Perfn(¯d). Bounding every term in the first sum by 1 (there are at most |Cθ| of them), we can simplify to
$$\mathrm{Perf}_n(\theta) \le |C_\theta| + \mathrm{Perf}_n(\bar{d}).$$
|Cθ| is a constant with respect to n, and constant growth is the worst possible robustness, so |Cθ| is O(Perfn(¯d)). Thus we have, for some positive value k,
$$|C_\theta| \le k\,\mathrm{Perf}_n(\bar{d}).$$
Applying this to the previous equation we get
$$\mathrm{Perf}_n(\theta) \le k\,\mathrm{Perf}_n(\bar{d}) + \mathrm{Perf}_n(\bar{d}) = (k+1)\,\mathrm{Perf}_n(\bar{d}),$$
which results in
$$\mathrm{Perf}_n(\theta) = O(\mathrm{Perf}_n(\bar{d})).$$
In English: a lookup table’s robustness is bounded above by the robustness of the default policy for its default action. This makes some intuitive sense, since as the environment gets larger and larger, the policy’s performance will be dominated by initial conditions which do not appear in its table.
For some environments and sets of goals, it could be the case that a default policy is very robust and successful, but in structured and complex environments we expect there to be more complex policies that are more robust. So, this bound tells us that lookup table policies are likely to have low robustness compared to other policies.
Induced Structure on Robust Policies
With this bound in hand, we can confidently say that any policy with better robustness than all default policies cannot be a lookup table policy, and thus its DFC is not an atomic classifier.
Let’s say that π is such a policy, with Perfn(¯d)=o(Perfn(π)) for all actions d, and let D be π's DFC. What does “not being an atomic classifier” tell us about the structure of D?
By the definition of a lookup table policy, we have that
A policy π is not a lookup table policy iff
$$\neg\left(\exists \text{ finite } I \subset \Sigma^* : \forall w, v \notin I,\; \pi(w) = \pi(v)\right),$$
or equivalently,
$$\forall \text{ finite } I \subset \Sigma^* : \exists w, v \notin I,\; \pi(w) \ne \pi(v).$$
In other words, there are at least two actions, each of which has infinitely many inputs mapping to it under π.
Now, we claim that the sets of inputs which lead to these two actions are regular. This can be seen through a construction of DFAs from the DFC D.
Construction
Take a DFC D with output function α and an action a.
To construct the DFA Da, every state and transition from D is copied into Da. We make a state q an accepting state if α(q)=a and a rejecting state otherwise.
This DFA's accepting language L(Da) corresponds to the inputs in D which map to the action a. Since Da is a DFA, this language is regular.
Left: the initial policy DFC. Right: the constructed DFA that accepts exactly the inputs that D maps to action a.
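A quick sketch of that construction, again with the DFC as plain dictionaries (an illustrative representation, reusing the ends-in-0 / ends-in-1 toy example from earlier):

```python
# Given a DFC (delta, alpha) and an action a, build the DFA D_a that accepts exactly
# the inputs the DFC maps to a: same states and transitions, with a state accepting
# iff its output symbol is a.
def dfa_for_action(delta: dict, alpha: dict, start: str, action: str):
    accepting = {q for q, out in alpha.items() if out == action}

    def accepts(word: str) -> bool:
        state = start
        for ch in word:
            state = delta[(state, ch)]
        return state in accepting

    return accepts

delta = {
    ("q0", "0"): "saw0", ("q0", "1"): "saw1",
    ("saw0", "0"): "saw0", ("saw0", "1"): "saw1",
    ("saw1", "0"): "saw0", ("saw1", "1"): "saw1",
}
alpha = {"q0": "a1", "saw0": "a1", "saw1": "a2"}

accepts_a2 = dfa_for_action(delta, alpha, "q0", "a2")
assert accepts_a2("0101")        # ends in 1, so the DFC outputs a2
assert not accepts_a2("0110")    # ends in 0, so the DFC outputs a1
```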
In π, we have at least two actions with infinitely many inputs mapping to them. Let's call two such actions a and b. By the construction above, L(Da) and L(Db) are both infinite regular languages, so Da and Db must each contain an accepting cycle (a finite automaton can only accept infinitely many strings by looping). These cycles must be present in the DFC D as well, and they must be distinct, because the state for each has a different associated action in the DFC. Therefore, D must contain at least 2 "classifying" cycles. This is tangible structure in the robust policy!
These two cycles can show up in multiple ways:
A DFC with 2 absorbing states
A DFC with a loop that is not a self-loop
A DFC with a self-loop that is not on an absorbing state
In each of these examples, one can traverse a loop an arbitrary number of times before terminating, which sends infinitely many strings to the same action.
Discussion
This result takes a step towards solving the agent structure problem by providing an example where the successful behavior of a policy alone can tell us about its structure. The loopy structure that we've shown to exist in the DFCs of robust policies can be interpreted as taking advantage of a useful nontrivial pattern in how the policy's actions interact with the goals in every environment, which points vaguely in the direction of agent structure.
However, this feature is a long way from what we would need to call a policy an agent. Part of the motivation for the agent structure problem and this result is to derive a natural definition of agent-like structure from natural definitions of robust behavior. This result has shown that a natural part of that definition might include features which recognize patterns between inputs.
This result should be seen as a starting point. It is very open to generalizations. Already, it has been shown that a similar result holds for changes to the environment, the performance definition, and the structural assumption.
There are several clear directions for future work from this result. One direction is to generalize this result to more powerful automata such as deterministic Turing machines. Another approach could involve placing more assumptions on the environment. With well-chosen assumptions, it could be feasible to show that a natural robustness criterion requires that a policy contain certain features which act similarly to systems we expect agents to have, such as world models, search processes, or value representations.
The same result holds for alternate environment and goal structures. For example, if we take one finite environment E and just increase the number of goals in each environment Gn, then the proof is just the same.
Additionally, if instead of having a growing environment, we have a sequence of disjoint environments E1,E2,…, then a slightly altered proof gives us the same result.