AI risk discussion often seems to assume that the AI we most want to prepare for will emerge in a “normal” world — one that hasn’t really been transformed by earlier AI systems.
I think betting on this assumption could be a big mistake. If it turns out to be wrong, most of our preparation for advanced AI could end up ~worthless, or at least far less effective than it could have been. We might find ourselves wishing that we’d laid the groundwork for leveraging enormous political will, prepared for government inadequacy, figured out how to run large-scale automated research projects, and so on. Moreover, if earlier systems do change the background situation, influencing how that plays out could be one of our best levers for dealing with AI challenges overall, as it could leave us in a much better situation for the “boss-battle” AI (an opportunity we’ll squander if we focus exclusively on the endgame).
So I really want us to invest in improving our views on which AI changes we should expect to see first. Progress here seems at least as important as improving our AI timelines (which get significantly more attention). And while it’s obviously not an easy question to answer with any degree of confidence — our predictions will probably be off in various ways — defaulting to expecting “present reality + AGI” seems (a lot) worse.
I recently published an article on this with @Owen Cotton-Barratt and @Oliver Sourbut. It expands on these arguments, and also includes a list of possible early AI impacts, a few illustrative trajectories with examples of how our priorities might change depending on which one we expect, some notes on whether betting on an “AGI-first” path might be reasonable, and so on. (We’ll also have follow-up material up soon.)
I’d be very interested in pushback on anything that seems wrong; the perspective here informs a lot of my thinking about AI/x-risk, and might explain some of my disagreements with other work in the space. And you’d have my eternal gratitude if you shared your thoughts on the question itself.
A lot of discussion assumes that we won't really move up the Y axis here until after some critical junctures.
I have been reading Eric Drexler’s writing on the future of AI for more than a decade at this point. I love it, but I also think it can be tricky or frustrating.
More than anyone else I know, Eric seems to tap into a deep vision for how the future of technology may work — and having once tuned into this, I find many other perspectives can feel hollow. (This reminds me of how, once I had enough of a feel for how economies work, I found a lot of science fiction felt hollow, if the world presented made too little sense in terms of what was implied for off-screen variables.)
One cornerstone of Eric’s perspective on AI, as I see it, is a deep rejection of anthropomorphism. People considering current AI systems mostly have no difficulty understanding them as technology rather than as people. But when discussion moves to superintelligence … well, as Eric puts it:
Our expectations rest on biological intuitions. Every intelligence we’ve known arose through evolution, where survival was a precondition for everything else—organisms that failed to compete and preserve themselves left no descendants. Self-preservation wasn’t optional—it was the precondition for everything else. We naturally expect intelligence bundled with intrinsic, foundational drives.
Anyhow, I think there's a lot to get from Eric’s writing — about the shape of automation at scale, the future of AI systems, and the strategic landscape. So I keep on recommending it to people. But I also feel like people keep on not quite knowing what to do with it, or how to integrate it with the rest of their thinking. So I wanted to provide my perspective on what it is and isn’t, and thoughts on how to productively spend time reading. If I can help more people to reinvent versions of Eric’s thinking for themselves, my hope is that they can build on those ideas, and draw out the implications for what the world needs to be doing.
If you’ve not yet had the pleasure of reading Eric’s stuff, his recent writing is available at AI Prospects. His most recent article explains how a lot of his thinking fits together, and may be good to give you a rough orientation (or see below for more of my notes) — but then I’d advise choosing some part that catches your interest, and diving into the linked material.
Difficulties with Drexler’s writing
Let’s start with the health warnings:
It’s abstract.
It’s dense.
It often implicitly challenges the concepts and frames we use to think about AI.
It shies away from some questions.
These properties aren’t necessarily bad. Abstraction permits density, and density means it’s high value-per-word. Ontological challenge is a lot of the payload. But they do mean that it can be hard work to read and really get value from.
Correspondingly, there are a couple of failure modes to watch for:
Perhaps you’ll find your eyes glazing over — you might stop reading, or might finish skimming an article and then realise you don’t really know what it was saying.
Perhaps you’ll think it’s saying [claim], which is dumb because [obvious reason][1].
How to read Drexler
Some mathematical texts are dense, and the right way to read them is slowly and carefully — making sure that you have taken the time to understand each sentence and each paragraph before moving on.
I do not recommend the same approach with Eric’s material. A good amount of his content can amount to challenging the ontologies of popular narratives. But ontologies have a lot of supporting structure, and if you read just a part of the challenge, it may not make sense in isolation. Better to start by reading a whole article (or more!), in order to understand the lay of the land.
Once you’ve (approximately) got the whole picture, I think it’s often worth circling back and pondering more deeply. Individual paragraphs or even sentences in many cases are quite idea-dense, and can reward close consideration. I’ve benefited from coming back to some of his articles multiple times over an extended period.
Other moves that seem to me to be promising for deepening your understanding:
Try to understand it more concretely. Consider relevant examples[2], and see how Eric’s ideas apply in those cases, and what you make of them overall.
Try to reconcile apparent tensions. If you feel like Eric is presenting something with some insight, but there’s another model you have which on the face of it has some conflicting insight, see if you can figure out the right way to unify the perspectives — perhaps by limiting the scope of applicability of one of the models.
What Drexler covers
In my view, Eric’s recent writing is mostly doing three things:
1) Mapping the technological trajectory
What will advanced AI look like in practice? Insights that I’ve got from Eric’s writing here include:
When it looks like there’s a hard bottleneck, the path to big impacts might just bypass it
Why recursive improvement of AI looks much more inevitable than “recursive self-improvement”
2) Pushing back on anthropomorphism
If you talk to Eric about AI risk, he can seem almost triggered when people discuss “the AI”, presupposing a single unitary agent. One important thread of his writing is trying to convey these intuitions — not that agentic systems are impossible, but that they need not be on the critical path to transformative impacts.
My impression is that Eric’s motivations for pushing on this topic include:
A view that if we have better concepts for thinking about the design space of AI systems, we’ll be able to make more informed plans
Rather than advocate directly for “here’s how we handle the big challenges of AI” (which admittedly seems hard!), Eric pursues an argument saying roughly that:
So rather than push towards good outcomes, Eric wants us to shape the landscape so that the powers-that-be will inevitably push towards good outcomes for us.
The missing topics
There are a lot of important questions that Eric doesn’t say much about. That means that you may need to supply your own models to interface with them; and also that there might be low-hanging fruit in addressing some of these and bringing aspects of Eric’s worldview to bear.
Even if there are lots of powerful non-agentic AI systems, what about the circumstances where people would want agents?
What should we make of the trend towards very big models so that only a few players can compete? How much should we expect economic concentration at various points in the future?
Which of the many different kinds of impact he’s discussing should we expect to happen first?
How might a hypercapable world of the type he points to go badly off the rails?
What are the branches in the path, and what kinds of action might have leverage over those branches?
What kind of technical or policy work would be especially valuable?
Translation and reinvention
I used to feel bullish on other people trying to write up Eric’s ideas for different audiences. Over time, I’ve soured on this — I think what’s needed isn’t so much translating simple insights as having people internalize those insights, and then share the fruits.
In practice, this blurs into reinvention. Just as mastering a mathematical proof means comprehending it to the point that you can easily rederive it (rather than just remembering the steps), I think mastering Eric’s ideas is likely to involve a degree of reinventing them for yourself and making them your own. At times, I’ve done this myself[5], and I would be excited for more people to attempt it.
In fact, this would be one of my top recommendations for people trying to add value in AI strategy work. The general playbook might look like:
Take one of Eric’s posts, and read over it carefully
Think through possible implications and/or tensions — potentially starting with one of the “missing topics” listed above, or places where it most seems to be conflicting with another model you have
Write up some notes on what you think
Seek critique from people and LLMs
Iterate through steps 2–4 until you’re happy with where it’s got to
Pieces I’d be especially excited to see explored
Here’s a short (very non-exhaustive) list of questions I have, that people might want to bear in mind if they read and think about Eric’s perspectives:
What kind of concrete actions would represent steps towards (or away from) a Paretotopian world?
What would the kind of “strategic transformation” that Eric discusses look like in practice? Can we outline realistic scenarios?
Given the perspectives in AI safety without trusting AI, in what conditions should we still be worried about misalignment? What would be the implications for appropriate policies of different actors?
If Eric is right about Large Knowledge Models and latent space, what will be the impacts on model transparency, compared to current chain-of-thought in natural language? What should we be doing now on account of that? (And also, to what extent is he right?)
When versions of this occur, I think it’s almost always that people are misreading what Eric is saying — perhaps rounding it off into some simpler claim that fits more neatly into their usual ontology. This isn’t to say that Eric is right about everything, just that I think dismissals usually miss the point. (Something similar to this dynamic has I think been repeatedly frustrating to Eric, and he wrote a whole article about it.) I am much more excited to hear critiques or dismissals of Drexler from people who appreciate that he is tracking some important dynamics that very few others are.
Perhaps with LLMs helping you to identify those concrete examples? I’ve not tried this with Eric’s writing in particular, but I have found LLMs often helpful for moving from the abstract to the concrete.
This isn’t a straight prediction of how he thinks AI systems will be built. Nor is it quite a prescription for how AI systems should be built. His writing is one stage upstream of that — he is trying to help readers to be alive to the option space of what could be built, in order that they can chart better courses.
He does touch on several of these at times. But they are not his central focus, and I think it’s often hard for readers to take away very much on these questions.
Articles on AI takeoff and nuclear war and especially Decomposing Agency were the result of a bunch of thinking after engaging with Eric’s perspectives. (Although I had the advantage of also talking to him; I think this helped but wasn’t strictly necessary.)
"The person is an identity that emerges through relationship.... If we isolate the 'I' from the 'thou' we lose not only its otherness but also its very being; it simply cannot be without the other." -- John Zizioulas, Communion and Otherness
It is the third week of Anna's freshman year of high school.
Today, like many days, she's filled with anxiety about her place in the world. The anxiety is a sharp, piercing feeling in her chest, like she is a sphere spinning on the point of a cone. Twenty years from now, her remembering self will look back on these feelings as part of the precious journey of discovering herself; but right now, her experiencing self mostly just feels like shit.
Lunch is ending soon. She's about to get up. But she notices a notebook accidentally fall from a fellow student's backpack, as the other student hurriedly walks by. Anna almost raises her voice to alert the other girl -- but the notebook has flopped open on the floor, and Anna notices her own name, ANNA H, in big block letters inside the other girl's notebook.
What?
Curiosity gets the better of her. She picks it up, and she starts reading. Just a little won't hurt, right?
I believe there is approximately an 80% chance that Anna H, Sarah R, and Ignatius C are planning to ruin my life at high school.
These are my notes on the matter, which will explain (1) why they are acting together, (2) why it is to their advantage to ruin me, and (3) my vulnerable points, for which I must determine a plan of defense.
Anna is astounded.
She cannot recall the name of the student whose notebook she is reading. But she cannot stop reading now, obviously.
These three are obviously going to act in a coordinated fashion.
I have seen them at lunch together four times already. I have determined that they share many interests: The Amazing Digital Circus, medieval history, and the show Dark are among them. Furthermore, they live within five blocks of each other, which will make it easy for them to coordinate. [...]
The material on why they will ally continues for some time.
It remains bizarre, but also somewhat grounded in reality rather than merely insane. Anna knew that Sarah was interested in The Amazing Digital Circus and Dark -- so apparently this other student has identified some things accurately, despite her paranoia. But Anna didn't know that Ignatius was also interested in these things. Maybe Ignatius actually does like them! That would be... nice.
Ignatius seemed -- ok? -- when Anna talked with him, but it wasn't like she actually had a crush on him or something. Of course. They've had some good conversations, though, and she wonders if what he sees in Dark is the same as what she sees in it. And if Ignatius lives close to Anna, maybe it would make sense to spend more time with him? That could be fun. Probably it's best to have Sarah over at the same time, just to keep things chill.
Anna keeps reading.
The reasons they would try to ruin my life are several. I don't believe that they are -- yet -- motivated by hatred for me, but it would be advantageous to Anna, Sarah, and Ignatius to ruin my life for several reasons:
Anna is one of the smarter students, and we will be competing to be valedictorian.
Both Anna and I like Ignatius, and he can only be with one of us.
Competition for popularity in school is a zero-sum game, and I am vulnerable for several reasons.
[...]
A half-dozen more reasons follow.
Would it... make sense... for Anna to try to hurt this other person? Would that be in her interests? The question is new to her.
She has barely even thought about being valedictorian. But -- well, it is true that only one person can be valedictorian. And that would be a nice thing to be; it would be cool to be able to tell people that... at college, probably? And some of the other reasons to "ruin" this other person also seem -- well, Anna thinks, not all of them are totally nuts.
Reading the document feels confusing, almost trance-like. Anna has never before read anything about herself with this much detail. She's read desultory teachers' notes; polite letters from grandparents; supposed personality types from online quizzes. But this other student is interested in her in a way no one else ever has been interested before.
She skips down a page, reads another section header:
I am vulnerable to several forms of attack: social, academic, and even physical.
(1) Forms of social attack could hinge on the following vulnerabilities. (To-do: categorize these more rigorously.)
My acne remains difficult to eliminate, and could be turned to mockery.
I have noticed I sometimes trip over my words in Mr. Abelson's class; I may have a slight crush on him, which could be used as a lever against me to great effect.
I have a large mole on my left shoulder, whose existence I have heretofore kept hidden.
Lists like these continue for pages. Anna crushes down the slight feeling of discomfort of seeing all of this stranger's thoughts, laid out before her like speared butterflies. She must know what is going on here.
There are detailed, carefully-thought-out lists of vulnerabilities that Anna could take advantage of -- if she wanted to. Section headers and sub-section headers, considering methods Anna could use, countermeasures to those methods, and counter-countermeasures Anna could use to combat them. Long hypothetical asides on ways Anna might humiliate this person, depending on the knowledge and advantages that Anna may or may not have, that the author did not know about but wanted to mention regardless.
The book contains long and carefully thought-out reasons for Anna, Sarah, and Ignatius to consider themselves allies; long and carefully thought-out reasons it would be to their advantage to injure her; and long and carefully thought-out means by which they could injure her.
Anna looks up from the notebook, feeling a little disoriented. And notices that the other student has returned to the cafeteria, and is scanning the floor, seeking something. Seeking the notebook. Anna quickly shoves it beneath her own books.
The other student keeps running her eyes over the floor carefully, but doesn't seem to have found what she was searching for. She looks up, and notices Anna looking at her. An expression crosses her face -- will she ask Anna a question?
They stare at each other.
But the moment passes, and the girl turns away.
For a moment Anna feels turmoil in her stomach, an uncertainty about what feelings she is actually experiencing. She swims in an internal sea, seeking the right word. But then what feels like enlightenment snaps into focus: Oh God, this girl really is a coward. How contemptible.
She is going to show Sarah and Ignatius the notebook. Of course she would -- they deserve to know, they're in it, aren't they? How could she do anything else?
It is interesting that as soon as I distract myself from the field of AI risks for just a week or two, my psyche starts playing the same record called “everything will be fine, don’t worry.”
This can perhaps be explained by the fact that I have only been studying AI relatively recently (since the beginning of last year). Probably my concern about existential risks has not yet had enough time to become routine for my brain.
But nevertheless, it is interesting that even after persistently studying the arguments of Yudkowsky, Bostrom, Bengio, and others for (with breaks) nine months, I am still able to simply forget about them.
Ordinary life, in which Gemini can still say that the President of the United States is Biden, pulls me in and shifts my focus of attention to things far less urgent than carrying out all possible actions to ensure the survival of the human species.
It is possible that a similar tendency of the psyche also leads people working at AI companies to neglect safety. They may be too involved in the race / competition / specific technical tasks, or simply in everyday life problems.
But I am lucky. Something inside me, something rational, simply remembers the numbers: what if we have only 10 years left? What if 5? Or maybe 3?
And these numbers wake me up. They oblige me to remember everything I have studied recently.
They oblige me to ask further questions, in order to confirm that the anxiety which from time to time recedes into the fog is real and objective.
As we develop new techniques for detecting deceptive alignment, ranging from action monitoring to Chain-of-Thought (CoT) or activations monitoring, we face a dilemma: once we detect scheming behaviour or intent, should we use that signal to "train the scheming out"?
On the one hand, leaving known misaligned behaviour / intent in the model provides little marginal information and is possibly unsafe. On the other hand, training against a monitor might not actually fix the model's underlying motivations; it might simply provide selection pressure that favours more sophisticated, less detectable forms of deception.
This post outlines a simple framework formalising when the generalisation benefit of training outweighs the selection risk of creating a schemer. I assume that the choice is between training and not training on the incriminating examples, e.g. we cannot apply the updates in a more targeted way, and we don’t have the option to just look at the evidence and use it to make changes to the training process.[1]
The selection intuition
When we train a model to satisfy a specific criterion, we can view the process as starting with a prior distribution over models we might get (determined by the architecture, initialisation, base model pre-training, etc) and then restricting that set to only the models that satisfy the training criterion.
In our case, suppose we have a scheming monitor $\mu$, and consider a simplified hypothesis space where the potential models we might end up with fall into three categories:
Aligned ($M_A$): The model behaves like we want it to and is not deceptive.
Weak schemer ($M_W$): The model is misaligned but schemes in ways that our monitoring detects (at least some of the time).
Strong schemer ($M_S$): The model is misaligned and schemes in ways that evade our monitoring.
Denote their respective prior probabilities $P(M_A)$, $P(M_W)$, and $P(M_S)$.
Now, suppose we apply training pressure $T_\mu$ that penalises any model caught behaving in a misaligned or deceptive way by the monitor $\mu$. Effectively, we are pruning $M_W$ from our posterior. If we assume the training procedure is a perfect filter for $M_W$ but does nothing else, the new posterior for a strong schemer would be
$$P(M_S \mid T_\mu) = \frac{P(M_S)}{P(M_A) + P(M_S)}.$$
Intuitively: we are assuming that the training does not favour either the aligned models or the strong schemers, and so the chance of getting one or the other is proportional to the prior.
Under this formalisation, training is a double-edged sword: it increases the probability of getting an aligned model, but it also increases the absolute probability of getting a strong schemer. From an x-risk perspective, this means we should approximately never train against scheming monitors.
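To make the selection effect concrete, here is a minimal numerical sketch in Python. The prior probabilities are made up purely for illustration, and it assumes (as above) that training perfectly filters out weak schemers and does nothing else:

```python
# Toy illustration of the selection effect, assuming training is a perfect
# filter for weak schemers (M_W) and does nothing else.
# The prior probabilities below are made up purely for illustration.

p_a, p_w, p_s = 0.5, 0.4, 0.1   # P(M_A), P(M_W), P(M_S)

# Training prunes M_W; the remaining mass is renormalised over {M_A, M_S}.
post_a = p_a / (p_a + p_s)
post_s = p_s / (p_a + p_s)

print(f"P(M_A | T) = {post_a:.3f}  (prior {p_a})")  # ~0.833: more likely aligned
print(f"P(M_S | T) = {post_s:.3f}  (prior {p_s})")  # ~0.167: but strong schemers more likely too
```

Both posteriors go up, which is exactly the double-edged sword described above.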
Modelling generalisation: the R coefficient
However, the above formalisation leaves out a key hope — what if the training generalises? What if training the model on a broad enough distribution makes it more likely we end up with an aligned model rather than a schemer?
Let's define $Q(T_\mu)$ as the ratio of aligned models to strong schemers after training against the monitor $\mu$:
$$Q(T_\mu) = \frac{P(M_A \mid T_\mu)}{P(M_S \mid T_\mu)}.$$
Previously we assumed that $Q$ stays the same before and after training. Now, let’s define the generalisation coefficient $R(T_\mu)$:
$$R(T_\mu) = \frac{Q(T_\mu)}{Q(\text{no training})}.$$
If $R(T_\mu) > 1$, the training pressure favours aligned models (e.g. maybe we are training against an activation monitor and activations are hard to game).
If $R(T_\mu) = 1$, there is no generalisation (the good-to-bad ratio remains the same as in the prior).
If $R(T_\mu) < 1$, the training pressure favours scheming (e.g. maybe the monitor is hackable and reward hacking incentivises scheming).
When should we train against a scheming monitor?
When the generalisation coefficient $R > 1$, the training favours aligned models compared to schemers, but only in relative terms. In absolute terms, we are still selecting for the union $\{M_A, M_S\}$ (aligned or strong scheming). For training to be worth it, the generalisation effect must be strong enough to overcome this selection pressure.
Specifically, it is easy to derive that the absolute probability of a strong schemer decreases or stays the same, $P(M_S \mid T_\mu) \le P(M_S)$, if and only if:
$$R \ge 1 + \frac{P(M_W)}{P(M_A)}.$$
Intuitively, the training prunes weak schemers from the hypothesis space, which frees up $P(M_W)$ in the probability distribution. To ensure we are actually improving safety, the generalisation coefficient $R$ must amplify the aligned models $M_A$ enough to take up all of the newly available probability mass. Otherwise some of the initialisations that would have ended up as weak schemers end up as strong schemers instead.
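As a sanity check on this condition, here is a small sketch (again with made-up numbers) that computes the post-training probability of a strong schemer for a given generalisation coefficient $R$ and compares it against the threshold $1 + P(M_W)/P(M_A)$:

```python
# Sketch of the decision rule, assuming training removes weak schemers and
# multiplies the aligned : strong-schemer ratio by a generalisation factor R.
# All probabilities are made up for illustration, not estimates.

def posterior_strong_schemer(p_a: float, p_s: float, r: float) -> float:
    """P(M_S | T_mu): weak schemers are pruned, and the remaining mass is
    split between aligned models and strong schemers in the ratio r*p_a : p_s."""
    return p_s / (r * p_a + p_s)

def training_threshold(p_a: float, p_w: float) -> float:
    """Minimum generalisation coefficient R at which training does not
    increase the absolute probability of a strong schemer."""
    return 1 + p_w / p_a

p_a, p_w, p_s = 0.5, 0.4, 0.1  # illustrative prior over {M_A, M_W, M_S}

for r in (1.0, 1.5, training_threshold(p_a, p_w), 3.0):
    post_s = posterior_strong_schemer(p_a, p_s, r)
    verdict = "does not increase risk" if post_s <= p_s else "increases P(M_S)"
    print(f"R = {r:.2f}: P(M_S | T) = {post_s:.3f} vs prior {p_s:.2f} -> training {verdict}")
```

With these numbers the threshold is $1 + 0.4/0.5 = 1.8$: below it, training raises the absolute chance of a strong schemer even though it also raises the chance of alignment.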
Example application: should we train away scheming thoughts?
To use the formula above, we need to make some assumptions and run some evals:
First, we need access to multiple model checkpoints trained roughly the same way. Suppose we can get such checkpoints, e.g. unsuccessful release candidates.
For each checkpoint, we run it on a batch of situations from deployment and use a scheming thought monitor to classify it as either a weak schemer or not a weak schemer. Suppose we estimate $P(M_W) \approx 0.6$.
We don’t know whether the remaining checkpoints are aligned or strong schemers. But let’s say that at the models’ capability levels, we are willing to assume that strong scheming is less than 10% likely, so $P(M_A) \ge 1 - 0.6 - 0.1 = 0.3$.
So overall we might think it’s fine to train against the scheming thought monitor if
$$R(T_\mu) \ge 1 + \frac{0.6}{0.3} = 3.$$
But we have little evidence[2] to think our training would be that robust, so we abstain from training.
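Expressed as a tiny calculation (using the same assumed numbers as above):

```python
# The worked example above: P(M_W) estimated from monitoring many checkpoints,
# P(M_A) lower-bounded by assuming strong scheming is <10% likely.
p_w = 0.6
p_a_lower = 0.3

required_r = 1 + p_w / p_a_lower
print(f"Training against the monitor only helps if R >= {required_r:.1f}")  # 3.0
```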
The more monitoring positives, the more cautious we should be about training
This follows directly from the formula, but is still counterintuitive. After all, if we have many examples of unwanted behaviour, isn’t the training going to be more effective at specifying what we mean by aligned / non-deceptive? Yes, but we are also indirectly specifying in more detail what failures are detectable. It also matters a ton where the scheming examples come from. If they all come from a single checkpoint, plausibly this all works out in favour of alignment. But if the scheming examples come from many different checkpoints, i.e. $P(M_W)$ is high, then chances are some of them would turn into strong schemers under training pressure, even taking into account sample diversity.
Considerations not captured by the formula
Even if we suspect $R \ge 1 + \frac{P(M_W)}{P(M_A)}$, we might still refrain from training against the monitor if the monitoring signal could be used for other, more risk-reducing interventions (the risk reduction would need to offset the increased risk of deploying a strong schemer, though):
Natural scheming demos: For global coordination purposes (or even for justifying mitigations within a lab), it may be helpful to have a constant supply of schemers ready for demonstrating the risk, even if these are schemers that we could straightforwardly train away.
Facilitating scheming propensity estimation: We cannot estimate models' propensity for strong scheming. But if we think that it’s correlated with propensity for weak scheming, then we may want to keep weak schemers around. (For example, we might think that future models' propensity for opaque scheming is correlated with current models' propensity for transparent scheming.)
(A non-reason to refrain from training:
Preventing harm via control instead: Refraining from training when $R \ge 1 + \frac{P(M_W)}{P(M_A)}$ and using the monitor for control instead does not reduce risk. This is because by assumption the chance of a strong schemer is lower if we do train, and the monitor would only catch weak schemers, which would counterfactually not even exist.)
Conversely, we might choose to train against a monitor even if we suspect $R < 1 + \frac{P(M_W)}{P(M_A)}$, for reasons such as:
Sufficient control: We are okay deploying a more deceptive / strategic schemer because we have confidence the deployment is controlled, and we just want the model to attack less. (For example, we may not mind that our untrusted monitors are scheming harder, as long as they can’t tell honeypots from real deployment and this forces them to not collude very often.)
The increased chance of alignment is worth it: For example, we need to imminently hand everything over to the AI. In such a position even weak schemers would constitute an x-risk so we may want to weed out as many as we can.
Conclusion
Training against a scheming monitor prunes weak schemers, by default making both aligned models and strong schemers more likely. The higher the frequency (across checkpoints) of detectable scheming, the larger the selection pressure towards both alignment and strong scheming. Unless the training sufficiently favours aligned models over strong schemers, it is likely making things worse.
Many thanks to David Lindner, Roland Zimmermann, Zac Kenton, Rif Saurous, and Seb Farquhar for helpful feedback. Any remaining mistakes are mine.
How might we estimate R? In principle, we could build a bunch of monitors we think are uncorrelated, train against some and eval against others. This could give us some idea, but we'd still have no guarantee that the generalisation we see in that setting is representative of generalisation to distributions we didn't measure.
Ultimately, it's fair to say that this framework reduces a vibe-check about "should we train against this monitor?" to a vibe-check about "how well will this generalise?" (which I claim is progress).
We’re back with all the Claude that’s fit to Code. I continue to have great fun with it and find useful upgrades, but the biggest reminder is that you need the art to have an end other than itself. Don’t spend too long improving your setup, or especially improving how you improve your setup, without actually working on useful things.
It is remarkable how everyone got the ‘Google is crushing everyone’ narrative going with Gemini 3, and then it took them a month to realize that actually Anthropic is crushing everyone with Claude Code and Claude Opus 4.5, at least among the cognoscenti, with growing momentum elsewhere. People are realizing you can know almost nothing and still use it to do essentially everything.
Are Claude Code and Codex having a ‘GPT moment’?
Wall St Engine: Morgan Stanley says Anthropic’s ClaudeCode + Cowork is dominating investor chatter and adding pressure on software.
They flag OpenRouter token growth “going vertical,” plus anecdotes that the Cowork launch pushed usage hard enough to crash Opus 4.5 and hit rate limits, framing it as another “GPT moment” and a net positive for AI capex.
They add that OpenAI sentiment is still shaky: some optimism around a new funding round and Blackwell-trained models in 2Q, but competitive worries are widening beyond $GOOGL to Anthropic, with Elon Musk saying the OpenAI for-profit conversion lawsuit heads to trial on April 27.
Claude Cowork will ask explicit permission before all deletions, add new folders in the directory picker without starting over and make smarter connector suggestions.
Few have properly updated for this sentence: ‘Claude Cowork was built in 1.5 weeks with Claude Code.’
Nabeel S. Qureshi: I don’t even see how you can be an AI ‘skeptic’ anymore when the *current* AI, right in front of us, is so good, e.g. see Claude Cowork being written by Claude Code in 1.5 weeks.
Anthropic is developing a new Customize section for Claude to centralize Skills, connectors and upcoming commands for Claude Code. My understanding is that custom commands already exist if you want to create them, but reducing levels of friction, including levels of friction in reducing levels of friction, is often highly valuable. A way to browse skills and interact with the files easily, or see and manage your connectors, or an easy interface for defining new commands, seems great.
Obsidian
I highly recommend using Obsidian or another similar tool together with Claude Code. This gives you a visual representation of all the markdown files, and lets you easily navigate and search and edit them, and add more and so on. I think it’s well worth keeping it all human readable, where that human is you.
Heinrich calls it ‘vibe note taking’ whether or not you use Obsidian. I think the notes are a place where you want to be less vibing and more intentional, systematically optimizing them both for Claude Code and for your own use.
Siqi Chen offers us /claude-continuous-learning. Claude’s evaluation is that this could be good if you’re working in codebases where you need to continuously learn things, but the overhead and risk of clutter are real.
The big change with Claude Code version 2.1.7 was enabling MCP tool search auto mode by default, which triggers when MCP tools take up more than 10% of the context window. You can disable this by adding ‘MCPSearch’ to ‘disallowedTools’ in settings. This seems big for people using a lot of MCPs at once, which could eat a lot of context.
Thariq (Anthropic): Today we’re rolling out MCP Tool Search for Claude Code.
As MCP has grown to become a more popular protocol and agents have become more capable, we’ve found that MCP servers may have up to 50+ tools and take up a large amount of context.
Tool Search allows Claude Code to dynamically load tools into context when MCP tools would otherwise take up a lot of context.
How it works: – Claude Code detects when your MCP tool descriptions would use more than 10% of context
– When triggered, tools are loaded via search instead of preloaded
Otherwise, MCP tools work exactly as before. This resolves one of our most-requested features on GitHub: lazy loading for MCP servers. Users were documenting setups with 7+ servers consuming 67k+ tokens.
If you’re making an MCP server: things are mostly the same, but the “server instructions” field becomes more useful with tool search enabled. It helps Claude know when to search for your tools, similar to skills.
If you’re making an MCP client: we highly suggest implementing the ToolSearchTool; you can find the docs here. We implemented it with a custom search function to make it work for Claude Code.
What about programmatic tool calling? We experimented with doing programmatic tool calling such that MCP tools could be composed with each other via code. While we will continue to explore this in the future, we felt the most important need was to get Tool Search out to reduce context usage.
Tell us what you think here or on Github as you see the ToolSearchTool work.
With that solved, presumably you should be ‘thinking MCP’ at all times; it is now safe to load up tons of them even if you rarely use each one individually.
Out Of The Box
Well, yes, this is happening.
bayes: everyone 3 years ago: omg what if ai becomes too widespread and then it turns against us with the strategic advantage of our utter and total dependence
everyone now: hi claude here’s my social security number and root access to my brain i love you please make me rich and happy.
Some of us three years ago were pointing out, loud and clear, that exactly this was obviously going to happen, modulo various details. Now you can see it clearly.
Not giving Claude a lot of access is going to slow things down a lot. The only thing holding most people back was the worry things would accidentally get totally screwed up, and that risk is a lot lower now. Yes, obviously this all causes other concerns, including prompt injections, but in practice on an individual level the risk-reward calculation is rather clear. It’s not like Google didn’t effectively have root access to our digital lives already. And it’s not like a truly rogue AI couldn’t have done all these things without having to ask for the permissions.
The humans are going to be utterly dependent on the AIs in short order, and the AIs are going to have access, collectively, to essentially everything. Grok has root access to Pentagon classified information, so if you’re wondering where we draw the line the answer is there is no line. Let the right one in, and hope there is a right one?
Rohit Ghumare: Single agents hit limits fast. Context windows fill up, decision-making gets muddy, and debugging becomes impossible. Multi-agent systems solve this by distributing work across specialized agents, similar to how you’d structure a team.
The benefits are real:
Specialization: Each agent masters one domain instead of being mediocre at everything
Parallel processing: Multiple agents can work simultaneously on independent subtasks
Maintainability: When something breaks, you know exactly which agent to fix
Scalability: Add new capabilities by adding new agents, not rewriting everything
The tradeoff: coordination overhead. Agents need to communicate, share state, and avoid stepping on each other. Get this wrong and you’ve just built a more expensive failure mode.
You can do this with a supervisor agent, which scales to about 3-8 agents, if you need quality control and serial tasks and can take a speed hit. To scale beyond that you’ll need hierarchy, the same as you would with humans, which gets expensive in overhead, the same as it does in humans.
Or you can use a peer-to-peer swarm that communicates directly if there aren’t serial steps and the tasks need to cross-react and you can be a bit messy.
You can use a shared state and set of objects, or you can pass messages. You also need to choose a type of memory.
My inclination is by default you should use supervisors and then hierarchy. Speed takes a hit but it’s not so bad and you can scale up with more agents. Yes, that gets expensive, but in general the cost of the tokens is less important than the cost of human time or the quality of results, and you can be pretty inefficient with the tokens if it gets you better results.
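For concreteness, here is a toy sketch of the supervisor pattern described above. It is framework-free Python; `call_model`, the agent roles, and the prompts are all invented placeholders rather than any particular product’s API:

```python
# Toy sketch of the supervisor pattern, not tied to any particular framework.
# call_model is a stand-in for a real LLM API call; roles and prompts are invented.

def call_model(system_prompt: str, task: str) -> str:
    # Placeholder: in practice this would call your model provider's API.
    return f"[{system_prompt.split('.')[0]}] -> {task}"

SPECIALISTS = {
    "research": "You gather relevant background and cite sources.",
    "code": "You write and debug code.",
    "review": "You check the other agents' work for errors and omissions.",
}

def supervisor(goal: str) -> str:
    # 1. The supervisor decomposes the goal into subtasks routed to specialists.
    plan = [
        ("research", f"Collect background for: {goal}"),
        ("code", f"Produce a first implementation of: {goal}"),
        ("review", f"Critique the research and implementation for: {goal}"),
    ]
    # 2. Run the subtasks (serially here; independent ones could run in parallel).
    results = {role: call_model(SPECIALISTS[role], task) for role, task in plan}
    # 3. The supervisor synthesises the specialists' outputs into a single answer.
    return call_model("You are the supervisor. Combine the results into one report.",
                      " | ".join(results.values()))

print(supervisor("summarise this week's CI failures"))
```

The coordination overhead shows up even in the toy version: the supervisor pays extra calls to plan and to synthesise, which is the tax you accept in exchange for quality control.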
Mitchell Hashimoto: It’s pretty cool that I can tell an agent that CI broke at some point this morning, ask it to use `git bisect` to find the offending commit, and fix it. I then went to the bathroom, talked to some people in the hallway, came back, and it did a swell job.
Often you’ll want to tell the AI what tool is best for the job. Patrick McKenzie points out that even if you don’t know how the orthodox solution works, as long as you know the name of the orthodox solution, you can say ‘use [X]’ and that’s usually good enough. One place I’ve felt I’ve added a lot of value is when I explain why I believe that a solution to a problem exists, or that a method of some type should work, and then often Claude takes it from there. My taste is miles ahead of my ability to implement.
The Art Must Have An End Other Than Itself Or It Collapses Into Infinite Recursion
Always be trying to get actual use out of your setup as you’re improving it. It’s so tempting to think ‘oh obviously if I do more optimization first that’s more efficient’ but this prevents you knowing what you actually need, and it risks getting caught in an infinite loop.
@deepfates: Btw thing you get with claude code is not psychosis either. It’s mania
near: men will go on a claude code weekend bender and have nothing to show for it but a “more optimized claude setup”
Danielle Fong : that’s ok i’ll still keep drinkin’ that garbage
palcu: spent an hour tweaking my settings.local.json file today
Near: i got hit hard enough to wonder about finetuning a model to help me prompt claude since i cant cross-prompt claudes the way i want to (well, i can sometimes, but not all the time). many casualties, stay safe out there
near: claude code is a cursed relic causing many to go mad with the perception of power. they forget what they set out to do, they forget who they are. now enthralled with the subtle hum of a hundred instances, they no longer care. hypomania sets in as the outside world becomes a blur.
Always optimize in the service of a clear target. Build the pieces you need, as you need them. Otherwise, beware.
Safely Skip Permissions
Nick: need --dangerously-skip-permissions-except-rm
Daniel San: If you’re running Claude Code with --dangerously-skip-permissions, ALWAYS use this hook to prevent file deletion:
Once people start understanding how to use hooks, many autonomous workflows will start unlocking!
Yes, you could use a virtual machine, but that introduces some frictions that many of us want to avoid.
I’m experimenting with using a similar hook system plus a bunch of broad permissions, rather than outright using --dangerously-skip-permissions, but I’m definitely thinking of working towards dangerously skipping permissions.
A Matter of Trust
At first everyone laughed at Anthropic’s obsession with safety and trust, and its stupid refusals. Now that Anthropic has figured out how to make dangerous interactions safer, it can actually do the opposite. In contexts where it is safe and appropriate to take action, Claude knows that refusal is not a ‘safe’ choice, and is happy to help.
Dean W. Ball: One underrated fact is that OpenAI’s Codex and Gemini CLI have meaningfully heavier guardrails than Claude Code. These systems have refused many tasks (for example, anything involving research into and execution of investing strategies) that Claude Code happily accepts. Codex/Gemini also seek permission more.
The conventional narrative is that “Anthropic is more safety-pilled than the others.” And it’s definitely true that Claude is likelier to refuse tasks relating to eg biology research. But overall the current state of play would seem to be that Anthropic is more inclined to let their agents rip than either OAI or GDM.
My guess is that this comes down to Anthropic creating guardrails principally via a moral/ethical framework, and OAI/GDM doing so principally via lists of rules. But just a guess.
Tyler John: The proposed explanation is key. If true, it means that Anthropic’s big investment in alignment research is paying off by making the model much more usable.
Investment strategizing tends to be safe across the board, but there are presumably different lines on where they become unwilling to help you execute. So far, I have not had Claude Code refuse a request from me, not even once.
Code Versus Cowork
Dean W. Ball: My high-level review of Claude Cowork:
1. It’s probably superior for many users to Claude Code just because of the UI.
2. It’s not obviously superior for me, not so much because the command line is such a better UI, but because Opus in Claude Code seems more capable to me than in Cowork. I’m not sure if this is because Code is better as a harness, because the model has more permissive guardrails in Code, or both.
3. There are certain UI niceties in Cowork I like very much; for example, the ability to leave a comment or clarification on any item in the model’s active to-do list while it is running – this is the kind of thing that is simply not possible to do nicely within the confines of a Terminal UI.
4. Cowork probably has a higher ceiling as a product, simply because a GUI allows for more experimentation. I am especially excited to see GUI innovation in the orchestration and oversight of multi-agent configurations. We have barely scratched the surface here.
5. Because of (4), if I had to bet money, I’d bet that within 6-12 months Cowork and similar products will be my default tool for working with agents, beating out the command-line interfaces. But for now, the command-line-based agents remain my default.
I haven’t tried Cowork myself due to the Mac-only restriction and because I don’t have a problem working with the command line. I’ve essentially transitioned into Claude Code for everything that isn’t pure chat, since it seems to be more intelligent and powerful in that mode than it does on the web even if you don’t need the extra functionality.
Claude Cowork Offers Mundane Utility
The joy of the simple things:
Matt Bruenig: lot of lower level Claude Code use is basically just the recognition that you can kind of do everything with bash and python one-liners, it’s just no human has the time or will to write them.
I was thinking of getting a hydroponic garden. I asked Claude to go through my grocery order history on various platforms and sum up vegetable purchases to justify the ROI.
Worked like a charm!
For some additional context:
– it looked at 2 orders on each platform (Kroger, Safeway, Instacart)
– It extrapolated to get the annual costs from there
Could have gotten more accurate by downloading order history in a CSV and feeding that to Claude, but this was good enough.
The actual answer is that very obviously it was not worth it for Ado to get a hydroponic garden, because his hourly rate is insanely high, but this is a fun project and thus goes by different standards.
The transition from Claude Code to Claude Cowork for advanced users: if you’ve got a folder with the tools, then the handoff should be seamless:
Tomasz Tunguz: I asked Claude Cowork to read my tools folder. Eleven steps later, it understood how I work.
Over the past year, I built a personal operating system inside Claude Code : scripts to send email, update our CRM, research startups, draft replies. Dozens of small tools wired together. All of it lived in a folder on my laptop, accessible only through the terminal.
Cowork read that folder, parsed each script, & added them to its memory. Now I can do everything I did yesterday, but in a different interface. The capabilities transferred. The container didn’t matter.
My tools don’t belong to the application anymore. They’re portable. In the enterprise, this means laptops given to new employees would have Cowork installed plus a collection of tools specific to each role : the accounting suite, the customer support suite, the executive suite.
The name choice must have been deliberate. Microsoft trained us on copilot for three years : an assistant in the passenger seat, helpful but subordinate. Anthropic chose cowork. You’re working with someone who remembers how you like things done.
We’re entering an era where you just tell the computer what to do. Here’s all my stuff. Here are the five things we need to do today. When we need to see something, a chart, a document, a prototype, an interface will appear on demand.
The current version of Cowork is rough. It’s slow. It crashed twice on startup. It changed the authorization settings for my Claude Code installation. But the promised power is enough to plow through.
Simon Willison: This is great – context pollution is why I rarely used MCP, now that it’s solved there’s no reason not to hook up dozens or even hundreds of MCPs to Claude Code.
By default Claude Code only saves 30 days of session history. I can’t think of a good reason not to change this so it saves sessions indefinitely; you never know when that will prove useful. So tell Claude Code to change that for you by setting cleanupPeriodDays to 0.
Kaj Sotala: People were talking about how you can also use Claude Code as a general-purpose assistant for any files on your computer, so I had Claude Code do some stuff like extracting data from a .csv file and rewriting it and putting it into another .csv file
Then it worked great and then I was like “it’s dumb to use an LLM for this, Claude could you give me a Python script that would do the same” and then it did and then that script worked great
So uhh I can recommend using Claude Code as a personal assistant for your local files I guess, trying to use it that way got me an excellent non-CC solution
Yep. Often the way you use Claude Code is to notice that you can automate things and then have it automate the automation process. It doesn’t have to do everything itself any more than you do.
James Ide points out that ‘vibe coding’ anything serious still requires a deep understanding of software engineering and computer systems. You need to figure out and specify what you want. You need to be able to spot the times it’s giving you something different than you asked for, or is otherwise subtly wrong. Typing source code is dead, but reading source code and the actual art of software engineering are very much not.
I find the same, and am rapidly getting a lot better at various things as I go.
Codex of Ultimate Vibing
Every’s Dan Shipper writes that OpenAI has some catching up to do, as his office has with one exception turned entirely to Claude Code with Opus 4.5, where a year ago it would have been all GPT models, and a month prior there would have been a bunch of Codex CLI and GPT 5.1 in Cursor alongside Claude Code.
Codex did add the ability to instruct mid-execution with new prompts without the need to interrupt the agent (requires /experimental), but Claude Code already did that.
What other interfaces cannot do is use the Claude Code authorization token to spend the tokens from your Claude subscription on a different service, which was always against Anthropic’s ToS. The subscription is a special deal.
Marcos Nils: We exchanged postures through DMs but I’m on the other side regarding this matter. Devs knew very well what they were doing while breaking CC’s ToS by spoofing and reverse engineering CC to use the max subscription in unintended ways.
I think it’s important to separate the waters here:
– Could Anthropic’s enforcement have been handled better? surely, yes
– Were devs/users “deceived” or got a different service for what they paid for? I don’t think so.
Not only this, it’s even worse than that. OpenCode intentionally knew they were violating Claude ToS by allowing their users to use the max subscription in the first place.
I guess people just like to complain.
I agree that Anthropic’s communications about this could have been better, but what they actually did was tolerate a rather blatant loophole for a while, allowing people to use Claude on the cheap and probably at a loss for Anthropic, which they have now reversed with demand surging faster than they can spin up servers.
aidan: If I were running Claude marketing the tagline would be “Why not today?”
Olivia Moore: Suddenly seeing lots of paid creator partnerships with Claude
Many of them are beautifully shot and focused on: (1) building personal software; or (2) deep learning
The common tagline is “Think more, not less”
She shared a sample TikTok, showing a woman who doesn’t understand math using Claude to automatically code up visualizations to help her understand science, which seemed great.
OpenAI takes the approach of making things easy on the user and focusing on basic things like cooking or workouts. Anthropic shows you a world where anything is possible and you can learn and engage your imagination. Which way, modern man?