2026-03-17 12:42:44
I’m pleased to report that I have no new AI-related crises for you this week. Instead we get to focus on the fun parts, starting with physical constraints on AI development. Dylan Patel explains how power, GPUs, and memory will each be crucial bottlenecks on AI development over the next few years. Turning our attention to AI itself, we'll ask two leading neuroscientists whether AI is likely to become conscious (conclusion: probably yes, or almost certainly not).
AI is doing fascinating things to programmers: for many of us, this moment is simultaneously exhilarating and slightly heartbreaking. We’ll look at one high level overview of how AI is affecting programming, and one deeply personal reflection on that same topic. Programmers aren’t the only ones being disrupted: prinz joins us to argue that while the legal profession will survive AI, the big law firms will not.
If you’re here for the AI, it may not be clear why you should listen to a two-and-a-half-hour podcast about semiconductors. But this one features Dwarkesh Patel and Dylan Patel, and it’s really good: not just super interesting, but it also maps out some of the most important strategic questions that will shape AI over the next few years. A few highlights:
It’s a really good podcast—go listen to it (or read the transcript).
Zvi reviews GPT-5.4. This looks like a very substantial upgrade, and it’s getting great reviews. If you use AI heavily and you haven’t played with GPT in a while, now is a good time to give it another try.
Nice: Opus 4.6 and Sonnet 4.6 now have a 1 million token context window. I spent much of this weekend coding and the bigger context window was fantastic.
Andrej Karpathy continues to push the frontier of one-person AI development. His most recent project is autoresearch: an autonomous AI system that makes improvements to his nanochat AI:
This is the bread and butter of what I do daily for 2 decades. Seeing the agent do this entire workflow end-to-end and all by itself as it worked through approx. 700 changes autonomously is wild. It really looked at the sequence of results of experiments and used that to plan the next ones.
If you want to go deeper, here’s a great annotated version of the prompt.
I love coding in 2026: I’m several times more productive than I’ve ever been before, and it’s absolutely intoxicating. You can have my agentic coding models when you pry them from my cold, dead fingers. But at the same time, I mourn the loss of parts of my craft that just a year ago were important parts of my identity.
This week brings two very different takes on how programmers are adapting to agentic coding. Clive Thompson has a carefully researched piece for the NY Times ($), and James Randall has a deeply personal reflection.
Ajeya Cotra shares some very interesting thoughts on METR’s time horizon metric. This piece has received attention because she’s changing her January prediction that the metric will reach 24 hours by the end of this year. Based on recent progress (it’s already reached 12 hours), she’s now predicting 100 hours by the end of the year.
Even more interesting to me is her discussion of how the metric starts to fall apart beyond a certain point. She suggests that almost no tasks really have a one year time horizon: software tasks that would take a human a year to complete are really a collection of multi-day or maybe multi-week tasks that are largely independent.
We’re quickly running out of traditional benchmarks that can usefully measure the capability of frontier models. Where we’re going, there is no map and no speedometer.
One criticism of Claude’s Constitution is “that couldn’t possibly work”. aryaj investigated how well it’s working as part of the MATS program. The results are far from definitive, but very encouraging:
Anthropic has gotten much better at training the model to follow its constitution! Sonnet 4.6 has a 1.9% violation rate, Opus 4.6 is at 2.9%, and Opus 4.5 is at 4.4%.
As a control, Sonnet 4, which did not have special soul doc training, has a ~15% violation rate.
Might China be open to an international treaty to pause AI development? In part, that depends on how concerned China is about AI safety, which is complicated. On the one hand, China takes AI safety much more seriously than the US, requiring all AI products to obtain an extensive AI safety certification. On the other hand, China’s “AI safety” regime is more concerned with ideological correctness and “core socialist values” than with existential risk.
ChinaTalk explores the business side of AI safety compliance in China, shedding light on a field I previously knew very little about.
The AI Whistleblower Initiative presents 6 in-depth profiles of whistleblowers at AI companies, exploring the concerns they raised, what impact they had, and what cost they paid.
prinz believes BigLaw will not survive the AI era. He argues that with AI, a senior partner plus a small number of specialists and support staff will be able to do everything a BigLaw firm does today.
This is a likely path for many professions: with AI, the best people in a field can do far more than they could before (and get paid accordingly). But the rank and file will find themselves increasingly unemployable.
Dwarkesh wades into the DoW / Anthropic dispute. I don’t agree with everything here, but it’s a really good piece that explores some of the very challenging questions about who gets to make the big decisions in our near future.
Our future civilization will run on AI labor. And as much as the government’s actions here piss me off, in a way I’m glad this episode happened - because it gives us the opportunity to think through some extremely important questions about who this future workforce will be accountable and aligned to, and who gets to determine that.
Are LLMs likely to become conscious as they approach human-level intelligence? That’s a highly contested topic, with lots of strongly held opinions but not a lot of evidence. Even experts on consciousness can’t seem to agree: this week brings us opposing opinions from two well-regarded experts.
Michael Graziano (originator of Attention Schema Theory) tells PRISM that AI consciousness seems likely, and argues that conscious AI might be safer than “zombie AI”.
In the opposing corner is Anil Seth (previously), with a short video presenting four reasons why he thinks AI consciousness is extremely unlikely.
I’ll publish a longer piece on Wednesday examining Anil’s argument in more detail (sneak preview: I have a lot of respect for him, but in this matter I think he’s overconfident).
If you’re interested in going deeper on AI psychology and welfare, here are three reading lists to get you started.
Robert Long presents an AI Welfare Reading List and a selection of readings on self knowledge and introspection. Both lists look excellent but focus heavily on academic papers.
Avi Parrack and Štěpán Los have put together a Digital Minds quickstart guide that might be more accessible to casual readers.
Anthropic is moving toward letting employees sell $6b worth of shares. A significant fraction of that is likely to be donated to effective altruism-aligned causes (which would be great) as well as AI safety causes (where it might make a very significant difference).
Transformer explores where the money might go.
Open models have struggled to gain widespread adoption: the best models are quite good, but simply can’t compete with the frontier. Nathan Lambert surveys the state of the open model ecosystem and explores where open models are most likely to succeed. I like his idea of open models that are cheap and fast and can be trained for specific tasks, though I’m not sure that will see widespread adoption in the near future.
New from NVIDIA: Nemotron 3 Super is an open model with strong performance and a ton of supporting data and training information. It’s not competitive with the frontier, but Nathan Lambert believes it’s a big deal for the open model world.
Out-of-context reasoning is “when an LLM reaches a conclusion that requires non-trivial reasoning but the reasoning is not present in the context window”. It is sometimes the result of reasoning during the training process, and sometimes (increasingly with large modern models) the result of computation that occurs during a single forward pass. Owain Evans has a short but helpful explainer.
Brain emulation has been making rapid progress. We’re still a very long way from being able to emulate a full human brain, but it now seems plausible that we might be less than a decade away from being able to emulate the brains of fruit flies or other relatively simple organisms.
Asimov Press and Maximilian Schons review what’s currently possible, discuss the technological obstacles that still need to be surmounted, and lay out a roadmap for achieving full emulation of a human brain.
I can confirm that food tastings are a fantastic and low-effort way to create shared experience, not to mention an excellent excuse to eat a lot of good food.
2026-03-17 12:13:07
AI researcher Andrej Karpathy just put into visuals something that I already knew: AI is coming for my job.
“Technical Writer” is not actually a big enough category to get its own rectangle within the technology group, but everything in the technology group is red, and red here means “high likelihood of AI replacement.”
It’s a vibes-based prediction, but it’s one we all agree with. The way LLM technology has played out these last few years, tech bros seem to be automating their own jobs away (while leaving many other jobs safe for many years), and we’re mostly still excited about it. And I find this whole situation funny. Let it never be said that tech bros were trying to rent-seek from the world, or gatekeep our expertise.
Months ago when Elon was leading the Department Of Government Efficiency, my friend commented on his apparent cunning and cynicism in using the cover of “government efficiency” to corrupt the capital game so his own companies could win. I said I’m sure he’s enjoying whatever government subsidies he gets, but the real reason he’s doing D.O.G.E. is that deep-seated “autistic” joy of making systems efficient. That’s it. For all of Elon’s flaws, he’s not a politician or a sociopath CEO, he’s a tech bro. And this is how tech bros are: not perfect capitalist robots driven to maximize their power, but virtue-driven, and the virtue is something like “Use tools well to make systems run better.”
Any ordinary tech bro would’ve been equally delighted to find themselves in Elon’s position, even without the fame and big money subsidies. It’s the love of the game. You’re shown a system (like the US government) that’s horribly inefficient and wasteful, and you get a chance to “use tools well to make systems run better.” Yes!
That doesn’t mean tech bros are automatically good: you can do a lot of harm by pursuing one virtue single-mindedly. But the proper use of tools is a virtue, not a cover for Machiavellian power-seeking like The New York Times would have you believe. And the AI revolution is now bearing that out.
At the end of January this year it felt like we really turned a corner on my team with how we’re using AI tools. It was a combination of the tech improving to a certain level and a few key individuals realizing what they could do and becoming catalysts for everyone else. I can use AI agents to do a lot of my typical work on our public documentation.
I’m not upset that my job is being automated away; really it’s every tech bro’s dream to automate their job away. The lucky ones will tell you, with a twinkle in their eyes, about the time they automated everything, “So I could just log in, push a button, and that was my work for the day.” Those situations don’t tend to last, but the tech bro doesn’t mind: the work is getting done, efficiently.
That’s why I’ll be happy to hand this job off to the robots. My job has been to write the docs, and now they’re getting written more efficiently. Soon this kind of work will just be in a state of done, permanently. And I’ll be happy to be there, witnessing and bringing about the new status quo. I didn’t “lose” my job, I finished it. The tech bro cannot stand doing work that’s less efficient than it could be—busywork—even if it puts money in his pocket.
Maybe I’ll manage to find more to do at Microsoft so they decide to keep paying me. Maybe Microsoft itself will find something for me; they’ve been very good to me so far. Or maybe they’ll lay me off. But in any case, my job, the pattern of work I was hired to do, is ending. It simply won’t be done that way, at any company, ever again.
I’m proud of all my fellow tech bros who are embracing chaotic change and leaving their old work behind to do bigger and better things. I know it’s not as if one person had the option to accelerate LLM tech and courageously chose to do it, but it was the collective choice of tech bros, a result of their shared character. It’s not in the tech bro’s nature to consider holding back the development of a tool for the sake of his own career stability. If a system can be run better, we will make it happen, and we’ll figure out the rest later. This is the tech bro ethos.
If you liked this post, consider subscribing to my personal blog at patrickdfarley.com.
2026-03-17 07:38:41
Software encodes information flows. An ERP system, for instance, takes procurement and locks it into a specific sequence of purchase orders, approval routing, invoice matching, and payment release. Git takes multiple people changing code and imposes a protocol of branching, diffing, reviewing, and merging. By codifying these information flows, software says how things should happen; it makes patterns repeatable and enforceable by expressing them in deterministic code.
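To make “deterministic code” concrete, here is a minimal, hypothetical sketch of a procurement flow locked into code (the names and data shapes are invented for illustration, not taken from any real ERP system). The point is that the sequence of checks, their order, and the data structures are all fixed in advance:

```python
# Hypothetical sketch of a rigid, deterministic information flow.
# The sequence (match PO, check approval, check amount, pay) and the
# data shapes are fixed up front; anything outside them is rejected.

from dataclasses import dataclass


@dataclass
class PurchaseOrder:
    po_number: str
    amount: float
    approved: bool


@dataclass
class Invoice:
    po_number: str
    amount: float


def release_payment(invoice: Invoice, po: PurchaseOrder) -> bool:
    """Invoice matching and payment release as a fixed sequence of checks."""
    if invoice.po_number != po.po_number:
        return False  # no matching purchase order
    if not po.approved:
        return False  # approval routing not completed
    if abs(invoice.amount - po.amount) > 0.01:
        return False  # amounts must match to the cent
    print(f"Releasing payment for {invoice.po_number}: {invoice.amount:.2f}")
    return True
```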
Software took over the world because we learned to express useful actions as information flows, and then to express those information flows as deterministic code. The actions we could express this way were, by definition, within the space of consistent logical operations on rigid data structures.
Agents, too, encode information flows. And while they do so through the same infrastructure of software — that is, code — they are able to create information flows that are far more flexible. For one, they’re able to execute on more open-ended commands. But perhaps more importantly, agents work with the natural complexity of real-world tasks, rather than requiring that complexity be compressed into rigid data structures first. They do this by drawing on both system-specific information and the generalized knowledge they’ve absorbed through pre-training.
Further, as instances of software, they benefit from its useful properties — rerunnability, testability, and scalability. The potential for AI to replace human work hinges on this increasing flexibility as well as the practical advantages that come with being software.
But before the competition between AIs and humans plays out, we argue that agents first compete with the deterministic structures humans have encoded in software.
The existence of general reasoners pushes us towards replacing structured flows in our software stack with more general agent flows. The space of classical software it makes sense to build is shrinking. More and more software will be better described as agent software - infrastructural backbone in code calling various arrangements of agents in pre-defined or newly synthesized flows.
Repeatable and easy-to-measure components like infrastructure, systems, low-level transports, etc. will remain in code, albeit code probably optimized by models. When the user edits a collaborative document, there’s no need for an intelligent intermediary to save their changes to the server. But for higher-level tasks, agent software will overtake classical software by virtue of its greater generality.
Agent software will eat the well-defined world of pre-AI software and spit it out softer. This softening will change how we answer questions. To be more concrete, consider the following examples:
- Research-grade web scraping. We could produce standard, reproducible social-science analysis pipelines. Such a pipeline would be fed a prompt describing what data the agents should scrape and how that data should be coded, and would output a pre-defined analysis. When put in an agent-software script, the pipeline could also be reused and adapted to a variety of contexts, like systematic reviews or interview thematic coding.
- Full-coverage feature testing. Instead of a static test suite, we could place agents into a testing pipeline that varies with the feature being pushed. Such agents would trace the happy path and, if necessary, test against new edge cases each time a feature gets pushed. Since the whole thing runs in a standard CI environment, it can be rerun on every push like any other build step.
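As a rough illustration of the second example, here is a minimal sketch of “agent software” in CI, assuming an OpenAI-compatible client and a pytest-based build step; the model name, prompt, and file paths are placeholders, not a real pipeline from this post. The backbone is ordinary, rerunnable code; the open-ended part (deciding what to test for this particular diff) is delegated to an agent.

```python
# Hypothetical agent-software CI step: deterministic backbone, agent-decided tests.
import subprocess

from openai import OpenAI  # assumes OPENAI_API_KEY is available in the CI environment

client = OpenAI()


def diff_for_current_push() -> str:
    """Deterministic part: collect the diff for the feature being pushed."""
    result = subprocess.run(
        ["git", "diff", "origin/main...HEAD"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def generate_tests(diff: str) -> str:
    """Open-ended part: the agent decides which happy paths and edge cases matter."""
    response = client.chat.completions.create(
        model="gpt-5.4",  # placeholder model name
        messages=[{
            "role": "user",
            "content": "Write a self-contained pytest file covering the happy path "
                       "and any new edge cases introduced by this diff:\n\n" + diff,
        }],
    )
    return response.choices[0].message.content


if __name__ == "__main__":
    with open("tests/test_generated.py", "w") as f:
        f.write(generate_tests(diff_for_current_push()))
    # Back to ordinary CI: rerunnable on every push like any other build step.
    subprocess.run(["pytest", "tests/test_generated.py"], check=True)
```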
In machine learning, the bitter lesson says that general methods which scale with computation reliably beat methods which encode human knowledge as structure. Every attempt to hand-craft domain expertise into a system — chess heuristics, grammar rules, hand-engineered features — eventually lost to a simpler, more general method given enough compute.
With the improving capabilities of coding agents, the bitter lesson is now impressing itself onto the world of software itself. The rigid schemas, fixed integrations, and deterministic pipelines that defined classical software are a form of encoded structure, and agents are coming for them too.
Agent capabilities today require a shift in how we think about building software. Instead of asking what structure a system needs, we should be asking where we’ve been forcing structure just because code demanded it. The boundaries between what needs to be rigid and what can be flexible, what is and isn’t possible, have moved. And they will keep moving.
2026-03-17 06:24:25
This is a rough draft I'm posting here for feedback. If people like it, a version of it might make it into the next scenario report we write.
...
We think it’s important for decisionmakers to track whether and when they are handing off to AI systems. We expect this will become a hot-button political topic eventually; people will debate whether we should ever hand off to AIs, and if so, how and when. When someone proposes a plan for how to manage the AI crisis or the AGI transition or whatever it’s called, others will ask them “So what does your plan say about handoff?”
There are two importantly different kinds of handoff: Handing off trust and handing off decisionmaking. You can have one without the other.
Trust-handoff means that you are trusting some AI system or set of AI systems not to screw you over. It means that they totally could screw you over, if they chose to, and therefore you are trusting them not to.
Decision-handoff means that you are allowing some AI system or set of AI systems to make decisions autonomously, or de facto autonomously (e.g. a human is still technically “in the loop” but in practice basically just does whatever the AI recommends).
With both kinds of handoff, there are smaller and bigger versions.
| | Small | Big |
|---|---|---|
| Trust-handoff | I used Claude Code to write most of my codebase. Anthropic’s cyber evals indicate that Claude could totally have inserted security vulnerabilities if it wanted to, and I wouldn’t notice. But probably Claude wouldn’t do that, so it’s fine. | It’s September 2027 in the AI 2027 scenario. Agent-4 is a giant corporation-within-a-corporation of many thousands of copies running across OpenBrain’s datacenters. It’s broadly superhuman at all things coding and cyber, has been heavily involved in its own network security, and regularly gives strategic advice to OpenBrain leadership. It’s now being tasked with designing and aligning Agent-5, a superior AI architecture that it autonomously discovered. Agent-4 appears to be obedient/loyal/aligned/etc., but ho boy, if it’s not, well, not only is OpenBrain screwed but quite plausibly the whole world will end up controlled by Agent-4 and its descendants (such as Agent-5). |
| Decision-handoff | Yesterday I decided to switch from coding with Claude to vibe-coding. I no longer make decisions about e.g. what file structures to use or how the UI should be managed on the backend; instead I just give the high-level goal to Claude and then press “tab” to accept whatever it suggests, unless what it suggests is so obviously insane that I can tell in two seconds. | It’s July 2028 in the AI 2027 scenario. The army of superintelligences that flies the US flag (“Safer-4”) has been aligned to the Spec, and the Spec says to obey the Oversight Committee. So in some sense, humans are still in control. However, de facto Safer-4 is making basically all the most important decisions. For example, Safer-4 autonomously negotiated and implemented a complex treaty with its Chinese counterpart, and the Oversight Committee knew better than to raise any objections—why would they? It is so much smarter and wiser than them, and every time they objected in the past, it patiently explained to them why they were wrong, and they eventually agreed that they were wrong & approved, having accomplished nothing but wasting time. |
By default when we talk about trust-handoff and decision-handoff, we mean the really big ones, unless it’s clear from the context that we mean something smaller. So for example, if you see a diagram of scenario branches and the label “Trust Handoff” at a particular time on a particular branch, that means that at that point in that scenario, some set of AIs has become smart enough, and been entrusted with enough power, that they plausibly could take over the world if they tried. Similarly, the label “Decision Handoff” would indicate that at that point in the scenario, the overall trajectory of society is being steered by some set of AI systems; that extremely important decisions about how to structure society etc. are being de facto made by AIs.
When should we hand off? When the benefits outweigh the costs, of course.
To a first approximation, we should only hand off trust to AIs if those AIs are trustworthy. By definition, when you hand off trust to a group of AIs, you are making it the case that if they decided to screw you over, they could. So, you better have well-founded trust that they won’t decide to screw you over.
Handing off decisionmaking is more complicated. You may be confident that your AIs won’t lie to you, won’t deceive you, will obey your orders, etc. and yet still be rightly reticent to put AIs in charge of everything. In other words you may be confident that your AIs are trustworthy, yet still not trust them to decide everything.
For example, your AIs might have enough deontological constraints on their actions (honesty, obedience, etc.) that they can be trusted not to take over the world, not to disempower you, etc. Yet at the same time, your AIs’ long-term goals might be subtly (or majorly!) different from your own, such that if you let them make the decisions, things will predictably go downhill from your perspective.
Analogy: You are a nonprofit board looking for a CEO to run your fast-growing organization. For some candidates, you might be worried that they are untrustworthy—for example they might lie to you, pull various schemes to get their rivals on the board kicked off, and ultimately one day you might try to fire them and find that you can’t. But, suppose they are in fact trustworthy and would never do those things and will always obey your orders. Still, they might have different values than you, different philosophies, different attitudes towards risk tolerance, etc., such that it would be a bad idea (from your perspective, not theirs) to hire them. They might end up taking the nonprofit in a very different direction than you envisioned, for example, or they might end up doing too many risky things that end up blowing up big time later. “Personnel is policy,” as the saying goes.
Why would you ever hand off trust? Why ever put a group of AIs in a position where they could take over the world if they wanted to? Well, perhaps because the other options are even worse. For example, perhaps the world has gotten itself into a very sticky situation (e.g. a crazy arms race towards superintelligence that’s on the brink of escalating into WW3) and you think your best bet is to put AIs in charge of a bunch of things (e.g. AI research, diplomatic and military strategy, …) and hope they can handle things better than you. After all, they are more capable than you. In other words, perhaps one plausible reason for handing off trust is that you want to hand off decision-making and you don’t have a way to do that without also handing off trust.
Another reason to hand off trust is to enforce contracts/agreements, in situations where the AIs are probably aligned. For example, the US and China might want to agree to respect each other’s sovereignty in perpetuity, yada yada, because otherwise they are trapped in a crazy robot and WMD and superpersuasion arms race. But you can’t trust humans to keep their word. But you CAN trust AIs to keep theirs, at least if they’ve been suitably trained/designed.
For smaller-stakes handoffs, the calculus is similar. E.g. you might hand off decision-making over many aspects of a hospital patient’s health to an AI system because you have evidence that the AI system is more competent than your doctors and nurses; you recognize that this also involves handing off trust to the AI system (if it decided to kill your patients, it could easily do so) but you trust that it won’t.
AIFP hot take: We generally expect most powerful actors in charge of AI programs to hand off trust too early (while the risks are still high and outweigh the benefits) and to hand off decisionmaking too late (e.g. harmfully keeping a ‘human in the loop’ long after the point when they mostly just get in the way & slow things down). We think there might be an awkward “worst of both worlds” period where superhuman AI systems have been given significant power and autonomy — such as de facto control of their own datacenters & license to self-improve — such that they could take over the world if they wanted to, and yet simultaneously the world is full of problems that could be solved much better and faster, and risks that could be reduced/averted, if only the AIs were put in charge of more things in the real world.
That said, we aren’t confident. For reasons mentioned previously (See: the CEO analogy) such a period might make a lot of sense.
2026-03-17 05:44:22
TL;DR: We hypothesize that most alignment researchers have more ideas than they have engineering bandwidth to test. AICRAFT is a DARPA-funded project that pairs researchers with a fully managed professional engineering team for two-week pilot sprints, designed specifically for high-risk ideas that might otherwise go untested. We will select 6 applicants and execute a 2-week pilot with each; the most promising pilot may be given a 3-month extension. To our knowledge, this is the first MVP for engaging DARPA directly with the alignment community, and if successful it can catalyze government-scale investment in alignment R&D. Apply here.
Applications close March 27, 2026 at 11 PM PST.
AICRAFT (Artificial Intelligence Control Research Amplification & Framework for Talent) is a DARPA-funded seedling project executed by AE Studio. The premise is straightforward: we hypothesize that alignment research could progress faster if the best researchers had more leverage. We believe that researchers are currently bottlenecked on either execution (i.e. they are doing the hands-on experiments themselves) or management (i.e. they are managing teams that are executing the work). Management is higher leverage, but what if we could push that even further? AE Studio has been running a model where we pair researchers with fully managed ML teams, allowing the researcher to spend as little as 45 minutes per week with our team. By removing the execution and management burden, this model provides a new outlet for research ideas that would otherwise have gone untested.
The U.S. pool for AI/ML engineering is much larger than the talent pool for AI alignment. If experts in alignment can effectively scale their capacity with general-purpose AI/ML engineering talent, that unlocks a much larger pipeline of alignment research than the field currently supports.
AICRAFT tests this by pairing researchers directly with an experienced engineering team for focused two-week sprints. The goal is to get initial signal on ideas that wouldn't otherwise get tested. If successful, the most promising ideas may have an opportunity to expand to a 3 month engagement.
We will select 6 researchers and execute a 2 week research sprint with each. The purpose of the sprint is to get signal on a high-risk idea, or to prove it wrong quickly.
DARPA has already set a goal to achieve military-grade AI. This was announced recently by our CEO in the Wall Street Journal. What makes that relevant to alignment? Military deployment requires reliability guarantees that deceptively aligned or unpredictably behaving systems simply can't meet. You can't field an AI system that pursues hidden objectives or behaves differently under distribution shift. In that sense, the DoD's requirements create a concrete, well-funded forcing function for alignment research outcomes, even if the framing and vocabulary differ from what you'd see on the Alignment Forum.
AICRAFT is the first direct engagement between DARPA and the alignment research community. If the pilots demonstrate that this model works, it builds the case for substantially larger government investment in alignment R&D, the kind of scale that grants and private philanthropy alone can't reach.
This may be the most important and highest-leverage research engagement you have all year, as it can catalyze large-scale government investment in alignment R&D.
We're especially interested in researchers who have ideas that don't have other outlets. Maybe you have 10 ideas but bandwidth to pursue 2-3. Maybe there's a high-risk hypothesis that isn't a good fit for a grant or isn't supported by your current employer, but is worth getting early signal on.
If you have a testable hypothesis in AI control, alignment, or interpretability and can articulate what signal you'd look for in two weeks then we want to hear from you.
You bring (~2 hours/week):
We deliver (60+ hours of execution):
After the pilot:
You receive a final report with documented results. Promising pilots are recommended to DARPA for a 3-month extended engagement, contingent on your availability.
The application is intentionally lightweight: it takes under 10 minutes. The core of it is a 500-word research abstract addressing three questions:
Selected applicants will be invited to a brief follow-up call to talk through the idea and answer questions about the program. All applicants will be notified of final decisions by late April.
How much time commitment is this? Just four hours! You’ll spend two hours per week for the two-week pilot. This includes an initial planning session, async updates during the sprint, and demo sessions at the end of each week.
Can I participate if I'm affiliated with a university or company? Yes, if you can enter a subcontractor agreement with AE Studio. Most institutions have straightforward consulting processes. The two-hours-per-week commitment typically falls within standard outside activity policies.
What compute and resources are available? Cloud compute from AWS, GCP, Azure, and specialized ML platforms. API access to frontier models for evaluations, synthetic data generation, and related tasks.
What happens after the two-week pilot? You receive a final report with documented results. Strong pilots may be recommended for a 3-month extended engagement, contingent on your availability.
Is there compensation? Yes, researchers receive a $1,000 stipend for approximately 4 hours of work over the 2-week period.
What is the selection process? We review applications after the deadline, invite promising applicants to a brief call, and notify all applicants of final decisions by early-mid April.
Apply here — applications close March 27, 2026 at 11 PM PST.
AICRAFT is funded by DARPA and executed by AE Studio. The views, opinions, and findings contained herein are those of the authors and should not be construed as representing official policies or endorsements of DARPA or the U.S. Government.
2026-03-17 05:20:12
In this post, I’m trying to put forward a narrow, pedagogical point, one that comes up mainly when I’m arguing in favor of LLMs having limitations that human learning does not. (E.g. here, here, here.)
See the bottom of the post for a list of subtexts that you should NOT read into this post, including “…therefore LLMs are dumb”, or “…therefore LLMs can’t possibly scale to superintelligence”.
Consider an algorithm for training a Reinforcement Learning (RL) agent, like the Atari-playing Deep Q network (2013) or AlphaZero (2017), or think of within-lifetime learning in the human brain, which (I claim) is in the general class of “model-based reinforcement learning”, broadly construed.
These are all real-deal full-fledged learning algorithms: there’s an algorithm for choosing the next action right now, and there’s one or more update rules for permanently changing some adjustable parameters (a.k.a. weights) in the model such that its actions and/or predictions will be better in the future. And indeed, the longer you run them, the more competent they get.
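To pin down what “an algorithm for choosing the next action, plus an update rule that permanently changes the weights” means, here is a toy tabular Q-learning sketch (my own illustration, assuming a gymnasium-style environment with discrete observations and actions; DQN, AlphaZero, and the brain are of course far more elaborate than this):

```python
# Toy continual-learning loop: an action-selection rule plus a weight-update rule.
import random
from collections import defaultdict


def train(env, episodes=1000, alpha=0.1, gamma=0.99, epsilon=0.1):
    Q = defaultdict(float)  # the persistent "adjustable parameters"

    def choose_action(state):
        # Rule for choosing the next action right now (epsilon-greedy).
        if random.random() < epsilon:
            return env.action_space.sample()
        return max(range(env.action_space.n), key=lambda a: Q[(state, a)])

    for _ in range(episodes):
        state, _ = env.reset()
        done = False
        while not done:
            action = choose_action(state)
            next_state, reward, terminated, truncated, _ = env.step(action)
            done = terminated or truncated
            # Update rule (temporal difference): permanently adjust the parameters
            # so that future actions and predictions are better.
            best_next = max(Q[(next_state, a)] for a in range(env.action_space.n))
            Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
            state = next_state
    return Q  # the longer you run this, the more competent the policy gets
```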
When we think of “continual learning”, I suggest that those are good central examples to keep in mind. Here are some aspects to note:
Knowledge vs information: These systems allow for continual acquisition of knowledge, not just information—the “continual learning” can install wholly new ways of conceptualizing and navigating the world, not just keeping track of what’s going on.
Huge capacity for open-ended learning: These examples all have huge capacity for continual learning, indeed enough that they can start from random initialization and “continually learn” all the way to expert-level competence. Likewise, new continual learning can build on previous continual learning, in an ever-growing tower.
Ability to figure things out that aren’t already on display in the environment: For example, an Atari-playing RL agent will get better and better at playing an Atari game, even without having any expert examples to copy. Likewise, billions of humans over thousands of years invented language, math, science, and a whole $100T global economy from scratch, all by ourselves, without angels dropping new training data from the heavens.
I bring these up because I think the LLM-focused discourse sometimes has far too narrow a notion of what problem “continual learning” is supposed to be solving. They tend to think the problem is about “losing track of information”, not “failing to build new knowledge”, and they propose to solve this problem with strategies like “make the context [window] longer” (as Dario Amodei recently mused), or better scratchpads with Retrieval-Augmented Generation (RAG) etc.
But real “continual learning” also includes the ways that AlphaZero changes after a million games of self-play, or the ways that a human brain changes after 20 years in a new career. There is no system of scratchpads that you can give to a 15-year-old, such that it would be an adequate substitute for them spending the next 20 years growing into a 35-year-old world expert in some field. Likewise, there is no context window that can turn GPT-2 into GPT-5.
Suppose you took an actual “country of geniuses in a datacenter”, completely sealed them from the outside world, and gave them a virtual reality environment to hang out in for the equivalent of 100 years. What would you find when you unsealed it? There would be whole new ways of thinking about the world and everything in it—entirely new fields of science, schools of philosophy, and so on.
Can a bunch of LLMs do that? Well consider this thought experiment: suppose you take a whole new field of science, wildly different from anything in the training data, and put a giant textbook for this field purely in an LLM context window, with no weight updates at all. Will this LLM be able to understand, criticize, and build on this field? My opinion is “absolutely not” (see 1, 2) which implies that merely increasing context lengths is definitely not sufficient for a real “country of geniuses in a datacenter”, when the datacenter is sealed shut for the equivalent of 100 years (contra Dario who seems to think that it’s at least in the realm of possibility that more context is sufficient by itself to get continual learning at “country of geniuses” level).
(If we’re talking about what a sealed “country of human geniuses” could do over the course of, like, one minute, rather than over the course of 100 years, then, yeah sure, maybe that could be reproduced with future LLMs! See von Oswald et al. 2022 on how (so-called) “in-context learning” can imitate a small number of steps of actual weight updates.[1])
Now, suppose that I take a generic imitation-learning algorithm (e.g. self-supervised learning in a transformer-architecture neural net, just like LLM pretraining), and have it watch a deep Q network play Atari Breakout, as it starts from random initialization, and gets better and better over 1M iterations. OK, now we have our trained imitation-learner. We freeze its weights, and use it in a similar way as people traditionally used LLM base models, i.e. have it output the most likely next move, and then the most likely move after that, etc.
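Here is a minimal sketch of that setup (my own illustration; the architecture, sizes, and names are placeholders): ordinary next-action prediction on gameplay logged across the Q network’s entire training run, after which the weights are frozen and the model is sampled autoregressively like an LLM base model. For brevity it conditions only on the past action sequence, not on the screen observations.

```python
# Hypothetical imitation learner: a causal transformer trained to predict the
# deep-Q agent's next action from its logged action history, then frozen.
import torch
import torch.nn as nn


class NextActionPredictor(nn.Module):
    def __init__(self, num_actions, d_model=256, n_heads=4, n_layers=4, max_len=1024):
        super().__init__()
        self.action_embed = nn.Embedding(num_actions, d_model)
        self.pos_embed = nn.Embedding(max_len, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, num_actions)

    def forward(self, actions):  # actions: (batch, seq) of past action ids
        seq_len = actions.shape[1]
        pos = torch.arange(seq_len, device=actions.device)
        x = self.action_embed(actions) + self.pos_embed(pos)
        causal_mask = nn.Transformer.generate_square_subsequent_mask(seq_len).to(actions.device)
        h = self.encoder(x, mask=causal_mask)
        return self.head(h)  # logits over the next action at every position


def imitation_step(model, optimizer, actions, next_actions):
    """One self-supervised update on logged (history -> next action) pairs."""
    logits = model(actions)
    loss = nn.functional.cross_entropy(
        logits.reshape(-1, logits.shape[-1]), next_actions.reshape(-1)
    )
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# After training on the Q network's whole learning trajectory, the weights are
# frozen and the model is used like an LLM base model: output the most likely
# next move, append it, and repeat.
```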
Question: Is this trained imitation-learner actually a good imitation of the deep Q network? Well, “good” in what respect? I would pull apart a couple topics:
Why not? Well, actually, for an ideal imitation learning algorithm, i.e. Solomonoff induction on an imaginary hypercomputer, my answers would all be “yes”! But in the real world, we don’t have hypercomputers!
These days, when people talk about imitation learning, they’re normally talking about transformers, not hypercomputers, and transformers are constrained to a much narrower hypothesis space:
| | Imitation-learning a deep-Q RL agent by Solomonoff induction | Imitation-learning a deep-Q RL agent by training a transformer on next-action prediction |
|---|---|---|
| Hypothesis space | The set of all computable algorithms | A forward pass through T, for the set of all possible trained transformers T |
| Ground truth | The actual deep-Q RL agent, with such-and-such architecture, and Temporal Difference (TD) learning weight updates, etc. | The actual deep-Q RL agent, with such-and-such architecture, and Temporal Difference (TD) learning weight updates, etc. |
| Asymptotic limit | It converges to the actual deep-Q RL agent | It converges to whatever trained transformer forward pass happens to be closest to the actual deep-Q RL agent |
I think we should all be very impressed by the set of things that a transformer forward pass[2] can do. But we should not expect a transformer forward pass to reproduce a full-fledged, entirely different, learning algorithm, with its own particular neural network architecture, its own particular methods of updating and querying weights, etc., as it runs and changes over millions of steps.
Running one large-scale learning algorithm is expensive enough; it’s impractical to run a huge ensemble of different large-scale learning algorithms in parallel, in order to zero in on the right one.[3]
I’m going to harp on this because it’s a point of confusion. There are two learning algorithms under discussion: the imitation-learning algorithm (e.g. a transformer getting updated by gradient descent on next-action prediction), and the target continual learning algorithm (e.g. a deep Q network getting updated by TD learning). When the imitation learning is done, the transformer weights are frozen, and the corresponding trained model is given the impossible task of using only its activations, with fixed weights, to imitate what happens when the target continual learning algorithm changes its weights over millions of steps of (in this case) TD learning. That’s the part I’m skeptical of.
In other words: The only practical way to know what happens after millions of steps of some scaled-up continual learning algorithm is to actually do millions of steps of that same scaled-up continual learning algorithm, with actual weights getting actually changed in specifically-designed ways via PyTorch code. And then that’s the scaled-up learning algorithm you’re running. Which means you’re not doing imitation learning.
So back to the human case: for a typical person (call him “Joe”), I think LLMs are good at imitating “Joe today”, and good at imitating “Joe + 1 month of learning introductory category theory”, but can’t imitate the process by which Joe grows and changes over that 1 month of learning—or at least, can’t imitate it in a way that would generalize to imitating a person spending years building a completely different field of knowledge that’s not in the training data.
As mentioned at the top, I’m hoping that this post is a narrow pedagogical point. For example:
[1] I guess I also need to mention the “algorithmic distillation” paper (Laskin et al. 2022), but I’m hesitant to take it at face value; see discussion here.
[2] You can replace “a forward pass” with “10,000 forward passes with chain-of-thought reasoning”; it doesn’t change anything in this post.
[3] Outer-loop search over learning algorithms is so expensive that it’s generally only used for adjusting a handful of legible hyperparameters, not doing open-ended search where we don’t even vaguely know what we’re looking for. Even comparatively ambitious searches over spaces of learning algorithms in the literature have a search space of e.g. ≈100 bits, which is tiny compared to the information content of a learning algorithm source code repository.