2026-03-03 19:30:31
Measuring AI Ability to Complete Long Tasks - METR
In their original 2025 paper, METR noticed that the slope (equivalently, the task-horizon doubling time) of the trendline for models released in 2024 and later differs from the slope for pre-2024 models.
First, I decided to check whether a piecewise linear function fits the data better than a simple linear function. If it doesn't, then the apparent change in trend is probably a random fluke, and there is nothing worth talking about.
Here is the data so far (SOTA models only):
Note: the Y axis has human-friendly labels, but the data used in all further calculations is log10(raw value in minutes).
The piecewise linear function clearly provides a better fit, both by the Bayesian information criterion (BIC; lower is better) and by a qualitative "bro, just look at it" assessment. I've included RMSE, MAE, and R² as extra information, but keep in mind that a piecewise linear function has more parameters, so it's no surprise that it fits the data better than a plain line. BIC penalizes model complexity (the number of free parameters), which makes it more relevant here than RMSE/MAE/R².
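To make the comparison concrete, here's a minimal sketch of how one might run it. The datapoints below are made-up placeholders, not the actual METR data, and the starting values for the fit are arbitrary:

```python
import numpy as np
from scipy.optimize import curve_fit

# Illustrative placeholders: release date (decimal year) and
# log10(task horizon in minutes). Replace with the real METR datapoints.
x = np.array([2019.2, 2020.5, 2022.2, 2023.2, 2024.2, 2024.7, 2025.1, 2025.6])
y = np.array([-1.2, -0.5, 0.1, 0.6, 1.3, 1.8, 2.3, 2.9])

def linear(x, a, b):
    return a + b * x

def piecewise(x, a, b1, b2, xb):
    # Continuous two-segment line with a breakpoint at xb.
    return np.where(x < xb, a + b1 * (x - xb), a + b2 * (x - xb))

def bic(y, yhat, k):
    # Gaussian-error BIC up to an additive constant: n*ln(RSS/n) + k*ln(n).
    n = len(y)
    rss = np.sum((y - yhat) ** 2)
    return n * np.log(rss / n) + k * np.log(n)

p_lin, _ = curve_fit(linear, x, y)
p_pw, _ = curve_fit(piecewise, x, y, p0=[1.0, 0.5, 1.0, 2024.0])

# k counts free parameters: 2 for the line, 4 for the piecewise fit.
print("BIC linear:   ", bic(y, linear(x, *p_lin), k=2))
print("BIC piecewise:", bic(y, piecewise(x, *p_pw), k=4))
```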
However, I'm not satisfied. Let's randomly remove 20% of the datapoints, repeat this a few thousand times, and see how frequently the piecewise linear function still provides a better fit according to BIC.
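Continuing the sketch above (same hypothetical `x`, `y`, fit functions, and `bic`), the subsampling check might look like:

```python
n_trials = 5000
wins = 0
rng = np.random.default_rng(0)
n_keep = int(round(0.8 * len(x)))  # randomly drop 20% of points each trial

for _ in range(n_trials):
    idx = np.sort(rng.choice(len(x), size=n_keep, replace=False))
    xs, ys = x[idx], y[idx]
    try:
        p_lin, _ = curve_fit(linear, xs, ys)
        p_pw, _ = curve_fit(piecewise, xs, ys, p0=[1.0, 0.5, 1.0, 2024.0])
    except RuntimeError:
        continue  # skip subsamples where the fit fails to converge
    if bic(ys, piecewise(xs, *p_pw), k=4) < bic(ys, linear(xs, *p_lin), k=2):
        wins += 1

print(f"piecewise preferred in {wins / n_trials:.1%} of subsamples")
```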
It's pretty clear that piecewise linear fits the METR data better. But it's possible that this is an artifact of METR's methodology. Is there any other benchmark where a similar change could show up? The Epoch Capabilities Index (ECI) doesn't go back far enough to be reliable: the oldest SOTA model on ECI is GPT-4, released in 2023. If anyone knows a benchmark that includes models from the oldest (GPT-2 and GPT-3) to the newest, let me know.
OK, let's say the change in the trend is real: it's not only the line on the graph that has changed, but the underlying reality. What could be the cause?
In conclusion:
2026-03-03 18:29:02
Zurich AI Safety (ZAIS) is hiring a Director, with the goal of professionalizing AI Safety capacity building in Zurich and Switzerland. The Director will lead individualized career advising for promising talent, expand outreach to mid- and senior-level professionals, manage ZAIS's growing volunteer team, and explore AI Safety opportunities in the context of the 2027 AI Summit in Geneva. Funding has been secured for a full-time, 14-month position. Due to Swiss work permit regulations, we can only consider applicants who are citizens of Switzerland or an EU/EFTA member state (or who already hold a valid Swiss work permit). The deadline to apply is the 22nd of March.
Apply here
Know someone who would be a great fit?
Submit an (anonymous) recommendation here.
ZAIS started as the Zurich AI Alignment (ZAIA) reading group and has, since 2022, organized speaker events, community gatherings, and educational programs for people interested in AI safety in the Zurich area. The current team consists of 7 volunteers. Activities include the AI Futures Talk Series, the AI Safety Fundamentals course, co-working days at PEAKS, a paper reading group, social events, participation in international programs, and individualized career support by EA Switzerland. The first Zurich AI Safety Day (September 27, 2025) was a major success, drawing over 350 applications, 200+ participants, and representatives from 20+ organizations, including UK AISI, Apollo Research, and FAR.AI. Zurich has particularly strong talent streams, with top universities such as ETH Zurich and the University of Zurich, and AI companies like Google and Microsoft. To sustain this momentum, we are now aiming to transition from volunteer-led activities to professional AI safety capacity building in Zurich and Switzerland. With the announcement of the 2027 AI Summit in Geneva, a strengthened ecosystem of people focused on safety feels more important than ever.
Required
Desired
Please note that you do not need to meet 100% of the qualifications to be considered. We strongly encourage you to apply if this role excites you.
We offer a gross salary of CHF 75,000 per year, along with a comprehensive benefits package, including:
For exceptional candidates with significant relevant experience, we are open to discussing higher compensation.
We review applications on a rolling basis and encourage early submissions. The deadline to apply is the 22nd of March.
Apply here
2026-03-03 18:09:26
Epistemic Status: Further Research Needed, would be a shorter essay if I thought about it for longer.
Conspiracies exist. Some of them are quite large, involve people who met as adults and agreed to do crimes together, and do many heinous things that multiple co-conspirators know about for years without the police noticing.
Some organised crime groups start within families. If you were all raised in the same family-first culture and they're your cousin, you can be pretty sure they're not a cop. Others start in Lawlessness. If it's the jungle and nobody is a cop, you can be pretty sure they're not a cop. These aren't the situations I want to draw attention to here.
Many groups maintain control through massive threats of violence, bribing people more than the cops could ever equal, holding loved ones as hostages, reliably killing snitches, etc. These are closer to sovereign states than shadowy conspiracies, and also not our subject today.
Today, our topic is Conspiring as an Information Theory game. I think it is a very weird game.
Let me first describe a math problem:
You are a standard economist's perfectly rational logician, in a community of other perfect logicians. You are (secretly) a Conspirator, and would like to start a conspiracy. X others in the community are also (secretly) Conspirators, and want to start the same conspiracy, but you don't know who. If the Conspirators can obtain mutual knowledge of their identities, then they win.
To stop you are Y Secret Police, whom you also do not know. They win if they can obtain knowledge of who any of the Conspirators are, or else forever stop the Conspirators from winning. If you wish, you may instead imagine these as snitches, fools, or other non-contributors who must be kept out of the group project at all costs. There are also a much larger number of Normal People, who will report you if you admit to being a Conspirator but aren't actively out to get you.
You have the full variety of communication options: You can directly message anyone under your real name, a pseudonym, or anonymously. You can start group chats. You can put up public announcements on the town notice board or anonymously graffiti information on some wall where everyone will see it. You can use powerful encryption. The Secret Police can also do all these things.
You haven't agreed on a plan with the other Conspirators in advance (how could you, you don't even know who they are yet), but you know everyone is a perfect logician, so if there's a strategy, you'll all deduce the same strategy and do it in perfect synchronization. You're also all very risk-averse, so you'd like to win with at least near certainty, and not rely on lots of random luck.
Do you have a winning strategy?
I'll give you a minute to try to think of the answer.
...
Done yet? Great. There isn't one. It's a simple Strategy Stealing Argument: Whatever your scheme is, the Secret Police figure out the same scheme and do the same moves. At some point, your scheme must tell the other person your real identity, but you can't know they're not a cop because the cops behave identically to your allies. No matter how clever your plan is, at the end of it, you'll be in an anonymous chatroom with X+Y people, every Conspirator and every Secret Police agent, and no one will be able to leave anonymity and meet up in real life with confidence, on account of the pure luck shot in the dark you'd have to take. Since you're risk-averse and clever enough to deduce this all from the get-go, you don't even try in the first place.
Real life is a more forgiving environment than this math problem. You can tell, because people have solutions to this problem. There are many examples we could use, but the one I want to pick out is Jeffrey Epstein's alleged child sex trafficking operation.
To run an alleged child sex trafficking operation, you need to be able to find a continuous stream of rich people who are inclined to do sex crimes. You need to let them know that you offer sex crimes on your private island, if they're interested. The overwhelming majority of people (I hope), if someone just told them that out of the blue, would go to the police immediately. You need to be so good at picking them that you only tell people in the tiny minority who'd say yes, or who at least wouldn't turn you in about it.
The rich person you're trying to sell to has to know you're serious. The police do attempt sting operations about this kind of thing, but they mostly get poor idiots. The kind of people Jeffrey is looking for don't fall for that kind of thing, but they (apparently) do take up the offer when it's the real deal. One way to achieve this is by owning a private island: Police sting operations knowably never involve owning a private island, because it's not in their budget. It may or may not be in the budget of something like the CIA, but if that's your threat model, this whole game gets ten levels harder again. Another threat is the sort of person who changes his mind later and turns you in; another is getting kicked out of every social space because the normal people think you're disgusting. For the operation to work, you have to avoid both of those, too.
Clearly, someone at some point is solving our mysterious secret recognition problem: The villains can identify each other, without tipping their identity in either direction. Meanwhile, the normal people can't identify the villains, even while they are receiving whatever strange secret handshake has been invented by all these aspiring co-conspirators who have never met before and have had no opportunity to coordinate on a secret handshake.
I'd like you to think for a minute about how incredibly hard this problem is, and how bizarre it is that so many people successfully solve it.
If the police could (reliably) spot the bad guys, they'd arrest all the bad guys. If the bad guys couldn't reliably spot the bad guys, they wouldn't be able to form (these types of) conspiracies. But somehow the bad guys (limited resources, no prior coordination, everyone else against them) have such a natural advantage at self-recognition that they can beat the odds, some of them regularly for years. We only know about them at all because they get caught for some completely unrelated mistake. How are they doing it?
In real life, neither Conspirators nor Secret Police are identical perfect logicians, which makes it a lot easier, because in theory almost any material difference offers an easy solution. The simplest way involves something Conspirators can do that Secret Police can't, such as commit murder or share original child sexual abuse material. The police can make asking someone to do murder a crime too, but you can get around that by merely hinting at it (in the first case), or by meeting first under mutual pseudonymity (as in the second case).
Physical Proof of Villainy under initial Mutual Pseudonymity is the clear gold standard here. If that's possible, the bad guys just automatically win and you can't stop them unless you're willing to spend significant amounts of effort faking proof of whatever false claims they pick. Still, most of the time, this isn't the solution used. It's just too high a level of effort, involves working with total strangers (which humans aren't built for), and needs everyone to buy in completely before they can even get started.
A secret society of serial killers could demand you send proof of you committing a bloody murder to their encrypted mail server on the dark web. Epstein couldn't do this. If he demanded Proof of Villainy up front from clients he wouldn't get it (for fear he would immediately blackmail them), and if they demanded it of him they wouldn't get it (for the same reason). For this strategy to work, you need to meet under a pseudonym, which they weren't high-tech enough to be doing.
The next way to distinguish cops from conspirators is what they consider convincing evidence. If they have different evidentiary rules, you can invent the Selective Proof of Villainy. The gang convinces the aspiring member that it's a real gang and not a hundred cops pretending to be a gang. The aspiring member (if they are a cop), still hasn't seen anything that constitutes legal proof of a crime, but (if they are a gangster) has seen plenty that constitutes (to a gangster) proof they are a gang. A hundred cops pretending to be a gang wouldn't use the right lingo, wouldn't meet in an old warehouse they took over, wouldn't visibly look like they're on drugs (though at this point, the aspiring member can't have seen any actual drugs, for fear they're a cop). It wouldn't hold up in court, but it works in real life.
Secret Police believe in "Proof Beyond Reasonable Doubt that is Admissible in a Court and is Persuasive to a Jury of Random Normal People" or at least "Articulable Probable Cause sufficient for a Warrant". Fellow Conspirators believe in "Actual Bayesian Evidence as Best They Can Approximate It". The margin between these two is free space to communicate, but if you have that kind of free space to communicate, you can send any message you want and win on day one.
This shouldn't work on a rationalist, because a rationalist also believes in Actual Bayesian Evidence as Best They Can Approximate It, so there's no gap big enough to sneak a message through. Despite this, I'm pretty sure Epstein wasn't immediately detected by every honest thinker he ever met at a party. If normal people think Jeffrey is a monster, they'll stop inviting him to parties and introducing him to potential clients. Jeffrey needs to maintain a public reputation here.
If an answer to this exists, it means there's something bad guys can do that other bad guys consider to be compelling evidence that someone is a bad guy (and they're right!), that you dear reader (as, I hope, the normal person you are) wouldn't identify as evidence of anything in particular (and you'd be wrong!).
Doesn't that spook you? This guy at the party you're attending is doing the "I'm secretly a child sex trafficker" dance, right in front of you, and the one other sex offender in the party is updating correctly from it while you're not even noticing it's happening.
Don't you at least aspire to do well enough to stop inviting that guy to parties?
A normal person would turn you in to some authority if you said you were a sex trafficker, but not if you told them you were just a little bit edgy. First, you reveal (by your behaviour) that you're a little bit edgy, and watch closely how they react. If they think it was gross, you apologise, return to acting normal, and now you know they're not very fun. If they laugh, you act a little bit more edgy.
So long as there's a continuous spectrum from normal guy to sex trafficker, so long as the other guy could plausibly be at any point along that spectrum, and so long as no one would take huge offense at someone merely 1 step worse than themselves (but would tip off that they're not edgier than that by looking upset), then you can walk down the path and see where the other guy stops. Sometimes it stops at dirty jokes. Sometimes it stops at wishing violent deaths upon the outgroup. Sometimes it just keeps going.
This won't defeat a serious agent of the Secret Police, who will play along at being a fellow villain all the way till you do something he can arrest you for, but it'll defeat all the normal people, and you can exclude the Secret Police in other ways.
How can you respond to this?
Qualia are a convoluted topic in philosophy that nobody has a useful way to talk about. This is a practical problem for practical people, and it would be weird for something as esoteric as the nature of qualia to matter, but I think it's the real secret here.
Every aspiring Conspirator has something in common. They know what it's like, how it feels on the inside, to be an aspiring Conspirator. There might be no good way to write this feeling down, but it's still knowledge, common knowledge prior to any coordination held as a communal secret on which to base a strategy.
Pick an arbitrary detail of how it feels to be a drug dealer, something only drug dealers know. For example, maybe there are a few different types of drug addicts, as to how they interact with their dealers. Maybe they follow a few patterns of behavioural norms. Encode this information in something non-suspicious, such as a joke. "Have you seen all those addicts on the street? <humorous impression>". If the other party is not a drug dealer, they will think you made a funny joke. If the other party is a drug dealer, they will realise your impression of a drug addict contains information about how drug addicts behave when interacting with dealers, information you'd only have if you were yourself a dealer. Nobody else can tell if it was accurate or not, so nobody else thinks they've learnt anything. Selective Proof of Villainy.
A single event won't be enough, because complicated things like that can happen by chance, and maybe the other party will miss it, but it's a little bit of Selective Proof of Villainy, and it scales well. Another aspiring conspirator might notice but be uncertain, and so respond with the same protocol back. After enough repeats they become increasingly sure, and increasingly transparent about it, until they're buying drugs off each other.
Although this is hard to do in real life, I think of it as the strongest option, because in the pure mathematics game it causes the conspirators to win with ease. Someone anonymously sprays graffiti describing a mapping function from qualia to encryption keys, where everyone can see it. You map the qualia of being an aspiring conspirator to the corresponding encryption key and use it to encrypt graffiti of your real name. On the morning of day 3, the conspirators have a list of all their names and win the game.
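For what it's worth, the pure-math version of this protocol is essentially key derivation from a shared secret. Here's a toy sketch, assuming (counterfactually) that the shared inside knowledge could be serialized into one canonical string everyone produces identically; the cipher is deliberately simplistic and not real cryptography:

```python
import hashlib

def key_from_shared_knowledge(description: str) -> bytes:
    # Stand-in for the (nonexistent) qualia-to-key mapping: anyone who can
    # produce the exact same canonical description derives the same key.
    return hashlib.sha256(description.encode()).digest()

def xor_cipher(data: bytes, key: bytes) -> bytes:
    # Toy stream cipher (keystream = iterated SHA-256). Encryption and
    # decryption are the same operation. Illustrative only, NOT secure.
    out, block, i = bytearray(), key, 0
    while i < len(data):
        block = hashlib.sha256(block).digest()
        for b in block:
            if i >= len(data):
                break
            out.append(data[i] ^ b)
            i += 1
    return bytes(out)

shared = "what it feels like, from the inside, to be an aspiring conspirator"
key = key_from_shared_knowledge(shared)

graffiti = xor_cipher(b"my real name", key)  # sprayed anonymously on the wall
print(xor_cipher(graffiti, key))             # readable only to key holders
```

The hard part, of course, is the first function: there is no canonical serialization of "what it feels like on the inside," which is exactly why real conspirators fall back on iterated, noisy approximations of it.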
It also works for defending yourself against posers, idiots, snitches, and unreliable flakes. Just think of what it's like to be a sane, careful, honest con, the kind who really wouldn't screw up or betray his buddies (you are an honest con yourself, aren't you?), and make that your filter.
The part that makes it hard in real life is that you're not a bunch of perfect logicians, a mapping function from qualia to encryption keys hasn't been invented yet, and ordinary humans probably couldn't use it properly even if it had, but those feel like surmountable implementation details to me. In practice, it'd look like a normal first date: watching each other's faces closely for specific reactions that only make sense if they know something not directly stated but entangled with your words, where the thing they know is what it's like on the inside to be the kind of person you're looking for.
How can you respond to this?
Figuring out how to implement these in practice is left as an exercise for the reader.
Most things that follow conspiracy dynamics are not criminal conspiracies. Romantic love is a conspiracy between two people to do right by each other against the interests of everyone else who'd rather they do right by everyone else, and distinguishing a good romantic co-conspirator who really loves you and doesn't just want your money or your soul is approximately the same game. Ordinary friendship, office politics, startup co-founders who really believe in the mission and aren't planning to make you do all the work, fellow philanthropists who aren't running scams. Same Trait Recognition is an extremely valuable skill to have in almost any context. If Epstein is better at this than you, you should wonder what he knew that you don't.
Figuring out how to do that is also left as an exercise for the reader.
Beliefs are supposed to pay rent, and the contents of this post are supposed to pay rent primarily in new ways to hypothesize around why people say or do things, so let's give an example. It is, annoyingly, a political example, but it's the best one I've got.
Way back in 2002, before any of this was public, Donald Trump said in a magazine:
I’ve known Jeff for fifteen years. Terrific guy. He’s a lot of fun to be with. It is even said that he likes beautiful women as much as I do, and many of them are on the younger side. No doubt about it — Jeffrey enjoys his social life.
What belief state was Donald Trump in when he said this? What did he expect to achieve by saying it? If it was out of force of habit, why would he be in the habit of saying things like this? If he didn't mean the literal content of this statement to a general audience (which seems a crazy thing for him to be doing), who was he trying to speak to, and what was he trying to tell them?
I have my own theory here that I included in an earlier draft, but decided not to present it as authoritative. I will say I don't think he could have known nothing, because "on the younger side" is a specific fact that someone who didn't know anything couldn't have guessed, and that it's probably intentional that nothing in it constitutes admissible evidence of any particular crime.
I think that, whatever the true answer to these sorts of problems is, it should explain why people say things like this. What thought process produced the speech act, what goal they're trying to achieve, what game they think they're winning.
And I want to know what that game is, because I'd really like to be able to tell when people are playing it.
2026-03-03 13:23:44
Last week’s conflict between the Department of War and Anthropic marked a turning point for AI. I’m cautiously hopeful that the parties involved will find some kind of deescalation from the current nuclear option, but irreparable damage has already been done: to Anthropic, to the entire AI industry, and to America’s pre-eminence in AI.
This is a complex, fast-moving situation that is outside my usual beat. Rather than trying to cover it in detail myself, I’m going to link to some of the most useful analysis. But I want to be extremely clear: this is the most important thing that’s happened in AI for a long time and it’s gravely concerning. These are dark times and the road ahead just got more difficult.
Dean Ball’s latest is grim but essential reading.
This strikes at a core principle of the American republic, one that has traditionally been especially dear to conservatives: private property. […]
This threat will now hover over anyone who does business with the government, not just in the sense that you may be deemed a supply chain risk but also in the sense that any piece of technology you use could be as well. […]
Stepping back even further, this could end up making AI less viable as a profitable industry. If corporations and foreign governments just cannot trust what the U.S. government might do next with the frontier AI companies, it means they cannot rely on that U.S. AI at all. Abroad, this will only increase the mostly pointless drive to develop home-grown models within Middle Powers (which I covered last week), and we can probably declare the American AI Exports Program (which I worked on while in the Trump Administration) dead on arrival.
Zvi’s post from this morning is the most comprehensive review of the situation. I highly recommend reading at least the first two sections.
Anthropic isn’t mincing words:
We believe this designation would both be legally unsound and set a dangerous precedent for any American company that negotiates with the government.
No amount of intimidation or punishment from the Department of War will change our position on mass domestic surveillance or fully autonomous weapons. We will challenge any supply chain risk designation in court.
The Pentagon’s designation of Anthropic as a supply chain risk has become the most important part of this story. But the original dispute over using AI for mass domestic surveillance and autonomous weapon systems remains immensely important. Scott Alexander investigates whether OpenAI’s agreement with DoW will meaningfully constrain it from using AI in those ways.
Anthropic has said it will sue, and it has strong legal arguments on multiple independent grounds. Every layer of the government’s position has serious problems, and any one of them could independently be fatal. Together, they make the government’s litigation position close to untenable. […]
The statute wasn’t built for this, the facts don’t support it, and the courts will say so.
We still have a newsletter to do—let’s get started.
Everything changed in November, with Opus 4.5 + Claude Code. Since then, we’ve all been frantically trying to figure out what it all means (when we weren’t preoccupied by building cool things). Steve Newman shares 45 characteristically insightful thoughts about AI agents—some of these will be obvious to you if you already use agents extensively, but I found multiple new ideas here.
39: Agents use vastly more compute than chatbots. Compute usage for chatbots is basically limited by how much output people want to read. An agent can spend virtually unlimited time doing intermediate work that no one will review directly. If 100M desk workers start using AI agents at the level of intensity which requires Anthropic’s current “Max 20x” plan, that would translate into $240 billion in revenue per year. It will be years before there are enough GPU chips to support that level of usage.
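The arithmetic behind that revenue figure appears to check out, assuming the Max 20x plan costs about $200/month:

```python
workers = 100_000_000   # 100M desk workers
monthly_price = 200     # assumed Max 20x price, USD/month

annual_revenue = workers * monthly_price * 12
print(f"${annual_revenue / 1e9:.0f}B per year")  # -> $240B per year
```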
Zvi reports on Sonnet 4.6: it’s very good, but you should probably use Opus instead unless price or speed are critical.
Nano Banana 2 is here—looks like the best overall image generator just got a significant upgrade.
Alex Albert would like to remind you that Anthropic has shipped a lot of cool features in spite of the chaos:
We are in the “scaling era”: AI capabilities are improving at a breakneck pace, largely because the big labs have been using exponentially increasing amounts of compute during training. That can continue for three or four more years, but we will soon run into physical constraints that limit how quickly we can bring more compute online.
Does that mean that capability improvements will radically slow down in a few years? Very possibly, but compute capacity isn’t the only thing that contributes to capability improvements. Improvements in algorithms and training data are also important factors, but it’s hard to quantify exactly how much they contributed to recent growth.
EpochAI’s Anson Ho takes a comprehensive look at the question—while he doesn’t find many definitive answers, it’s an excellent piece with plenty of good insights. He finds that algorithmic improvements have been a major factor, with two important caveats:
Daniel Litt is a professional mathematician who's been closely tracking how well AI can do research-level math. His latest piece provides a balanced, detailed take on current capabilities and near-term trends.
Like many mathematicians, I find much discussion around AI-for-math to be filled with hype or outright quackery, and much of my commentary has focused on this. I’ve been very critical of AI-for-math hype. So I hope you will take me seriously when I say that it’s not all hype.
IEEE Spectrum looks at First Proof and Frontier Math:Open Problems, two new math benchmarks that challenge AI to solve real math research problems. Quoting Greg Burnham:
“AI has gotten to the point where it’s, in some ways, better than most PhD students, so we need to pose problems where the answer would be at least moderately interesting to some human mathematicians, not because AI was doing it, but because it’s mathematics that human mathematicians care about.”
Timothy Lee talks to professional programmers to assess how AI is changing the programming profession. His analysis of current capabilities and impacts is solid, but I expect much faster near-term progress than he does. Recent progress has been incredibly fast (and accelerating), and there’s a huge gap between what the models are already capable of and what most people are using them for. I’m pretty sure 2026 will bring even more change and disruption to programming than 2025 did.
One of the dumbest things people say about AI is that it’s “just next-token prediction”. Plenty of people have already explained why that isn’t meaningfully true, but Scott Alexander takes a different approach:
I want to approach this from a different direction. I think overemphasizing next-token prediction is a confusion of levels. On the levels where AI is a next-token predictor, you are also a next-token (technically: next-sense-datum) predictor. On the levels where you’re not a next-token predictor, AI isn’t one either.
This is the most useful “how to use AI” piece I’ve run across in a while: Luke Bechtel has AI interview him about his ideas as a way to organize his thoughts and prepare for a new piece of writing.
The risk of bad actors (terrorists, perhaps, or extortionists) using AI to create a bioweapon is one of the most serious risks of advanced AI. Transformer explores why biorisk is so concerning, how dangerous current AIs are, and why it’s so hard to assess the danger level.
The latest “things could go very badly” scenario to go viral is THE 2028 GLOBAL INTELLIGENCE CRISIS by Citrini Research. The all-caps, I’m afraid, are in the original.
The central conceit is clever: it purports to be a memo from June 2028 that recaps “the progression and fallout of the Global Intelligence Crisis”, focusing on jobs, the economy, and the financial markets. There are significant technical problems with some parts of it, and it’s almost certain that events won’t actually play out this way. But there are some really good insights and thought experiments here.
Beyond the specifics, it’s valuable as a sample thought experiment in “how might really powerful AI cause massive disruption in non-obvious ways?”
If you want to go deeper, Zvi’s analysis is excellent.
Jacob Steinhardt shares advice for technically skilled people who want to help with AI governance. It’s excellent for that audience but also has some solid insights that are more broadly interesting:
More generally, across domains spanning climate change, food safety, and pandemic response, there are two technological mechanisms that repeatedly drive governance:
- Measurement, which creates visibility, enables accountability, and makes regulation feasible.
- Driving down costs, which makes good behavior economically practical and can dissolve apparent trade-offs.
Anthropic just updated their Responsible Scaling Policy. This has been a controversial move, with many people criticizing them for significantly walking back some important parts of previous versions of the policy. I expect we’ll see more detailed commentary on this soon, but recent events with DoW have pushed it to the sidelines.
For now, I’ll just say that I tentatively agree with many of the changes they made, with the major caveat that I think this is probably the best possible policy for a very challenging world. I’m updating positively about Anthropic’s ability to make good decisions in hard circumstances, and negatively about humanity’s ability to make good collective decisions about AI.
Holden Karnofsky, who played a major role in writing the latest version, discusses the reasoning behind some of the changes.
Like Dean Ball, Anton Leicht came away from the AI Impact Summit deeply concerned about the gap between what Silicon Valley understands about AI and what most people—and in particular the middle powers—believe about AI.
This gap puts the world in danger of incurring most of AI's risks while forgoing most of its benefits.
Dan Williams interviews Anil Seth, who believes consciousness probably requires a biological substrate. Anil’s a very capable guy: he’s a well-regarded neuroscientist, an expert on consciousness, and the director of the Centre for Consciousness Science at the University of Sussex. If you’re interested in AI psychology and consciousness, you should watch this (or read the transcript).
The debate is this: on the one hand, computational functionalists argue that consciousness is the result of computational processes, which in humans happen to run on a biological substrate but could in principle run on computers. Biological naturalists, on the other hand, argue that consciousness is specifically linked to biology and that merely simulating the biology won't produce consciousness. An often-used example is that simulating rain on a computer doesn't make anything wet.
It’s important to be clear that these are both hypotheses about the world, and we don’t yet have definitive evidence to prove either one. To my mind, though, many advocates of biological naturalism, including Anil, seem to be working backward from a desired conclusion rather than forward from observed facts. His theory that consciousness might result from autopoiesis seems to answer the question “assuming biological naturalism is true, what is a plausible mechanism for it,” rather than “do we observe anything about consciousness that cannot be explained without autopoiesis?”
Regardless, it’s a very interesting interview and Anil has thoughtful ideas about consciousness, intelligence, and computational functionalism.
For many tasks, LLMs are substantially constrained by the size of their context windows. One of the most important tips for using Claude Code, for example, is to avoid letting the context window fill up: performance degrades substantially as the window fills, even before it's completely full.
That’s a hard problem to solve: the nature of the transformer architecture is that every token in the context window attends to every other token, so the cost of running a model rises quadratically with the size of the context window. There are no magic solutions, but TechTalks reviews some of the most promising technical approaches.
2026-03-03 09:32:26
Join Us for the Memory Decoding Journal Club!
A collaboration of the Carboncopies Foundation and BPF Aspirational Neuroscience
This time, we’re exploring a new preprint on how engram-to-engram wiring may store information in memory:
“Engram cell connectivity as a mechanism for information encoding and memory function”
Authors: Clara Ortega-de San Luis; Maurizio Pezzoli; Esteban Urrieta; Tomás J. Ryan
Institutions: Trinity College Dublin (School of Biochemistry & Immunology; Trinity College Institute of Neuroscience); EPFL (Brain Mind Institute); University of Melbourne (Florey Institute); CIFAR
Engram cells are thought to support memory storage and recall—but what exactly carries the specific information of an experience is still debated. This paper tests the hypothesis that information is encoded in the precise synaptic wiring between engram cells, not only in which cells are recruited. The authors track how learning reshapes connectivity across a defined vCA1 → basal amygdala pathway, then probe causality by artificially activating or inhibiting pre- and post-synaptic components. Finally, they identify a PSD-95–mediated plasticity mechanism that influences these connectivity patterns and may support long-term memory stability.
Presented by: Ariel Zeleznikow-Johnston
When? Tuesday, March 3, 2026 – 3:00 PM PST | 6:00 PM EST | 11:00 PM UTC
Where? Video conference: https://meet.google.com/udr-jcdc-vkp
Register for updates: https://aspirationalneuroscience.org/register-with-us/
Once registered, you'll receive event invites & updates!
#Neuroscience #MemoryResearch #Engrams #SynapticPlasticity #Hippocampus #Amygdala #JournalClub
#Carboncopies #AspirationalNeuroscience
2026-03-03 07:58:20
This is a crosspost of my ICLR 2026 blogpost track post. All code and experiments are available at github.com/andyrdt/iclr_induction.
Park et al., 2025 show that when large language models (LLMs) process random walks on a graph, their internal representations come to mirror the underlying graph's structure. The authors interpret this broadly, suggesting that LLMs can "manipulate their representations in order to reflect concept semantics specified entirely in-context". In this post, we take a closer look at the underlying mechanism, and suggest a simpler explanation. We argue that induction circuits (Elhage et al., 2021; Olsson et al., 2022), a well-known mechanism for in-context bigram recall, suffice to explain both the task performance and the representation geometry observed by Park et al.
We begin by describing the experimental setup of Park et al., 2025 and reproducing their main results on Llama-3.1-8B.

[Figure 1, reproduced from Park et al.: the model is shown a random walk (e.g. apple bird milk sand sun plane opera ...) where consecutive words are always neighbors; as the sequence length grows, the model begins to predict valid next words based on the graph structure, and, surprisingly, the geometry of the model's effective token representations comes to mirror the grid, with each node represented adjacent to its neighbors in activation space.]
Park et al. introduce the in-context graph tracing task. The task involves a predefined graph whose nodes are labeled with familiar words (apple, bird, math, etc.). The graph's connectivity structure determines which words may validly follow which in a sequence.
Grid structure. The task uses a 4×4 grid of 16 words: apple, bird, car, egg, house, milk, plane, opera, box, sand, sun, mango, rock, math, code, phone.[1] Each word occupies a unique position in the grid. Two words are neighbors if they are horizontally or vertically adjacent (not diagonally). This defines an adjacency matrix $A \in \{0,1\}^{16 \times 16}$ over the words.
Random walk generation. Sequences are generated by random walks on this grid: starting from a random position, the walk moves to a uniformly random neighbor at each step. This produces sequences like apple bird milk sand sun plane opera ... where consecutive words are always grid neighbors. Following Park et al., we use sequence lengths of 1400 tokens.
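For concreteness, here's a minimal sketch of the walk generation (the row-major 4×4 layout of the word list is our assumption; any layout with the same adjacency works):

```python
import random

WORDS = ["apple", "bird", "car", "egg", "house", "milk", "plane", "opera",
         "box", "sand", "sun", "mango", "rock", "math", "code", "phone"]

def neighbors(i: int) -> list[int]:
    # Word i sits at (row, col) = divmod(i, 4); neighbors are the
    # horizontally or vertically adjacent cells of the 4x4 grid.
    r, c = divmod(i, 4)
    cand = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
    return [4 * rr + cc for rr, cc in cand if 0 <= rr < 4 and 0 <= cc < 4]

def random_walk(n_tokens: int, seed: int = 0) -> str:
    rng = random.Random(seed)
    node = rng.randrange(16)
    walk = [node]
    while len(walk) < n_tokens:
        node = rng.choice(neighbors(node))  # uniform over grid neighbors
        walk.append(node)
    return " ".join(WORDS[i] for i in walk)

print(random_walk(20))
```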
Measuring accuracy. At timestep $t$, we check whether the model's predicted next word is a valid grid neighbor of the word at position $t$; accuracy at a given context length is the fraction of timesteps where the prediction is valid.
PCA visualization. To assess whether the model's representations come to resemble the grid structure, we extract activations from a late layer (layer 26 out of 32). For each of the 16 words, we compute a class-mean activation by averaging over all occurrences in the final 200 positions of the sequence. We then project these 16 class-mean vectors onto their first two principal components for visualization. If the representation geometry reflects the grid, neighboring tokens should appear nearby in this projection.
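A sketch of this procedure, assuming per-position residual activations have already been extracted (the function name and array layout are our own, not from the original code):

```python
import numpy as np

def grid_pca(acts: np.ndarray, token_ids: np.ndarray, n_words: int = 16,
             tail: int = 200) -> np.ndarray:
    """acts: (seq_len, d_model) residual activations from a late layer;
    token_ids: (seq_len,) word index (0..15) at each position.
    Assumes every word occurs at least once in the final `tail` positions."""
    acts, token_ids = acts[-tail:], token_ids[-tail:]
    # Class-mean activation per word over the final `tail` positions.
    means = np.stack([acts[token_ids == w].mean(axis=0) for w in range(n_words)])
    # Project the 16 class means onto their first two principal components.
    centered = means - means.mean(axis=0)
    _, _, Vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ Vt[:2].T  # (16, 2) coordinates for plotting
```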
Figure 2 shows our reproduction of Park et al.'s main results on Llama-3.1-8B.
Park et al. interpret these findings as evidence that the geometric reorganization plays a functional role in task performance: the model learns the graph structure in its representations, and this learned structure is what enables accurate next-node predictions.
"We see once a critical amount of context is seen by the model, accuracy starts to rapidly improve. We find this point in fact closely matches when Dirichlet energy[2] reaches its minimum value: energy is minimized shortly before the rapid increase in in-context task accuracy, suggesting that the structure of the data is correctly learned before the model can make valid predictions. This leads us to the claim that as the amount of context is scaled, there is an emergent re-organization of representations that allows the model to perform well on our in-context graph tracing task."
— Park et al. (Section 4.1; emphasis in original)
We propose that the grid tracing task can be solved by a much simpler mechanism than the in-context representation reorganization posited by Park et al.: induction circuits (Elhage et al., 2021; Olsson et al., 2022).
An induction circuit[3] consists of two types of attention heads working together. Previous-token heads attend from position $t$ to position $t-1$, copying information about the previous token into the current position. Induction heads then exploit this: upon seeing token $A$, they attend to positions that immediately follow earlier occurrences of $A$, and promote the token found there, implementing the pattern [A][B] ... [A] → [B].
In the grid task, if the model has seen the bigram apple bird earlier in the sequence, then upon encountering apple again, the induction circuit can retrieve and predict bird. Since consecutive tokens in a random walk are always grid neighbors, every recalled successor is guaranteed to be a valid next step. With enough context, the model will have observed multiple successors for each token, and can aggregate over these to assign probability mass to all valid neighbors.[4]
If the model relies on induction circuits to solve the task, then ablating the heads that comprise them should substantially degrade task performance. We test this via zero ablation: setting targeted attention heads' outputs to zero and measuring the causal impact on both task accuracy and in-context representations.
Head identification. Following Olsson et al., 2022, we identify induction heads and previous-token heads using attention pattern analysis on repeated sequences, and rank all 1024 heads in Llama-3.1-8B by their respective scores, yielding two ranked lists.
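As a sketch of the scoring logic, here's a simplified version of the previous-token and prefix-matching (induction) scores computed from a head's attention pattern on a repeated random sequence (our formulation, following Olsson et al.):

```python
import numpy as np

def head_scores(attn: np.ndarray, period: int) -> tuple[float, float]:
    """attn: (seq_len, seq_len) attention pattern of one head, run on a
    sequence consisting of a random token block repeated with `period`.
    Returns (previous_token_score, induction_score)."""
    seq_len = attn.shape[0]
    # Previous-token score: average attention from position t to t-1.
    prev = np.mean([attn[t, t - 1] for t in range(1, seq_len)])
    # Induction score: average attention from position t to the token that
    # followed the same token one repetition earlier, i.e. t - period + 1.
    ind = np.mean([attn[t, t - period + 1] for t in range(period, seq_len)])
    return float(prev), float(ind)
```

Ranking all heads by each score yields the two lists used for the ablations below.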
Ablation procedure. For each head type, we ablate the top-$k$ heads from the corresponding ranked list, for increasing values of $k$, and measure the effect on both task accuracy and representation geometry.
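A hedged sketch of what zero ablation can look like for a Hugging Face Llama model, intercepting the input to each layer's o_proj (this leans on transformers internals and is illustrative, not the exact experimental code):

```python
import torch

def zero_ablate_heads(model, heads, head_dim=128):
    """Zero the output of selected (layer, head) pairs by hooking the input
    to each layer's o_proj, which is the concatenation of per-head outputs.
    `heads` is an iterable of (layer_idx, head_idx) pairs."""
    by_layer = {}
    for layer, head in heads:
        by_layer.setdefault(layer, []).append(head)
    hooks = []
    for layer, hs in by_layer.items():
        o_proj = model.model.layers[layer].self_attn.o_proj
        def hook(module, args, hs=hs):
            (x,) = args  # (batch, seq, n_heads * head_dim)
            x = x.clone()
            for h in hs:
                x[..., h * head_dim:(h + 1) * head_dim] = 0  # zero this head
            return (x,)
        hooks.append(o_proj.register_forward_pre_hook(hook))
    return hooks  # call h.remove() on each to restore the model
```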
Both induction heads and previous-token heads are critical to task performance. Figure 3 shows task accuracy under head ablations. Ablating the top-4 induction heads produces a large drop in accuracy, and ablating the top previous-token heads is comparably damaging.
While both head types are important for task performance, their ablations have qualitatively different effects on in-context learning dynamics. Ablating induction heads degrades performance, but accuracy continues to climb as context length increases. In contrast, ablating previous-token heads causes accuracy to plateau entirely.
Ablating previous-token heads disrupts representation geometry. While both head types are important for accuracy, they seem to have different effects on representation geometry. Figure 4 shows that ablating induction heads preserves the grid-like geometric structure in PCA visualizations, as the 2D projections still resemble the spatial grid. However, ablating previous-token heads disrupts this structure, causing representations to lose their apparent spatial organization.
In the previous section, we studied task performance and argued that the model achieves high task accuracy by using induction circuits. We now study the representation geometry, and attempt to explain the grid-like PCA plots. We will argue that this structure is plausibly a byproduct of "token mixing" performed by previous-token heads.
Figure 4 shows that ablating previous-token heads disrupts the grid structure, while ablating induction heads preserves it. This suggests that previous-token heads are somehow necessary for the geometric organization. But what mechanism could link previous-token heads to spatial structure?
Previous-token heads mix information from position $t-1$ into position $t$. Because consecutive tokens in a random walk are always graph neighbors, this means each position's representation gets blended with the representation of a grid neighbor. Averaged over a long walk, each word's class-mean representation is pulled toward those of its neighbors, which is exactly the kind of smoothing that would produce grid-like geometry.
To test whether neighbor-mixing alone can create the observed geometry, we construct a minimal toy model.
We work directly in a 16-token space indexed by the grid words. Each word $i$ is assigned a random, unstructured embedding $x_i$ (i.i.d. Gaussian), so the initial geometry contains no grid structure.
We then apply a single "neighbor mixing" step:

$$\tilde{x}_i = x_i + \alpha \cdot \frac{1}{|\mathcal{N}(i)|} \sum_{j \in \mathcal{N}(i)} x_j$$

where $\mathcal{N}(i)$ is the set of grid neighbors of word $i$ and $\alpha$ sets the mixing strength.

After this one step, PCA of the 16 mixed vectors $\tilde{x}_i$ already displays grid-like structure: neighboring words land near each other in the 2D projection, even though the underlying embeddings are random.
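Concretely, the whole toy model fits in a few lines of NumPy. This is a minimal sketch under the assumptions above (4×4 grid, Gaussian embeddings, uniform mixing over neighbors); the dimension and α are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

# 4x4 grid: word i sits at (row, col) = divmod(i, 4).
n = 16
A = np.zeros((n, n))
for i in range(n):
    r, c = divmod(i, 4)
    for dr, dc in [(-1, 0), (1, 0), (0, -1), (0, 1)]:
        rr, cc = r + dr, c + dc
        if 0 <= rr < 4 and 0 <= cc < 4:
            A[i, 4 * rr + cc] = 1.0

# Random, unstructured embeddings.
d = 256
X = rng.standard_normal((n, d))

# One neighbor-mixing step: x_i <- x_i + alpha * mean over grid neighbors.
alpha = 1.0
X_mixed = X + alpha * (A / A.sum(axis=1, keepdims=True)) @ X

# PCA via SVD of the centered vectors.
Xc = X_mixed - X_mixed.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
proj = Xc @ Vt[:2].T  # 16 points in 2D

# Check: grid neighbors should end up closer than non-neighbors.
D = np.linalg.norm(proj[:, None] - proj[None, :], axis=-1)
print("mean neighbor distance:    ", D[A == 1].mean())
print("mean non-neighbor distance:", D[(A == 0) & ~np.eye(n, dtype=bool)].mean())
```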
The neighbor-mixing hypothesis makes a further prediction: individual activations should reflect not just the current token, but also its predecessor.
Instead of collapsing each word into a single class mean, we take the final 200 positions of a length-1400 random-walk sequence and project all 200 residual-stream vectors into the same 2D PCA space used for the class means. Each point now corresponds to a specific activation. For each point, we display bigram information: the center color indicates the current token $w_t$, and the ring color indicates the previous token $w_{t-1}$.

Individual activations seem to bear the fingerprint of previous-token mixing (Figure 6). For example, activations at positions where the bigram plane math occurred tend to lie between the plane and math centroids, and positions where egg math occurred tend to lie between the egg and math centroids. We see similar "in-between" behavior for all other bigrams. This is what one would expect if the representation at position $t$ is a mixture of the current token's embedding and the previous token's embedding.
Our experiments point toward a simple explanation: the model performs in-context graph tracing via induction circuits, and the grid-like PCA geometry is a byproduct of previous-token mixing. However, our understanding remains incomplete in important ways.
The toy model is a significant simplification. Our neighbor-mixing rule assumes that previous-token heads simply add the previous token's activation into the current position's residual stream. In the real model, the mixing is mediated by learned attention patterns and OV transformations spread across many heads and layers, so the effective operation is surely more complicated than a single additive step.
Why does the grid structure emerge late in the sequence? Previous-token heads are active from the start of the sequence, yet the grid-like PCA structure only becomes clearly visible after many tokens have been processed. If neighbor-mixing were the whole story, we might expect the geometric structure to appear earlier. Yang et al., 2025 develop a theoretical framework formalizing a graph-convolution-like process across both context and layers, which may offer a more complete account of how the geometric structure emerges.
Limited to the in-context grid tracing task. Our analysis is limited to the specific grid tracing setup of Park et al. Whether induction circuits and previous-token mixing similarly explain in-context structure learning on other graphs and tasks remains an open question.
We have argued that the phenomena observed by Park et al., 2025 can be explained by well-known mechanisms in language models. Task performance on in-context graph tracing is well-explained by induction circuits, which recall previously-seen bigrams. The geometric organization visible in PCA plots appears to be a byproduct of previous-token mixing: because random walks traverse graph edges, previous-token heads mix each position's representation with that of a graph neighbor, and this mixing alone is sufficient to produce grid-like structure from unstructured embeddings.
These findings suggest that the "representation reorganization" observed by Park et al. may not reflect a sophisticated in-context learning strategy, but rather an artifact of previous-token head behavior.
All words tokenize to exactly one token when preceded by a space (e.g., apple is a single token). Sequences are tokenized with a leading space before the first word, ensuring single-token-per-word encoding.
Dirichlet energy measures how much a signal varies across graph edges. Low energy means neighboring nodes have similar representations, so Park et al. use it to quantify how well the model's representations respect the graph structure.
In the literature, the term "induction head" is sometimes used to refer to both the individual attention head and the full two-component circuit. We use "induction circuit" for the full mechanism and "induction head" for the specific head that attends to tokens following previous occurrences, to avoid ambiguity.
For example, if the model has seen both apple bird and apple house, it can distribute probability across both bird and house when predicting the next token after apple.