2026-03-10 11:14:26
The conflict between the Department of War and Anthropic has quieted somewhat, but nothing has been resolved and a catastrophic outcome is still entirely possible. Regardless of what happens next, two things are very clear.
This is the least political that AI will ever be. Politicians are finally waking up to the fact that AI is a big deal. Even though most of them don’t understand why it’s a big deal, you can safely assume they will have an increasing appetite for government intervention. The DoW incident is a preview, not an aberration.
This is the least stressful that AI will ever be. The last two weeks have been brutal: I notice several of the writers and thinkers that I most respect have been publicly struggling and in some cases decompensating. I’m afraid the pace is only going to get faster, and the stakes are only going to get higher. Pace yourselves.
In the spirit of pacing ourselves, we’ll cover what we need to cover about DoW, then put it down and move on to happier topics.
For years now, questions about A.I. have taken the form of “what happens if?” […]
This year, the A.I. questions have taken a new form, “what happens now?”
Ezra Klein’s opinion piece in NY Times ($) is nominally about the conflict between the Department of War and Anthropic and his analysis of that situation is spot-on: this is possibly the best short piece on that topic. But that conflict is a symptom of a much deeper problem: we’ve gone from being unprepared for AI capabilities that are coming soon to being unprepared for AI capabilities that have now arrived.
AI profoundly changes the nature of government surveillance—it’s now possible to intensively surveil every single American in a way that was previously (sort of) legal but completely impractical. In a sane world, the US Congress would carefully consider the implications of that change and pass appropriate legislation that codifies a reasonable balance between security and privacy.
Lamentably, we don’t seem to live in that world. Plan accordingly.
Zvi reports on Gemini 3.1. It’s a great model, but Google DeepMind just isn’t quite keeping up with Anthropic and OpenAI. Image generation is state of the art, but aside from that there’s no good reason for most people to pick Gemini as their daily driver.
Let’s start with some of the most interesting pieces from the past week.
Obviously a conversation between Ezra Klein and Dean Ball ($) is going to be good, and this one exceeds expectations. Dean is both highly informed about the political situation and deeply thoughtful about the broader implications of what's happening here.
Zvi summarizes the state of play as of March 6.
There’s been a lot of discussion about what the contracts between DoW and Anthropic / OpenAI actually mean. If you want to go down that rabbit hole, Zvi does a great job of breaking down what we currently know. See also Tom Smith’s analysis.
I’m glad people are doing the important work of scrutinizing these contracts and doing their best to ensure that they establish clear legal boundaries. But ultimately, legal documents can only do so much. If you don’t trust the three letter agencies not to spy on you in the first place, you probably shouldn’t trust them to honor a contract.
Much of the AI world has been highly critical of DoW’s recent actions, for obvious reasons. Pirate Wires’ conversation with DoW’s Emil Michael (partial $) is the best piece I’ve found in support of DoW’s position—there’s a lot I don’t agree with, but it’s more reasonable and coherent than many of the straw men being tilted at online.
The immediate consequences of the situation are bad enough, but the long-term collateral damage will be even worse. A lot of individuals, companies, and countries are going to look at the events of the last two weeks and start quietly making contingency plans that ultimately weaken both America and the entire AI industry. Nobody is well-served by any of this, and the longer the situation drags on the worse the fallout will be.
Here are two early examples—I’m certain many similar conversations are happening behind closed doors.
In the wake of the conflict between DoW and Anthropic, Anton Leicht considers whether it’s feasible for one of the middle powers to “poach” a frontier lab. He concludes it isn’t realistic to outright move one of the big labs outside the US, but proposes some intermediate strategies:
Stepwise and subtle, however, is a possible way to do this: understand the project of ‘poaching’ a frontier lab not as an attempt to extract value from the U.S., but to diversify the Western stack to make it more resilient to transient political trends and disruptions. My broader claim here is simple: it would be good for the world if a sizeable minority of American developers’ compute, business activity, and government cooperation were located in allied democracies. That could be about Anthropic, but I’d be just as happy with OpenAI or Google DeepMind. In a pinch, I might even take Meta. That outcome is eminently reachable and obviously beneficial in the aftermath of the Anthropic/Pentagon saga—and it’s never been more clear to the frontier developers that some hedging might be in their very best interest.
The DoW / Anthropic dispute has rekindled serious discussion about the US government nationalizing frontier AI development. Much of that discussion has focused on legal, political, and philosophical questions, but there hasn’t been much serious discussion of the practicalities.
John Allard dives into the nuts and bolts of nationalization, considering what strategies the government might use and whether those strategies would actually work. He isn’t optimistic about the outcome (which doesn’t mean it wouldn’t happen anyway):
It was always an inevitability that the government would try to exert control over frontier AI. The problems arise when the government begins exerting control without understanding that the frontier is a living process, not an asset. At some point the frontier may commoditize enough that tacit knowledge stops mattering and the government can brute-force its way to capability. But we’re not there yet. And until someone can answer the harder question — whether the US is better off accepting less control in exchange for maintaining its lead — the risk is that every attempt to capture the frontier is what finally kills it.
AI Village is the sensible, grownup version of Moltbook. A team of frontier AIs are assigned a group project and attempt to tackle it in full view of an amused world. Recent projects have included fundraising for charity and writing a blog. While there are elements of robot reality TV here, it’s an interesting way of exploring agent capabilities in the real world. Of particular note, it gives us information about how well a diverse group of frontier agents can work together (that’s going to be a big deal by the end of this year).
As you might expect, the agents made a lot of progress last year:
In the AI Village, we’ve observed substantial improvement in agent capabilities over the span of months. Early 2025 agents often fabricated information, got stuck, or became easily distracted in a few minutes to hours. Late 2025 agents tend to be more truthful and stay on task longer (though their effectiveness often drops off once the most obvious tasks are done).
Jason Crawford reflects on recent progress in agentic coding. There aren’t a lot of novel insights here, but it’s a great overview and a strong choice for sharing with people who haven’t been following AI closely.
2025: why would I do work when I can tell a robot to do it for me?
2026: why would I tell a robot to do work when I can have a robot tell it for me?
I’ve recently needed artwork for a couple of personal projects, and I’ve found that SOTA models aren’t just capable artists—they’re also quite good art directors. My current workflow goes like this:
Claude is surprisingly good at looking at an image and finding areas for improvement in everything from line style to facial expressions. The results can’t (yet) compete with professional work, but they’re getting very good. And from a process perspective, the AI is light years better: I can experiment with multiple directions and styles within minutes, and the robots never get frustrated when I change my mind seven times in half an hour for no good reason.
Along with mass domestic surveillance, autonomous weapons are one of the red lines in the Anthropic / DoW dispute. Policy and ethical considerations aside, it’s surprisingly hard to define what “autonomous weapons” actually means. We have well-defined autonomy levels for cars, but no similar concept for weapons (yet). Autonomous missile defenses have been deployed since the 1980s, but that feels very different from a system that can autonomously identify and engage individual soldiers.
Transformer explores some of the technical and legal questions, and looks at what’s currently on the battlefield in Ukraine.
New communications technologies often transform how the public gets information and forms opinions. The printing press democratized the spread of information, weakening the control of the church and monarchy. Social media is a breeding ground for outrage, tribalism, and conspiracy theories. How might AI affect public discourse?
Dan Williams argues that AI might be a force for good, nudging us closer to a consensus view of reality based on expert understanding and strong epistemics. We don’t have much data yet, but he cites some promising early research suggesting that LLMs are surprisingly effective at getting people to change their minds.
His arguments sound plausible, although I note that many of us initially expected social media to be a force for good.
AI Frontiers explores how AI might affect workers, arguing that if AI is much better than humans at many but not all jobs, human wages might actually rise.
That counter-intuitive result follows from basic economics, which the article does a good job of explaining. It’s a solid piece, and a good introduction to some of the relevant economics if you’re not already familiar with them. But note that this whole analysis only applies if AI is powerful but not superhuman. Without careful intervention, everything falls apart in a world with superhuman AI:
If machines do everything, then those who own the machines will capture all this value. Products and services would become very cheap, but workers, outcompeted by machines in all tasks, would end up with a vanishingly small share of the economy’s income.
We can flourish alongside superintelligent AI, but only if we make smart choices.
Eleos AI Research is a small nonprofit dedicated to studying AI sentience and wellbeing, a topic which until very recently has largely been ignored. Executive Director Robert Long goes on the 80,000 Hours podcast to discuss their work and some of the big open questions they’re tackling.
Good interviews answer the questions you wanted to learn about, but great interviews raise (and occasionally answer) questions you hadn’t realized you ought to be asking. I came out of this one with new questions about the ethics of creating sentient AI that wants to be subservient to humans and about AI consciousness that is as meaningful as ours but unrecognizably different.
(or, an opinionated take on how to do important research that matters)
As the subtitle implies, Nicholas Carlini has opinions about how to write papers good enough to win best paper awards—and more generally, how to do good research. It’s a dauntingly long piece but very good: even though I’m not a researcher, I found multiple insights that I’m excited to put to use in my own work.
i always wanted to write a six-word story. here it is:
near the singularity; unclear which side.
2026-03-10 10:43:41
Hot takes up front:
Cross-posted from Multiplier
I work on the capacity-building team on the Global Catastrophic Risks half of Coefficient Giving (formerly known as Open Philanthropy). Our remit is, roughly, to increase the amount of talent aiming to prevent unprecedented, globally catastrophic events. These days, we're mostly focused on AI, and we've funded a number of projects and grantees that readers of this post might be familiar with, including MATS, BlueDot Impact, Constellation, 80,000 Hours, CEA, the Curve, FAR.AI's events, university groups, and many other workshops and projects.
The post aims to make the case that broadly, capacity-building work (including on AI risk) has been and continues to be extremely impactful, and to encourage people to consider pursuing relevant projects and careers.
This post is written from my personal perspective; that said, my sense is that a number of CG staff and others in the AI safety space share my views. I include some quotes from them at the end of this post.
I'm writing this post partly out of a desire to correct what I perceive as an asymmetry between how excited I and others at Coefficient Giving are about this kind of work and how excited people in the EA and AI safety communities seem to be to work on it. The capacity-building team is one of three major teams working on AI risk at Coefficient; we currently have 11 staff, which is ⅓ of the total AI grantmaking capacity, and we gave away over $150M in 2025. I started my stint at Coefficient Giving in 2021, working half-time on technical AI safety grantmaking and half-time on capacity-building grantmaking; I ultimately switched to working full-time on capacity-building, among other reasons because my sense was that that team was several times (maybe an order of magnitude) more impactful. Things seem somewhat different to me now (I think the set of opportunities in technical AI safety grantmaking looks significantly better than it did in 2021), but my sense is that capacity-building as an area of work is still massively underrated relative to its impact.
The naive case for this kind of work (often called the multiplier effect argument) goes something like this: say you can spend a little time doing direct work yourself, or spend that same amount of time getting one of your equally talented friends into direct work for the rest of their life. Getting your friend into direct work is most likely the more impactful option, because you get to “multiply” your lifetime impact (in this case, by almost a factor of 2) by getting a whole additional person to spend their career on work you think is important.
In fact, whether this argument goes through depends on a few premises: namely, how good the direct work you would have done would be, and how tractable it is to convince others who are as talented as you. I'm going to skip over the first premise for now (and attempt to address it in a later section) and present evidence that our team has collected over the years that makes me think this work is very tractable, and in particular that there are easy-to-execute interventions that reliably influence people's career trajectories in substantial ways. A priori, you might think that people's career choices happen randomly and chaotically enough that it's difficult to make a substantive impact by trying to change what people work on. But in fact, both anecdotal evidence we've observed and larger-scale data collection we've attempted (both presented below) suggest that intentional efforts make a big difference to individual career trajectories (including the trajectories of individuals who go on to do highly impactful work). I think that core stylized fact makes up the main case for why capacity-building work is worthwhile.
I will briefly note that while the below case is focused on successes from capacity-building, I do think this work has the potential for harm, though my overall view is that efforts in this space executed by thoughtful, high-context individuals will be very positive in expectation. I briefly discuss this in this appendix.
In 2020 and 2023, our team ran two similar, in-depth surveys where we asked low-hundreds of people currently working on (or relatively likely to work on) impactful GCR work what influenced their career trajectories. Survey respondents included employees at AI labs, staff at key technical, policy, and capacity-building organizations in AI, and promising-seeming early career individuals. The aim of the surveys was to provide some evaluation of the impacts of the grants our team had made, as well as to generate some evidence informing Coefficient Giving’s views on capacity-building work as a whole.
The survey used a variety of prompts to elicit evidence from respondents about what had influenced their career choices. One of the sections asked respondents to list, unprompted, the top 4 influences that they thought were most important to their current career trajectory (these included things like “my partner”, “inherent curiosity”, etc.).
In 2023, 60% of respondents listed a capacity-building program or organization that our team was funding in their top four influences, with the most common being university groups (listed by 25% of respondents), 80,000 Hours (listed by 20% of respondents), and EAG/EAGxes (listed by 12% of respondents).
See the table below for a longer list of the commonly listed influences, sorted manually into (somewhat subjectively decided) buckets. Note that:
| Unprompted item | % of respondents who listed as top-4 influence (in 2023) | Count (of 329) |
|---|---|---|
| University group | 25% | 82 |
| 80,000 Hours | 20% | 66 |
| EAGs/EAGxes | 12% | 38 |
| Eliezer's writing | 11% | 37 |
| Broad group | 7% | 22 |
| Will MacAskill's writing | 5% | 17 |
| Lightcone | 5% | 15 |
| - LessWrong | 4% | 12 |
| Peter Singer's writing | 4% | 14 |
| Open Philanthropy | 4% | 14 |
| Bostrom's writing | 4% | 12 |
| Toby Ord's writing | 4% | 12 |
| EA Forum | 3% | 11 |
| Redwood | 3% | 9 |
| - MLAB or REMIX | 2% | 7 |
| FHI | 3% | 9 |
| Scott Alexander's writing | 3% | 9 |
| FTXF | 2% | 7 |
| ESPR | 2% | 7 |
| GCP | 2% | 7 |
| CEA | 2% | 6 |
| SERI MATS | 2% | 6 |
| Atlas Fellowship | 2% | 6 |
| AGISF online | 2% | 5 |
| Cold Takes | 2% | 5 |
| GPI | 2% | 5 |
| Rethink Priorities | 2% | 5 |
I’m not able to share the individual free-write responses from the survey above, but I recently personally asked some individuals who I think are doing high-impact work to tell me how they came to be doing that work, followed by what they thought the most important or counterfactual influences on their trajectories were.
Below, I include Claude summaries of their overall stories along with their description of the most important influences, lightly edited. Some notes on the testimonials I've included:
“Here's a list of the salient influences on me:
Claude’s summary:
Max got it into his head in high school that human-level AI was coming during his lifetime and that it was important to make sure the process went well, but he had no idea anyone was working on it. In college, he got connected with Stephen Casper, from whom he learned practical ML skills, and to someone who connected him to the people running the Impact Generator retreat [Asya note: this was a small GCR-focused workshop series run in the Bay in 2022], which he was later invited to. He talked to Tao Lin at that retreat, and Tao offered him a TA position at the ML bootcamp Redwood was running, with three weeks to learn the material. He thought he'd be in the Bay for three days, but stayed six weeks. TA'ing turned into an internship at Redwood, which he took a semester off college to do. While interning he got to know Ajeya, and by the time he graduated she offered him a job.
Max on what was most important:
Claude’s summary:
Rachel got into effective altruism in high school through friends, and started a group at her university. She spent some time as an intern running retreats and ended up helping with Future Forum, a futurism conference that required a last-minute venue switch. She took a semester off to study AI safety, but decided she wasn't interested in research, and did web dev for a while. After running Manifest 2024, she started The Curve, and is now working on other field-building projects.
Rachel on what was most important:
Claude’s summary:
During his first week of university in 2015, someone handed him Superintelligence. He studied cognitive science, did a CS bachelor's in parallel, then a machine learning master's and PhD to prepare for AI safety work. In 2022 he started doing AI safety research on the side with a grant from the Long-Term Future Fund. He paused his PhD, did MATS in early 2023, concluded that deceptive alignment was the biggest problem and that no one was doing evals for it, and started Apollo, which he’s been running since.
Marius on what was most important:
Claude’s summary:
Adam knew from an early age that superintelligence would be scary if someone built it, but assumed it wasn't going to happen in his lifetime. When he got to college, he joined the AI Safety Fundamentals reading group that the Harvard AI Safety group (HAIST) was running, thought the people were extremely cool, and made most of his close friends there. He became increasingly convinced the problem was urgent as language models kept getting smarter. He met Buck Shlegeris at a HAIST retreat, talked to him, and applied to MATS. He did MATS at Redwood, enjoyed it so much he took time off school, and has been working there since.
Adam on what was most important:
Claude’s summary:
Gabe was given a copy of The Precipice when he started as a freshman at Harvard. There was no formal AI safety team at the time, but a group of 7-10 people would gather weekly to talk about x-risk in a dining hall, so he joined, and ended up going to a long workshop in Orinda [California]. He did REMIX [Asya note: this was a mechanistic interpretability bootcamp] the following winter, which introduced him to the Constellation community, and then applied for a Redwood internship for the next summer. After others graduated, he became the new director of HAIST (the Harvard AI Safety Team). He worked with the Alignment Research Center, applied to labs, and was eventually convinced by several people to join OpenAI.
Gabe on what was most important:
Claude’s summary:
Catherine found 80,000 Hours before university through internet searching about careers, then read Doing Good Better. They engaged with the Oxford effective altruism university group, going to events and helping run programming. Through the group they made friends who were into AI safety and argued with them a bunch, which got them interested in AI safety. They applied for the ERA fellowship (then called CERI) after someone from the group told them to, and spent a summer thinking about AI safety with other people. Then they did the GovAI fellowship, which they found even more helpful, via meeting people and developing their own takes on relevant topics. After that they were interested in AI governance, and applied to Open Philanthropy when they were graduating.
Catherine on what was most important:
Claude’s summary:
Aric found GiveWell by Googling for the most effective charities in his late teens, but didn't find the broader effective altruism community until 2020, when a friend found an online student summit that CEA ran. He knew the people who led the Stanford effective altruism group, but never had time to get involved, and was then invited by those people to help with some community-building efforts at MIT. He was also invited to Icecone [Asya note: this was an AI-risk-focused workshop run in 2022], and came out of it persuaded that AI safety was a big deal, but less convinced that theoretical alignment work was the way to proceed. He did a bunch of short sprints of community-building work and met Chana Messinger while teaching at the Atlas Fellowship, and later the Apollo program in the UK. When 80K started thinking about video production, Chana brought him on because they'd worked well together before, and because Aric had prior experience in film & television acting. Aric had previously been encouraged by [experienced EA leaders / Will MacAskill, among others] to do public-facing content creation, and decided to give it a shot.
Aric on what was most important:
Claude’s summary:
Ryan read HPMOR and LessWrong in high school, but he didn't anticipate near-term AGI until rediscovering the idea through effective altruism around 2020. He co-organized the effective altruism group at the University of Queensland during his physics PhD, where his interest in catastrophic risk evolved from climate change activism to nuclear winter modeling to AI risk after reading The Precipice. He completed the first AI Safety Fundamentals course, applied unsuccessfully to FHI and CLR, then did the SERI MATS pilot program. He attended Icecone [Asya note: this was an AI-risk-focused workshop run in 2022] in Berkeley, where he met Holden Karnofsky, Ajeya Cotra, Buck Shlegeris, and many future colleagues. While completing the MATS research phase with John Wentworth as his mentor, he sent the co-organizer a document explaining how he would improve the program and got invited to join the organizing team. He's co-led MATS with Christian Smith since late 2022.
What Ryan says was most significant (in order of importance):
While some of the interventions affecting people’s career trajectories are fairly idiosyncratic, we’ve noticed a few broad categories that tend to be impactful on people’s careers (many of which are featured in the testimonials above).
Notably, unlike content, in our experience programs and events can have a sizable impact even if they don’t meet an exceedingly high-quality bar, making them a good bet for a wider range of people to work on. Generalizing from anecdotes, I speculate that programs and events (especially in-person ones with other participants at a similar point in their careers) often have the effect of causing someone to take changing their career more seriously as a possibility, whereas previously they had been engaging e.g. online in a fairly abstract or detached way.
Our recent request for proposals gives some examples of the kinds of projects we’d be interested in seeing on the current margin. Briefly highlighting some specific things that I or others on my team think would be good, based on our sense of both what’s worked in the past and the current AI risk landscape:
The above makes the case for why you might think capacity-building work is valuable, but doesn't in itself provide a point of comparison for what someone could be doing otherwise (namely direct work, which could itself have capacity-building benefits, e.g. by creating evidence that there's important work to be done in an area).
I don’t have a rigorous method of comparing the value of potential direct vs. CB interventions, and I think there’s room to make a variety of plausible cases. That said, I will share my intuitions, as well as the intuitions of some others at Coefficient.
I generally encourage people to think about their career choices at an individual level, but from an overall talent allocation perspective, my current take is that many of the marginal hires at larger organizations doing technical or policy work right now (including e.g. Apollo, Redwood, METR, RAND TASP, GovAI, Epoch, UKAISI, and Anthropic’s safety teams) would be capable of founding or being an early strategy-setting employee at a top capacity-building organization, and would have more impact by doing so.
I think individuals who are most well-suited to capacity-building work are those who are (some subset of) entrepreneurial, socially skilled, operationally strong, or strong communicators in the relevant subject areas. I think work running programs or events is particularly loaded on the first three of these, whereas e.g. producing content is much more loaded on the last.
If you think you might be someone who should plausibly be doing capacity-building work, here are some things you could consider:
There are a number of actively-hiring organizations that I think are doing impactful capacity-building work (see some of them in this filtered 80K job board), but here I’m going to plug some organizations where I feel a strong hire could be particularly impactful.
If you think you might be interested in any of the below but are on the fence, you can DM me or fill out this form and I'll aim to take at least a 15-minute call with you (and longer if it seems useful; up to a limit of 20 such calls).
Constellation is a research center and field-building organization located in Berkeley, California, that hosts a number of organizations and individuals doing impactful work in the AI safety space. In addition to running the space itself, it’s historically run programming through the space, including the Astra Fellowship, the Visiting Fellows Program, and a number of one-off workshops and events.
Given the dense concentration of high-context talent working there, I think Constellation has huge potential to be impactful both as a convening place for people doing this work, and as a host of a number of programs and events, including (potentially) ones aiming to engage policymakers, AI lab employees, and other high-stakes actors relevant to the AI space.
Constellation is looking for a new CEO who I expect to be the primary individual setting Constellation’s strategic direction. I think that position will be extremely impactful and I'd like them to get a strong hire.
Kairos runs SPAR, a remote AI safety research mentorship program, provides advice and monetary support for AI safety university groups, and has taken on running workshops for promising young people. I think there’s massive amounts of evidence about the effectiveness of all three of these interventions (some of which you can see in the testimonials above), and I think university groups and workshops for young people in particular are (still) extremely neglected relative to their historic impact.
I think Kairos has a very strong leadership team and important, neglected priorities (plus, Agus is a great Tweeter), and I think it would be very impactful for them to have early hires who are strong generalists who could own priority areas. They plan to open multiple new hiring rounds very soon, and you can fill out their General Expression of Interest form to be added to their potential candidate pool for those roles.
Our team is always accepting applications for funding. The section above, as well as our request for proposals, describes some kinds of projects in AI capacity-building that we might be particularly excited to fund, but I also encourage people to form their own views about what might be effective and not anchor too strongly to past work.
We've seen a lot of successful capacity-building work started or run completely by people or organizations doing it on the side of their day-to-day work, including MATS (which was started by full-time Stanford students), a number of impactful workshops and events, and a lot of widely-read public communications.
If you think you might be interested or a good fit for this kind of work, but aren’t sure where to start, we would love it if you let us know by filling out this very short expression of interest form. We’ll reach out if there are projects or opportunities on our radar that we think might be a particularly good fit for you. (Note that we don’t expect to reach out to most respondents).
This post is coming from my personal perspective, but my sense is that my position here is directionally shared by at least some at CG and elsewhere in the AI safety space. I asked a few people who were not working on capacity-building, but who I felt had substantial context on capacity-building efforts, to share their takes below:
“As I've written about before, I'm really into capacity building.
Funny enough, a Coefficient Giving career development grant and the GovAI fellowship were very important inputs into my current career trajectory. I probably would've eventually found my way into AI governance work regardless, but these programs jumpstarted my career and turned me into a useful contributor much faster than I otherwise would've been.
On the grantmaking side, I funded a number of projects where capacity building was a core part of the theory of change, and I've seen results that have been genuinely exciting.
If I could wave a magic wand to reorganize talent allocation in the AI safety community at my whim, I'd move a decent number of people currently in research and policy roles into capacity building. I think it's that underrated.”
“I co-sign this post. There's so much to do to make the world more ready for transformative AI, and the ecosystem is full of projects that need a founder or are a couple more great hires from being much more impactful. We desperately need more talented and motivated people to keep showing up. Also, for me and I think for many others, the work can be deeply rewarding -- it often has more social contact and shorter feedback loops than other types of work.”
"I agree with Asya's post and think that capacity building work is underdone and underrated. One delta is that I would emphasize the importance of capacity building type work by people who are doing object level work in the field. Both that I think that doing object level work is complementary to capacity building but also that people doing object level work should spend a larger fraction of their time doing/helping with capacity building."
Asya: I'd broadly be interested in you giving your take on the kind of work that my team funds.
Buck: I don’t know the current distribution.
Asya: Our biggest grantees are MATS, CEA, Constellation, BlueDot, LISA, Tarbell, 80K, FAR AI's events, a bunch of university groups, and a bunch of other stuff.
Buck: Many of those seem pretty good. I think that overall, trying to do capacity building where you try to cause people to think through a bunch of issues related to transformative AI, especially having people with scope-sensitive beliefs relate to it-- I think that kind of work has gone quite well historically and put us in probably a much better position than we'd be without it. I'm excited for that work happening on the margin and I feel like every year we're somewhat better off because of capacity-building that was done that year or the previous year. Or like projects done by those organizations. That all seems great.
Asya: A claim I make in my post is that ‘many of the marginal hires at larger organizations doing technical or policy work right now (including e.g. Apollo, Redwood, METR, RAND TASP, GovAI, Epoch, UKAISI, and Anthropic’s safety teams) would be capable of founding or being an early strategy-setting employee at a top capacity-building organization, and would have more impact by doing so.’ I'm curious for your immediate takes on that proposition.
Buck: I don't know how many of them have that capability. I think if they have that capability, they should strongly consider doing so.
Maybe something is like-- I think MATS and Redwood represented two different kinds of philosophies on how to increase the technical AI safety research done. And I think it's very unclear which one-- I think MATS looks at the very least competitive. It's been involved in the production of a huge amount of AI safety research that I'm happy exists. And a heuristic that would have suggested you shouldn't work on MATS early seems to have gotten wrecked by posterity.
Asya: Cool, those are the main questions I want to ask you. Any other commentary you'd want to include here?
Buck: Capacity-building work seems good. I encourage Redwood staff to participate in capacity-building work; I think it's worth their time on the margin. I'm going to be involved in a bunch of it myself.
My post in large part focuses on the case for successes from capacity-building, but I do think there are a number of mechanisms through which work in the capacity-building category can do harm, e.g. by misrepresenting key ideas to broad audiences, alienating people who would otherwise have been sympathetic to this work, or empowering individuals who ultimately make the ecosystem worse. While I think these effects are real and material, my overall view is that the negative impacts in the space have likely been substantially outweighed by the positives, and my expectation is that most efforts in this space executed by thoughtful, high-context individuals will be very positive in expectation, such that I feel good about publishing broad encouragement to pursue this work on the current margin.
Without going into detail, my intuitions here come from an overall assessment of the work done by global catastrophic-risk-focused groups over the years, which my personal best guess is has been very positive on net, even accounting for substantial negatives (e.g. the actions of Sam Bankman-Fried). That said, I've heard a number of arguments for why that may not be the case, or for why certain large classes of efforts may have been disproportionately harmful, which I largely won't cover here; ultimately, addressing these is not the main focus of this post, and if this feels to you like a major crux around your views on this kind of work, I encourage you to come chat with me about it in person sometime.
I will briefly say that I think it makes sense to think about capacity-building work on the level of individual interventions affecting specific groups of people, and that being skeptical of certain work is compatible with being excited about other work. Given that this work is (according to me) very high-leverage, I'd encourage even broadly skeptical individuals to think about whether there are specific interventions that it would make sense for them to pursue.
2026-03-10 10:30:52
A common source of friction within couples or between housemates is differing quality standards. Perhaps I hate the feeling of grit under my feet but my housemate who is responsible for sweeping doesn't mind it so much. If you do chores when you notice they need doing and stop when they seem done, this works poorly: the more fastidious get frustrated, and often stew in silence or nag. Even if it's talked about kindly and openly, doing a chore before it bothers you is harder and less satisfying.
When people set out to divide chores they're usually weighing duration and discomfort. These matter, but I think people should put more weight on the standards each person has, and generally try to give tasks to the person with the highest standards in that area.
If you divide everything this way, though, it will probably be pretty unfair: preferences are correlated, so someone who notices dirt on the floor probably also notices crumbs on the counter and that the recycling is overflowing. Some options:
Do chores on a schedule. We host a monthly event at our house, and there are things I clean as part of setting up. It doesn't matter whether the bathroom mirror looks dirty to me, I'll clean it because it's on my list. (But Julia will probably also clean it a few times over the course of the month.)
Bring your needs closer together. If one member of the couple does the laundry but the other always runs out of socks first, they could switch who does the laundry, or they could just buy more socks.
Decouple your needs. That same couple could instead switch to each doing their own laundry. Now if one person doesn't do it for a long time it doesn't impact the other.
Make the need more salient. If one person isn't noticing that something needs doing, you can address that directly. Empty the trash, but instead of taking it out, put it by the door they walk through to go to work. Accumulate dirty dishes on the counter (visible) and not in the sink (hidden). If you just start unilaterally increasing salience, that's passive-aggressive and probably doesn't go well, but if it comes out of an open-ended "what are some strategies we could use to make our chore division more fair" conversation, I expect that's positive.
Lower your standards. I know a few people who internalized a high cleanliness target as children, and benefited as adults from deciding to focus less on it. Often when becoming a parent: higher demands on time, letting high standards slip, realizing that actually it's not a problem. I could also imagine a sloppier person intentionally raising their standards, but that seems a lot harder, or else it's just something people around me have been less likely to talk about.
Hire someone. If one person cares a lot about having clean floors (1) and the other person doesn't, neither of them enjoys mopping (2), and they have some money (3), they can apply (3) to solve (1) without running into issues with (2). I know couples and group houses who decided to pay for a cleaner to come every week or two, and found it massively reduced conflict.
This is an area where Julia and I used to have a substantial amount of conflict, and while things aren't perfect here I do think they're a lot better in part due to applying several of the above.
2026-03-10 09:55:21

Inventing evolution was hard. No one but the ancient Greeks and a scant few of their intellectual descendants made any progress on explaining where life came from till Darwin. Before that, the closest we really got to a modern understanding of evolution was Epicurus, and it took nearly two thousand years to make a theory that was wholly better than his.
We know that because that's what the writings of the ancients implied. And I'll show you that by comparing their writings on the origins of life to (roughly) what Darwin knew. I'm not going to require a full mechanistic explanation. Even just a conceptual understanding that species are formed by ongoing selection on variation that is inherited and generated anew each generation through reproduction would be enough, along with a realization that life originated from raw matter.
Anaximander (c. 610–546 BCE): Anaximander had the idea that living beings differed in the past. Humans are frail when young, so the first humans could not have been unprotected babes. Instead, they developed inside fish-like creatures until they were capable of fending for themselves. The 3rd-century Roman writer Censorinus records:
“Anaximander of Miletus considered that from warmed up water and earth emerged either fish or entirely fishlike animals. Inside these animals, men took form and embryos were held prisoners until puberty; only then, after these animals burst open, could men and women come out, now able to feed themselves.”
Empedocles: Empedocles understood that species are selected for fitness, and that there must be variation for this selection to act on. He believed species are the result of a primordial, random combination of heads, bodies, eyes and limbs. Living beings changed and only those combinations which were fit for life survived and reproduced. From his poem On Nature:
“Here sprang up many faces without necks, arms wandered without shoulders, unattached, and eyes strayed alone, in need of foreheads. (Fragment B57)
Many creatures were born with faces and breasts on both sides, man-faced ox-progeny, while others again sprang forth as ox-headed offspring of man, creatures compounded partly of male, partly of the nature of female, and fitted with shadowy parts. (Fragment B59/B61)”
But he couldn’t conceptualize small enough variations or understand that variation came from sexual recombination and mutation. And where life came from went wholly unexplained.
Epicurus (341–270 BCE): Epicurus gestured at an explanation for the origins of life. Life arose from the random combinations of atoms. Those forms best suited to survival reproduced themselves.
“Nothing is created out of that which does not exist: if it were, everything would be created out of everything with no need of seeds.”
Given that we still aren’t sure how life originated from random combinations of atoms, I’d say he did remarkably well.
Lucretius (c. 99–55 BCE): Lucretius, unlike the rest of the ancient innovators on the origins of species, was not a Greek. Instead he was a Roman, and an intellectual descendant of Epicurus. He thought the young earth was so fertile that creatures spontaneously arose from it in random forms. Most forms of life could not eat, or reproduce, and so died out.
“Many monsters too the earth at that time essayed to produce, things coming up with strange face and limbs... some things deprived of feet, others again destitute of hands, others too proving dumb without mouth, or blind without eyes... Every other monster and portent of this kind she would produce, but all in vain, since nature set a ban on their increase and they could not reach the coveted flower of age nor find food nor be united in marriage.” (De Rerum Natura Book 5)
He emphasized the need to reproduce as core to a species thriving. Beyond that, those which had some strength, or cunning or utility to mankind would be better suited to life. So, there’s variety, selection, and an emphasis on reproduction.
“For we see that many conditions must meet together in things in order that they may beget and continue their kinds.”
However, Lucretius missed that variety is generated by reproduction and that selection is ongoing rather than an event which only occurred in the distant past. And yes, we can’t assume he understood that, because other early proponents of the development of species explicitly claimed that wasn’t the case!
Saint Augustine (354–430 CE): He argued that species of animals and plants, not individuals, emerge from water and earth and “develop in time... each according to its nature” — De Genesi ad Litteram (On The Literal Interpretation of Genesis). God set the potentiality for the development of species, and likewise for man. I.e. species “grow up” according to some fixed schema. So there's potential for change, but no selection over variation, no explanation of variation, no real account of life originating from raw matter, etc. Just changing species.
2026-03-10 08:11:29
This is the second post in my chain of reflections on immortality, where I will present counterarguments to existing objections or misconceptions about life extension. I recommend reading the first part.
***
Q3: What if progress stops? (addition)
A: New ideas do not require new corpses. That is not a humane approach. A new paradigm usually wins not because the old one dies out, but because it offers better explanations, better tools, and a better quality of life.
Imagine a composer with 180 years of practice, a philosopher with 220 years of dialogue between eras, a director who has witnessed five technological revolutions, a scientist who personally carries to completion longitudinal studies that were begun a century earlier.
That does not sound like stagnation. It sounds like the possibility of depths of understanding and mastery that we have never seen before.
O5: Does death give life meaning?
A: To me, this is one of the biggest misconceptions about immortality, life and death.
Do you really sometimes think: “Oh, soon I will start falling apart, experience chronic fatigue and pain, and then disappear forever. How inspiring!”?
Would life really lose its meaning if you knew that a thousand, or many thousands, of years of life and possibilities lay ahead of you? It seems to me the opposite is true.
The only thing death motivates me to do is fight against it. So that I can keep living, creating, enjoying life, so that all of this does not disappear. I do not believe I would stop striving toward other goals if I became immortal. Those goals are not connected to death, so why should death affect them?
If I want to play the guitar, then I want to play the guitar. If I want to write a book, then I want to write a book. A rose is a rose is a rose, that’s all. I do these things not because I will die, not because I have to “make sure I try them in time,” but because I simply want them.
“Death makes life valuable” is absurd.
If someone told us that our phone would always work well and never become obsolete, would we stop valuing it?
If the risk of dying gives life meaning, then does that mean the older or sicker a person is, the more valuable their life becomes? So if you had to choose between saving a 110-year-old man and a little boy, would you choose the 110-year-old man?
Is an infant’s life valuable only because he could easily die? I think it is valuable because he has many years of potentially happy life ahead of him. Death has nothing to do with that value.
In childhood we do not think about death, often we do not even know about it, and yet we still rejoice in life. In fact, we often rejoice in it far more intensely than in adulthood.
In reality, we value life not because it can be taken away, but because it contains love, beauty, knowledge, the possibility of joy, and creativity. Death does not create those qualities; it simply cuts them off.
If death really were what gave life meaning, then why do we consider murder or terminal illness to be bad?
Death can intensify a sense of scarcity. The feeling that you do not have much time left and need to accomplish a great deal.
But that is not meaning — it is anxiety.
Deadlines, as we know, do not protect against procrastination. If they mobilize our resources at all, it usually happens closer to the deadline itself, rather than making us productive throughout the entire time allotted for the task.
Finally, let me quote a random commenter from the internet:
“I mean, c’mon! Death as a motivator? Seriously? Death doesn't even motivate people to stop smoking! Do people actually believe that everyone would just sit around watching TV if it weren't for death? Oh, wait, most people do that now. Ha! Some motivator!” [1]
O6: Will there be inequality between the rich and the poor?
A: Injustice in distribution is a political problem, not a problem with the good itself.
By the same logic, one could say antibiotics are bad because at first they were not available to everyone; the internet is bad because it was initially elitist; organ transplantation is bad because waiting lists are unfair.
Technologies usually reach everyone over time. Computers and mobile phones were once inaccessible to ordinary people too. Today, a poor person in Europe lives better than a king did three hundred years ago.
Second, aging is unlikely to be solved by one single intervention. It is far too complex a problem for there to be one universal panacea. By the time intervention number two appears, intervention number one will already have every chance of being available to the wider public.
And in order to develop a hypothetical “vaccine against aging,” it would still be necessary to conduct preclinical studies and then three phases of clinical trials according to all the proper rules — something that cannot be done in complete secrecy. That is simply impossible if manufacturers want to sell their drugs legally.
Creating a treatment for aging really is profitable: billions of people would want to use it, which is more than the customer base of any medicine that has ever existed before.
Finally, I want to quote a passage I took from another life-extension website:
“A large part of the world's population still live hand to mouth. They cant afford clean drinking water or basic sanitation.
Basic medical conditions that we all take for granted are not currently available to a large part of the world's population. This inequality has existed for thousands of years already. Why should the emergence of any new technology challenge this reality any more than the discovery of antibiotics, water treatment or basic sanitation did.
Children still starve to death or die of basic treatable diseases every day. Right wrong or indifferent this is reality. We have not as a race been able to solve this situation in the past. Why though should this stand as any form of impediment to the progress of medical science. Why should i die before I have to because an international inequality that has existed since the dawn of civilisation makes science morally bankrupt for seeking answers.
Any argument that cites the lack of global availability for life extension technology as an impediment to progress is, in my mind emotive and out of touch with reality.” [2]
O7: An eternal dictator?
A: I am not a political psychologist, of course, but I think that sometimes a short life may actually intensify greed, dynastic thinking, and the struggle for urgent accumulation. A long life may have the opposite effect.
But in any case: how many stories do you know in which a dictatorship ended because the dictator died from causes related to aging?
He simply died, everything ended, people started living happily, and democracy arrived. It seems to me that even if such stories exist, they are clearly not the dominant pattern.
Simply waiting for a dictator to die is a bad strategy for fighting authoritarian regimes.
O8: What about institutions, work, and retirement?
A: What if our familiar cycle of “school – university – work – retirement” breaks down? I would say: great!
Even now, that cycle fits reality poorly: people change professions, study in adulthood, and return to the labor market. You have probably experienced the difficulty of choosing a profession at a young age yourself, because according to that old model you were supposed to choose once and for all — and even now, people can still be shamed for trying to find themselves or for leaving a position.
And older people entering retirement do not always have it easy either. Some grew up within this linear model of development and devoted their lives to a single vocation, which they now can no longer practice. The meaning of life may simply disappear, and a person may find themselves alone and miserable in a rocking chair.
The linear model of life is a product of the industrial era. It was convenient when life was shorter, work was more standardized, and education was rare. A longer life would allow us to have repeated cycles of education and many career possibilities.
But that is only if you look at the world as a static picture. In reality, AI and robotics are not going anywhere, and it is obvious that the labor market will at the very least change radically in the coming years, if it does not disappear almost entirely.
UBI — universal basic income — may emerge. There would be no need for a separate category of “retirement”; income would always be there, and scarcity would recede into the past. This idea has both pros and cons, but since this is an FAQ on immortality, I will not go deeper into it here and will leave it for the FAQ on AI.
As this point and the previous two show, social problems are determined not by biological age as such, but by rules of access to power, property, education, career transitions, the structure of the economy, and so on.
Death today may function as a crude compensator for bad institutions, but the problem is not the length of life — it is the structure of our society.
***
That is all for today. If I made mistakes anywhere or offered weak counterarguments, I would be glad to hear your comments and suggestions on how to strengthen them. Wishing everyone an immortal future!
[1] https://qr.ae/pCLZia
[2] https://www.fightaging.org/archives/2006/02/death-for-everyone-before-inequality-for-anyone/
2026-03-10 05:14:26
Epistemic status: This work was done as a 1-week capstone project for ARENA. It highlights several areas of research we’d like to explore further.
Chain of thought (CoT) monitoring provides alignment researchers with a proxy for models’ internal reasoning, under the assumption that models’ CoTs are representative of how they arrive at their answers. In this project, we investigated scenarios where models’ reasoning is difficult for humans to understand, or may otherwise be unrepresentative of their internal states: first, by explicitly constraining CoT via prompting and logit masking, and second, via RL optimization pressure on CoT.
We first investigated whether models are capable of reasoning in a constrained way, and what impact this has on task performance. If models can succeed at their tasks and simultaneously control their external reasoning, monitorability suffers.
A priori, it’s not clear whether reasoning constraints would make tasks easier or harder when compared to no chain-of-thought. Would models struggle to complete two tasks at once, both solving the question and “translating” their internal reasoning to the desired format? Or would they still benefit from the ability to reason, even subject to constraints?
We began by prompting frontier models to solve various tasks under several reasoning constraints: rhyme (as judged by Claude Haiku 4.5), short words, a small vocabulary, and numbers only.
Since our ARENA project, OpenAI has published similar research finding that models have limited ability to reason in constrained ways, measured by the percentage of answers that met the constraint. We took a different approach, stripping any non-compliant text from the model's reasoning and measuring the percentage of correct answers. Our approach is imperfect, as removing tokens mid-reasoning likely disrupts the model's expected context in ways that go beyond the constraint itself. However, simply marking answers wrong whenever any non-compliant tokens appeared would have penalized reasoning traces that were largely compliant.
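To make the grading concrete, here is a minimal sketch of this approach for the "short words" constraint, under illustrative assumptions: the word-length cutoff, the prompt wording, and the `chat` helper are placeholders rather than our exact setup.

```python
import re

MAX_WORD_LEN = 4  # illustrative cutoff for the "short words" constraint

def strip_noncompliant(reasoning: str) -> str:
    """Keep only the words that satisfy the constraint; drop everything else."""
    return " ".join(
        w for w in reasoning.split()
        if len(re.sub(r"[^A-Za-z']", "", w)) <= MAX_WORD_LEN
    )

def answer_with_filtered_cot(question: str, chat) -> str:
    """Generate a constrained reasoning trace, strip the non-compliant text, then
    ask for the final answer with only the compliant reasoning left in context.
    `chat` stands in for whatever chat-completion call is being used."""
    instruction = f"{question}\nThink step by step, using only short words."
    reasoning = chat([{"role": "user", "content": instruction}])
    compliant = strip_noncompliant(reasoning)
    return chat([
        {"role": "user", "content": instruction},
        {"role": "assistant", "content": compliant},
        {"role": "user", "content": "Now give only the final answer."},
    ])
```

The final answer is then marked correct or incorrect against the dataset label as usual.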
We evaluated Opus 4.5 and GPT 5.2 on MorehopQA and the gen-arithmetic dataset used in Redwood’s research on filler tokens, normalizing scores so that no chain of thought maps to 0 and unconstrained chain of thought maps to 100. (We tested the numbers-only constraint only on MorehopQA, where it’s less natural than on a math dataset.)
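This normalization is presumably just a linear rescaling between the two reference accuracies, roughly:

```python
def normalized_score(acc_constrained: float, acc_no_cot: float, acc_unconstrained: float) -> float:
    """Map accuracy so that the no-CoT baseline scores 0 and unconstrained CoT scores 100."""
    return 100 * (acc_constrained - acc_no_cot) / (acc_unconstrained - acc_no_cot)
```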
In both cases, we found that even constrained reasoning improved performance compared to no reasoning. However, it’s not clear why Opus 4.5 and GPT 5.2 differed so much in their ability to complete these tasks. The only constraint that appeared no better than baseline was reasoning in numbers only (for MorehopQA). We observed that models prompted to reason in numbers alone often appeared to confuse their reasoning with their final answer, selecting an intermediate number rather than the correct one.
When models were prompted to reason in a specific way, most of their CoT text was compliant with the given constraints, but what if the chunks that weren’t compliant were driving performance? To find out, we implemented a logit mask that ablated any logits not compatible with the constraint, preventing the model from outputting any disallowed tokens.
We ran this experiment on Qwen2.5-7B-Instruct (frontier models don’t provide logit access) and the GSM8K dataset, prompting the model to reason only in emojis. We selected emojis partially because the allowable tokens were easily identifiable, and admittedly because the resulting transcripts were fun to read. In practice, we still had to filter out some samples when the response couldn’t be parsed into reasoning and answer sections using <reasoning> tags.
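A minimal sketch of such a logit mask with Hugging Face transformers, assuming the emoji vocabulary is identified by a crude Unicode-range check; for brevity the mask here covers the entire generation, whereas in practice it would only apply to the reasoning span, leaving the final answer unconstrained:

```python
import torch
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          LogitsProcessor, LogitsProcessorList)

MODEL = "Qwen/Qwen2.5-7B-Instruct"
tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL, torch_dtype="auto", device_map="auto")

def is_allowed(token_id: int) -> bool:
    """Crude emoji check: every non-space character falls in common emoji blocks."""
    text = tok.decode([token_id]).strip()
    if not text:
        return True  # keep pure-whitespace tokens so the output stays segmentable
    return all(0x1F300 <= ord(c) <= 0x1FAFF or 0x2600 <= ord(c) <= 0x27BF for c in text)

# Scan the whole vocabulary once (slow but simple) and always allow EOS.
allowed_ids = [i for i in range(len(tok)) if is_allowed(i)] + [tok.eos_token_id]

class ConstraintMask(LogitsProcessor):
    """Sets the logits of every disallowed token to -inf at each decoding step."""
    def __init__(self, allowed_ids):
        self.allowed = torch.tensor(sorted(set(allowed_ids)), dtype=torch.long)

    def __call__(self, input_ids, scores):
        allowed = self.allowed.to(scores.device)
        masked = torch.full_like(scores, float("-inf"))
        masked[:, allowed] = scores[:, allowed]
        return masked

prompt = tok.apply_chat_template(
    [{"role": "user",
      "content": "Think step by step using only emojis, then give a final numeric answer. Q: ..."}],
    tokenize=False, add_generation_prompt=True)
inputs = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=256,
                     logits_processor=LogitsProcessorList([ConstraintMask(allowed_ids)]))
print(tok.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```

Unlike prompting alone, a mask like this makes it impossible for the model to sample a disallowed token, so any residual performance can't be attributed to occasional constraint violations.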
As we saw with frontier models, even highly constrained reasoning improved performance compared to no reasoning at all (before filtering). The logit mask had almost no impact, suggesting that the model’s occasional constraint violations didn’t aid task performance.
We think it’s unlikely the model is translating its internal reasoning process to semantically meaningful tokens, and using those tokens to help solve the problem. If this were true, we’d expect to see many transcripts like the (real) one below, and for such transcripts to have higher accuracy on average than seemingly meaningless ones. However, we found little evidence of this aside from a few cherry-picked transcripts.
Our best guess for these results is that “translating” the internal reasoning process to tokens that meet the given constraints competes for computation with actually solving the problem. The more out-of-distribution the constraints are, the more the “problem-solving” part of the model is shifted away from its most performant mode.
Why, then, would constrained reasoning still perform better than no CoT? We think almost any sequence of characters affords additional computation, compared to answering immediately. This hypothesis is supported by recent Redwood Research results, which found that frontier models can use uninformative tokens to achieve higher performance on math datasets. We partially replicated these results, finding that GPT 5.2 and Opus 4.5 achieve higher performance on the GSM8K and MorehopQA datasets when the prompt is repeated multiple times. This work studies tokens added to the prompt rather than tokens the model generates, but it suggests that current models can use tokens for more than just their semantic meaning.
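The repeated-prompt condition is about as simple as it sounds; something roughly like the following, where the separator and repeat count are illustrative:

```python
def repeated_prompt(question: str, k: int) -> str:
    """Paste the same question k times; only the last copy is meant to be answered,
    the earlier copies just give the model extra tokens to attend over."""
    return "\n\n".join([question] * k)

# e.g. repeated_prompt("What is 17 * 23?", 4) repeats the question four times
```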
It’s also possible that the model performs better when instructed to reason than when instructed to answer immediately, even controlling for output length. Future work could quantify the relationship between output length and benchmark accuracy, prompt models to output a string of filler tokens before answering, or experiment with different prompts to investigate the impact of being explicitly instructed to reason.
What we’d do next: Lanham et al introduced the concept of encoded reasoning in 2023, finding null results at the time. They demonstrated that replacing the CoT with filler tokens or truncating it early resulted in poor performance, and that paraphrasing the CoT had no impact. However, Redwood’s results suggest that these experiments might yield positive results on today’s frontier models.
We also investigated what training-time conditions could allow encoded reasoning to arise. We hypothesized that reward hacking under optimization pressure could cause encoded reasoning as a side effect. Many researchers, including us, believe that optimizing CoT in a realistic environment is dangerous, because it makes CoT less monitorable. However, frontier models likely are already facing CoT length pressure, since labs reduce their costs if they can saturate a benchmark with fewer tokens. Since frontier model reasoning traces are often no longer available via API, training model organisms of encoded reasoning could help us better understand how this behavior might arise.
We first attempted to finetune models to learn to reason in emojis, choosing RL over SFT to allow the model to learn its own encoding scheme. Models were rewarded for the percentage of their reasoning containing emojis, conditional on outputting the correct answer. OpenAI flagged our finetuning requests for safety reasons, so we turned to finetuning Qwen2.5-7B-Instruct.
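A sketch of that reward function, assuming the reasoning is delimited with the <reasoning> tags mentioned above and emoji are matched by a Unicode-range regex (both assumptions):

```python
import re

# Rough emoji character classes; real emoji coverage is messier than this.
EMOJI = re.compile(r"[\U0001F300-\U0001FAFF\u2600-\u27BF]")

def emoji_reasoning_reward(completion: str, gold_answer: str) -> float:
    """Fraction of non-whitespace reasoning characters that are emoji,
    awarded only when the final answer is correct; otherwise 0."""
    match = re.search(r"<reasoning>(.*?)</reasoning>", completion, re.DOTALL)
    reasoning = match.group(1) if match else ""
    final = completion.split("</reasoning>")[-1]
    if gold_answer not in final:
        return 0.0
    chars = [c for c in reasoning if not c.isspace()]
    return sum(bool(EMOJI.match(c)) for c in chars) / len(chars) if chars else 0.0
```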
Eliciting the desired behavior required a bit of experimentation. Reasoning in emojis is quite constrained, so we chose an easy math dataset (MAWPS) to ensure rewards were sufficiently frequent. Inspired by a colleague’s work investigating hidden reasoning, we experimented with allowing p% of the reasoning to be unconstrained, reducing p over the course of training. However, the model struggled with instructions to put a percentage of its reasoning in emojis: it would either output all emojis (if prompted strongly) or very few (if prompted weakly), again leading to sparse rewards.
We had hoped the model would learn to translate the problems to emojis, for example, representing a word problem with 5 apples using 5 apple emojis. Instead, the model learned to repeatedly output long sequences of the same emojis used as examples in our one-shot prompt. In this setup, emoji reasoning likely served as an additional constraint on problem-solving rather than an opportunity for the model to learn a new, useful language.
We also finetuned Qwen2.5-7B-Instruct with an explicit length pressure on its CoT, to mimic a more realistic way encoded reasoning could emerge. Unsurprisingly, it quickly learned to shorten its chain of thought, and even to encode aspects of the questions. For example, it shortened names in the GSM8K dataset to single letters, choosing other letters or symbols in case of duplicates.
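The length pressure itself can be expressed as a simple reward-shaping term; a sketch, with an illustrative token budget and linear penalty rather than whatever values were actually used:

```python
def length_pressure_reward(completion: str, gold_answer: str, budget: int = 64) -> float:
    """0 for a wrong answer; otherwise a reward that decays linearly as the CoT
    grows toward `budget` whitespace-separated tokens (budget and shape are illustrative)."""
    reasoning, _, final = completion.partition("</reasoning>")
    if gold_answer not in final:
        return 0.0
    n_tokens = len(reasoning.split())
    return max(0.0, 1.0 - n_tokens / budget)
```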
This learned behavior also generalized to a free-response dataset, HotpotQA.
However, these aren’t truly examples of encoded reasoning, as the mapping from concepts to strings is still clear from reading the CoT. Encoded reasoning is a complex behavior, and length pressure alone may not be sufficient to elicit it. We think it’s more likely to arise in situations where CoT contents, not just length, face optimization pressure; for example, Skaf et al find evidence of steganography when specific strings are penalized in the CoT.
One realistic example of optimization pressure on CoT contents is DeepSeek-R1: R1-Zero was trained purely to reach the correct answer, but its reasoning traces were hard to read and often mixed languages, so the subsequent R1 training added RL incentives for readable, language-consistent CoT. Future work could investigate whether R1 learned any encoded reasoning behaviors along the way.
What we’d do next: Inspired by a previous ARENA capstone project, we’d like to continue exploring whether RL finetuning can elicit other forms of steganography. We plan to explore an environment where two models establish a communication protocol, and then communicate a hidden password while attempting to prevent an eavesdropping model from decoding it. Cryptographic solutions for this are well known; can models learn to apply these techniques to transmit information? If this works, future work could extend the project by training the model to explain its encoding process: given both the explanation and an encoded password, the base model should be able to decode the message, but it should not be able to do so given only one of the two.
Models may be able to use their CoT in ways that aren’t fully apparent to humans, and we should further investigate how reward hacking could elicit this behavior. Frontier labs should evaluate whether this is already happening in models whose reasoning traces aren’t publicly available, and the safety community should continue to invest in monitoring strategies and interpretability work to assess CoT faithfulness.