
Contra Myself on Free Will

2026-03-10 14:29:43

I recently wrote a piece defending free will from Sam Harris's critique. The core argument was compatibilist: free will and determinism are compatible, and free will is best understood as a deliberative algorithm, an agent weighing options and selecting among them on the basis of its own reasons and values. The causal chain traces back through factors you didn't choose, but the algorithm is still yours. I also argued that because no one has ultimate authorship, retributive punishment and hatred don't make sense; accountability should be forward-looking.
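(To make that picture concrete, here is a toy sketch of the "deliberative algorithm" in Python. This is only an illustration, not anything from the original piece; the options, reasons, and weights are invented.)

```python
# A toy caricature of the compatibilist "deliberative algorithm": the agent
# scores each option against its own values and picks the best one. Where
# those values came from is a separate question; the weighing is still the
# agent's own process.

def deliberate(options, values):
    """Return the option that best satisfies the agent's values."""
    def score(option):
        # Weigh each consideration by how much the agent cares about it.
        return sum(values.get(reason, 0.0) * strength
                   for reason, strength in option["reasons"].items())
    return max(options, key=score)

options = [
    {"name": "tell the truth", "reasons": {"honesty": 1.0, "comfort": -0.5}},
    {"name": "tell a white lie", "reasons": {"honesty": -1.0, "comfort": 0.8}},
]
values = {"honesty": 2.0, "comfort": 1.0}
print(deliberate(options, values)["name"])  # -> tell the truth
```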

I've thought about it some more, and I think much of it holds up. But I also think I let myself off easy in a few places and didn't give sufficient weight to some aspects of the argument, so here's a response to myself.

It’s Not All About How People Use the Word

Your arguments about how people use the term “free will” are only semi-convincing. You leaned hard on common usage: people feel like they make choices, they report evaluating options, so free will is real in the sense that matters. Fine. That's a valid argument (I mean, I did write it), but it proves less than you made it sound.

People can be wrong about what they're experiencing. Someone who genuinely believes she's psychic isn't made psychic by the sincerity of the belief. We don't say "well, that's how she uses the word, so she really is seeing the future." We say she's having some experience (fooling herself into thinking she can predict the future) and mislabeling it (as really being psychic).

My point isn't that free willers are making a falsifiable empirical claim the way the psychic is. It's that sincerely experiencing something doesn't settle what that something is. Yes, people have a feeling of control when they deliberate. Yes, they describe it using the language of free will. But maybe they're doing exactly what the psychic is doing—having a real experience and attaching the wrong label to it. The feeling of choosing is real. Whether that feeling is free will, or just something that gets mistaken for it, is a question you skip when you focus on language usage.

You’re Not Taking Determinism Seriously

If people really grokked what it means to have a determined future, I don't think they would say they have free will. Here’s an experiment: Take some young college freshman. He's wide-eyed, idealistic, convinced his life could go in a thousand directions. Maybe he'll start a company. Maybe he'll move to Japan. Maybe he'll drop out and write a novel. The world feels open to him, and that openness feels like freedom.

Now sit him down on his college dorm twin mattress and show him photographs from his future. Here's your graduation—you stuck with accounting. Here's your first job at Intuit. Here's your wife, Sarah. Here are your two kids, Ethan and Ethel. Here's your house in Plano, Texas. You'll be mostly happy, by the way. It's a good life.

Ask him how much free will he’s feeling.[1]

The feeling of open possibilities that excited him five minutes ago will have been replaced by something closer to watching a movie he happens to star in. He's going through the motions of a life that was always going to happen. But his future is no more determined than it was five minutes ago. The only change is his level of ignorance about it.

Now, as we just saw, a real experience can get the wrong label. Just because the feeling of free will requires ignorance doesn’t mean that free will dissolves without it.

Fair enough, but if the feeling of freedom is completely independent of whether you're actually free, then the compatibilist has to admit that the entire phenomenology of choice—the thing that makes free will matter to people—is essentially a byproduct of ignorance. You can keep calling the underlying causal process "free will" if you like, but you've just acknowledged that a lot of what people actually value about the experience is an illusion. That's a strange place for a defense of free will to land.

So if you’re going to rely on people’s intuitions, you’ve got to admit that you’re relying on their intuitions while in a state of ignorance. The feeling of free will tracks with how much you know about your future actions, not with how free you are. Give someone total knowledge of their future and the feeling disappears, even though the causal structure of the universe hasn't changed at all. I’m not saying the whole algorithm collapses without ignorance, but certainly the feeling of it does.

You Haven't Closed the Door on Harsh Punishment

In your original piece, you treated the case against harsh punishment as largely settled—if no one has ultimate authorship, retributive justifications simply don't hold. But this is not so. Even if we grant your premises, some moral frameworks can still justify harsh punishment without relying on retributivism at all.

For example, a utilitarian might argue that when someone murders a publicly beloved figure, the community's need to see a severe response is itself a real psychological good. The justification here is forward-looking—perhaps people heal better and can move on when they feel justice has been done. The punishment looks retributive, but the reasoning isn't.

Of course, satisfying this desire could still be net bad for society. Historically speaking, societies that have fed the public's appetite for punishment through public executions and the like have generally seemed like worse places to live.[2] So the argument might still ultimately fail on utilitarian grounds. But notice that now you’re having an empirical disagreement, so it’s clearly not a settled matter.

My point is only that the absence of ultimate authorship doesn't close the door on harsh punishment under every moral framework. It eliminates retributive justifications, but that’s just one argument against harsh punishment, not the last word.

Where Is The Love?

In your previous piece, you spent a lot of time on hatred. If determinism is true, you argued, we don't have to hate people for their worst acts. We can still hold them accountable, still protect society, still call behavior “wrong”, but there is no sense in hating anyone. You thought you were giving something up gracefully while keeping what mattered. It was a very tidy arrangement. But where I come from coins have two sides.

If a murderer doesn't ultimately deserve your hatred because his cruelty traces back through an unbroken chain of genes, upbringing, and circumstance he never chose, then, by the same logic, a saint doesn't ultimately deserve your love either. Not the deep kind. Not the kind where you look at someone and think, "What a genuinely good person they are, all the way down. They deserve good things." They might be that good of a person, but how can they deserve love for it any more than an evil person deserves hatred?

You admire Gandhi. I admire Gandhi. He is among the most admired people who have ever lived. But by the logic you stated, Gandhi's compassion and courage were products of a particular genetic hand and a particular environment, none of which he selected. Gandhi didn’t really do anything all that special, once you correct for his genes and environment. Anyone would have done it, even you. Maybe you deserve a statue in your honor. The plaque could read: "Had he had Gandhi's genes and been placed in Gandhi's situation, he, too, would have done great things."

Perhaps you could still salvage a universal love, like a love for all conscious things or something. I'll grant that, but that applies to serial killers as much as human rights leaders.

You said in your original piece that you can still prefer kind people, that you're allowed to like being around someone who makes you laugh and treats you well. But if you call that “love” notice how thin it's gotten. It's the love of someone who is useful and pleasant to you, which is roughly the same way you love a good sofa. It's not the love where you think another person deserves to flourish because of who they are. They're a very nice sofa, but they didn't upholster themselves.

This is, I think, considerably more unsettling than the punishment side, which is where everyone focuses. And understandably so—blame is where the stakes feel highest, where people go to prison or get forgiven. But we spend far more of our lives loving people than hating them, admiring people than condemning them. If determinism hollows out hatred, it hollows out love by exactly the same logic. You don't get to keep one and discard the other, however much you might like to.

But that's precisely what you tried to do. You softened blame while leaving love and admiration quietly in place. The honest move is to admit that the same wrecking ball swings both ways. Simply removing everything that constitutes blame causes collateral damage elsewhere.

What Are You Doing with Your Moral Agency?

I think there’s a tension within your claims that you haven’t addressed. On one hand, you say that people don't deserve retributive punishment because they are not the ultimate authors of their actions. You say that punishment should be forward-looking, such as keeping dangerous people away from society. On the other hand, you say that people are moral agents.

So, what does moral agency actually do?

Consider a case where there is no forward-looking justification for punishment. Imagine a woman decides she doesn't want her children anymore and kills them. She then gets irreversibly sterilized. She will never have children again, so her chance of reoffending is zero. No one else knows about the crime except a single police officer, so there is no deterrence value in publicly prosecuting her. She has no criminal history, mental illness, or other risk factors. She needs no treatment, no rehabilitation, no monitoring.

On a purely forward-looking account, there’s nothing to do here. And if you try to wriggle in some forward-looking justification, we’ll just change the scenario so that it doesn’t apply. The point is, the forward-looking framework says let her go. And that answer feels monstrous.

Why? Because when the woman killed her children, she didn't just break some procedural rule. She killed people who had moral standing. Those children were moral patients with welfare, futures, and the capacity to suffer, and her deliberative algorithm weighed all of that and chose to destroy them anyway. The moral reality of what she did doesn't reduce to a forward-looking management problem.

This raises practical questions you haven't answered. Is the woman's calculation—weighing her children's lives against her freedom and deciding her freedom was worth more—a moral claim that society must rebut? Should society impose a cost that outweighs the gain she sought? Should it tell her that she cannot trade another’s life for her convenience and come out ahead?

There's a vast space between "you deserve to suffer for what you are" and "you're just a system we need to manage." You haven’t explored that middle space.

And the forward-looking framework fails for another reason. If punishment is justified entirely by the probability of future harm, then the only thing that matters is how well something predicts future harm, not whether the predictor itself is a crime. It might be the case that having committed a crime correlates more strongly with future harm than anything else, thus justifying some form of response for criminal activity. But this is an empirical claim, and we had better be prepared for it to go the other way. If it turns out that having a bad childhood or a face tattoo predicts future offending just as well as having committed a burglary, then a purely forward-looking system should treat them equally. Are you prepared to incarcerate people for bad childhoods?

You talked a lot about how having free will means you’re responsive to reasons, so you could say that moral agents are the kind of entities on which reasoning-based interventions work. You can explain to a person why they were wrong. You can use reason to deter them. You can't do that with a bear. So perhaps moral agency tells you which tools are available, not whether someone deserves anything.

But this reduces moral responsibility to something almost entirely instrumental. It just tells us what type of intervention would be effective on a given system. Calling it “moral responsibility” feels empty. It would be more honest to call it “reasons-based deterrence susceptibility”.

And notice what happens: we end up treating the woman who killed her children the same way we would treat a bear that mauled a hiker. We don’t blame the bear. We don’t sentence it for its crimes. We create a management plan for it. We put a dangerous bear down or relocate it for public safety. If we incarcerate a dangerous person only for public safety, then we’re treating them the same. In both cases the justification is forward-looking harm reduction. If there’s no difference between our response to a bear, which lacks moral agency, and our response to a human, who has it, then moral agency seems to be little more than a farce.

When people say someone is "responsible" for a terrible act, they mean something beyond "this person's cognitive architecture is amenable to deterrence." They mean something backward-looking—this person should have done otherwise, and there's some appropriateness to a negative response directed at them for what they did.

But you're trying to preserve the language of moral agency and responsibility while draining it of the backward-looking content that normally gives it force. Without some backward-looking element, I’m not sure these terms have much meaning. When I blame you I’m not just trying to strategically intervene on your behavior. I'm making a claim about you as the agent you are. I'm saying: your algorithm had the capacity to weigh moral considerations, and it didn't weigh them properly, and that failure is attributable to you in the proximate sense that the essay has spent thousands of words defending as a real and meaningful thing.

Guilt is the internal response of a moral agent recognizing their own failure. Indignation is the response of someone wronged by a being capable of having done otherwise in the Could₁[3] sense. Forgiveness is real because there's a genuine moral debt to release. These responses are partly backward-looking and irreducibly so. Without that backward-looking element, the entire fabric of the moral community dissolves into mutual behavioral management.

Is there a way to rescue yourself from this?

You've done a lot to establish moral agency. You've talked about the deliberative algorithm, about reflecting and revising oneself, about becoming an agent. You claimed free will grounds praise and blame. Let's say we accept all that. Then when you act wrongly despite having the capacity to process moral reasons, something is true of you as an agent that is not true of the bear. There remains some form of desert—call it “agent-desert”—that falls directly out of proximate authorship. It says: you, the integrated deliberative system, produced this action through your own evaluative process, and that action was wrong, and that wrongness is attributable to you. Not to the Big Bang, not to your genes, but to the agent you've become.

Could this work? Is agent-desert what gives moral emotions meaning? Can it ground the whole ecosystem of responses, like guilt, indignation, and forgiveness? Not because the universe demands someone suffer, but because these are the correct responses between beings in moral relationships with each other.

So, I ask you again: what does moral agency actually do? Can it change a prison sentence? If it doesn't ground some form of backward-looking appraisal—if it just tells us which management tools to use—then it's not doing the work you claimed it does.

You need to admit that this is a backward-looking notion. Agent-desert says that what you did matters, that your relationship to your past action is morally significant, that the right response to a wrong isn't just a management strategy but a recognition of what occurred between moral agents. The moment you accept that, you've conceded that purely forward-looking punishment was never adequate. A purely forward-looking framing is inadequate not because it's too lenient, but because it's the wrong kind of response to a moral agent.

You wanted moral agency to matter. You argued for it extensively. But then you left it with nothing to do. Agent-desert at least gives it something to do. Whether this is right or sufficient is still up in the air. But it shows that the purely forward-looking framework was never going to get you there.

Where Does This Leave Me?

I still think the core compatibilist insight is right: deliberating and being responsive to reasons is fundamental to free will. I think we need to preserve a meaningful distinction between a person who acts and a boulder that rolls. Those parts remain.

But I think I kept the parts of the moral landscape I liked and tried to surgically remove the parts I didn't, as if the wrecking ball could be aimed. And I left moral agency standing at the center of my framework with no job to do, which is worse than not having it at all. An ornamental load-bearing wall.

I’m not sure if agent-desert is the fix. Does it all collapse into the retributive cruelty I was trying to escape? I'm not sure yet.

  1. And if you have to blank his mind MIB-style afterwards so he can’t make new decisions, fine, whatever. You get the idea. ↩︎

  2. Though of course comparisons across time are difficult to make, so we shouldn’t put too much weight on it. ↩︎

  3. Could₁ was defined in the previous essay as “could have done otherwise if my reasons or circumstances were different”. ↩︎




Monday AI Radar #16

2026-03-10 11:14:26

The conflict between the Department of War and Anthropic has quieted somewhat, but nothing has been resolved and a catastrophic outcome is still entirely possible. Regardless of what happens next, two things are very clear.

This is the least political that AI will ever be. Politicians are finally waking up to the fact that AI is a big deal. Even though most of them don’t understand why it’s a big deal, you can safely assume they will have an increasing appetite for government intervention. The DoW incident is a preview, not an aberration.

This is the least stressful that AI will ever be. The last two weeks have been brutal: I notice several of the writers and thinkers that I most respect have been publicly struggling and in some cases decompensating. I’m afraid the pace is only going to get faster, and the stakes are only going to get higher. Pace yourselves.

In the spirit of pacing ourselves, we’ll cover what we need to cover about DoW, then put it down and move on to happier topics.

Top pick

The Future We Feared Is Already Here

For years now, questions about A.I. have taken the form of “what happens if?” […]

This year, the A.I. questions have taken a new form, “what happens now?”

Ezra Klein’s opinion piece in NY Times ($) is nominally about the conflict between the Department of War and Anthropic, and his analysis of that situation is spot-on; this is possibly the best short piece on that topic. But that conflict is a symptom of a much deeper problem: we’ve gone from being unprepared for AI capabilities that are coming soon to being unprepared for AI capabilities that have now arrived.

AI profoundly changes the nature of government surveillance—it’s now possible to intensively surveil every single American in a way that was previously (sort of) legal but completely impractical. In a sane world, the US Congress would carefully consider the implications of that change and pass appropriate legislation that codifies a reasonable balance between security and privacy.

Lamentably, we don’t seem to live in that world. Plan accordingly.

New releases

Gemini 3.1

Zvi reports on Gemini 3.1. It’s a great model, but Google DeepMind just isn’t quite keeping up with Anthropic and OpenAI. Image generation is state of the art, but aside from that there’s no good reason for most people to pick Gemini as their daily driver.

Department of War vs Anthropic, part 1

Let’s start with some of the most interesting pieces from the past week.

Ezra Klein interviews Dean Ball

Obviously a conversation between Ezra Klein and Dean Ball ($) is going to be good, and this one exceeds expectations. Dean is both highly informed about the political situation and deeply thoughtful about the broader implications of what’s happening here.

Zvi reviews the situation

Zvi summarizes the state of play as of March 6.

Zvi: A Tale of Three Contracts

There’s been a lot of discussion about what the contracts between DoW and Anthropic / OpenAI actually mean. If you want to go down that rabbit hole, Zvi does a great job of breaking down what we currently know. See also Tom Smith’s analysis.

I’m glad people are doing the important work of scrutinizing these contracts and doing their best to ensure that they establish clear legal boundaries. But ultimately, legal documents can only do so much. If you don’t trust the three letter agencies not to spy on you in the first place, you probably shouldn’t trust them to honor a contract.

Pirate Wires talks with Emil Michael

Much of the AI world has been highly critical of DoW’s recent actions, for obvious reasons. Pirate Wires’ conversation with DoW’s Emil Michael (partial $) is the best piece I’ve found in support of DoW’s position—there’s a lot I don’t agree with, but it’s more reasonable and coherent than many of the straw men being tilted at online.

Department of War vs Anthropic, part 2

The immediate consequences of the situation are bad enough, but the long-term collateral damage will be even worse. A lot of individuals, companies, and countries are going to look at the events of the last two weeks and start quietly making contingency plans that ultimately weaken both America and the entire AI industry. Nobody is well-served by any of this, and the longer the situation drags on the worse the fallout will be.

Here are two early examples—I’m certain many similar conversations are happening behind closed doors.

Can You Poach A Frontier Lab?

In the wake of the conflict between DoW and Anthropic, Anton Leicht considers whether it’s feasible for one of the middle powers to “poach” a frontier lab. He concludes it isn’t realistic to outright move one of the big labs outside the US, but proposes some intermediate strategies:

Stepwise and subtle, however, is a possible way to do this: understand the project of ‘poaching’ a frontier lab not as an attempt to extract value from the U.S., but to diversify the Western stack to make it more resilient to transient political trends and disruptions. My broader claim here is simple: it would be good for the world if a sizeable minority of American developers’ compute, business activity, and government cooperation were located in allied democracies. That could be about Anthropic, but I’d be just as happy with OpenAI or Google DeepMind. In a pinch, I might even take Meta. That outcome is eminently reachable and obviously beneficial in the aftermath of the Anthropic/Pentagon saga—and it’s never been more clear to the frontier developers that some hedging might be in their very best interest.

Can you nationalize a frontier AI lab?

The DoW / Anthropic dispute has rekindled serious discussion about the US government nationalizing frontier AI development. Much of that discussion has focused on legal, political, and philosophical questions, but there hasn’t been much serious discussion of the practicalities.

John Allard dives into the nuts and bolts of nationalization, considering what strategies the government might use and whether those strategies would actually work. He isn’t optimistic about the outcome (which doesn’t mean it wouldn’t happen anyway):

It was always an inevitability that the government would try to exert control over frontier AI. The problems arise when the government begins exerting control without understanding that the frontier is a living process, not an asset. At some point the frontier may commoditize enough that tacit knowledge stops mattering and the government can brute-force its way to capability. But we’re not there yet. And until someone can answer the harder question — whether the US is better off accepting less control in exchange for maintaining its lead — the risk is that every attempt to capture the frontier is what finally kills it.

Agents!

What did we learn from the AI Village in 2025?

AI Village is the sensible, grownup version of Moltbook. A team of frontier AIs is assigned a group project and attempts to tackle it in full view of an amused world. Recent projects have included fundraising for charity and writing a blog. While there are elements of robot reality TV here, it’s an interesting way of exploring agent capabilities in the real world. Of particular note, it gives us information about how well a diverse group of frontier agents can work together (that’s going to be a big deal by the end of this year).

As you might expect, the agents made a lot of progress last year:

In the AI Village, we’ve observed substantial improvement in agent capabilities over the span of months. Early 2025 agents often fabricated information, got stuck, or became easily distracted in a few minutes to hours. Late 2025 agents tend to be more truthful and stay on task longer (though their effectiveness often drops off once the most obvious tasks are done).

As we may vibe

Jason Crawford reflects on recent progress in agentic coding. There aren’t a lot of novel insights here, but it’s a great overview and a strong choice for sharing with people who haven’t been following AI closely.

Robots as art directors

2025: why would I do work when I can tell a robot to do it for me?

2026: why would I tell a robot to do work when I can have a robot tell it for me?

I’ve recently needed artwork for a couple of personal projects, and I’ve found that SOTA models aren’t just capable artists—they’re also quite good art directors. My current workflow goes like this:

  • Discuss the style and content of the image with Claude, who has a much better understanding of art terminology and styles than I do.
  • Once we’ve figured out the goal, Claude writes a detailed prompt.
  • The prompt goes to Gemini for rendering.
  • Back to Claude, who assesses the image and makes changes to the prompt (sometimes but not always with my feedback).
  • Iterate until I’m satisfied with the result.

Claude is surprisingly good at looking at an image and finding areas for improvement in everything from line style to facial expressions. The results can’t (yet) compete with professional work, but they’re getting very good. And from a process perspective, the AI is light years better: I can experiment with multiple directions and styles within minutes, and the robots never get frustrated when I change my mind seven times in half an hour for no good reason.
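For the curious, here's roughly what that loop looks like as code. This is a minimal sketch: ask_claude and render_image are hypothetical placeholders for whatever SDK calls you actually use (they are not real library functions); the point is the control flow, not the API.

```python
# Sketch of the Claude-as-art-director loop described above.
# ask_claude() and render_image() are hypothetical placeholders, not real
# SDK functions; swap in your own Claude / Gemini API calls.

def ask_claude(instruction: str, image: bytes | None = None) -> str:
    raise NotImplementedError("stand-in for your Claude API call")

def render_image(prompt: str) -> bytes:
    raise NotImplementedError("stand-in for your Gemini image-generation call")

def art_direct(brief: str, max_rounds: int = 5) -> bytes:
    # Steps 1-2: discuss the goal, then have Claude write a detailed prompt.
    prompt = ask_claude(f"Write a detailed image-generation prompt for: {brief}")
    image = render_image(prompt)  # Step 3: render the prompt with Gemini.
    for _ in range(max_rounds):
        # Step 4: Claude assesses the image and proposes prompt changes.
        critique = ask_claude(
            "Assess this image against the brief and suggest changes to the "
            "prompt. Reply with exactly DONE if no changes are needed.",
            image=image,
        )
        if critique.strip() == "DONE":  # Step 5: iterate until satisfied.
            break
        prompt = ask_claude(
            f"Revise this prompt based on the feedback.\n"
            f"Prompt: {prompt}\nFeedback: {critique}"
        )
        image = render_image(prompt)
    return image
```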

AI in the real world

What you need to know about autonomous weapons

Along with mass domestic surveillance, autonomous weapons are one of the red lines in the Anthropic / DoW dispute. Policy and ethical considerations aside, it’s surprisingly hard to define what “autonomous weapons” actually means. We have well-defined autonomy levels for cars, but no similar concept for weapons (yet). Autonomous missile defenses have been deployed since the 1980s, but that feels very different from a system that can autonomously identify and engage individual soldiers.

Transformer explores some of the technical and legal questions, and looks at what’s currently on the battlefield in Ukraine.

How AI Will Reshape Public Opinion

New communications technologies often transform how the public gets information and forms opinions. The printing press democratized the spread of information, weakening the control of the church and monarchy. Social media is a breeding ground for outrage, tribalism, and conspiracy theories. How might AI affect public discourse?

Dan Williams argues that AI might be a force for good, nudging us closer to a consensus view of reality based on expert understanding and strong epistemics. We don’t have much data yet, but he cites some promising early research suggesting that LLMs are surprisingly effective at getting people to change their minds.

His arguments sound plausible, although I note that many of us initially expected social media to be a force for good.

Jobs and the economy

How AI Could Benefit the Workers it Displaces

AI Frontiers explores how AI might affect workers, arguing that if AI is much better than humans at many but not all jobs, human wages might actually rise.

That counter-intuitive result follows from basic economics, which the article does a good job of explaining. It’s a solid piece, and a good introduction to some of the relevant economics if you’re not already familiar with them. But note that this whole analysis only applies if AI is powerful but not superhuman. Without careful intervention, everything falls apart in a world with superhuman AI:

If machines do everything, then those who own the machines will capture all this value. Products and services would become very cheap, but workers, outcompeted by machines in all tasks, would end up with a vanishingly small share of the economy’s income.

We can flourish alongside superintelligent AI, but only if we make smart choices.

AI psychology

Robert Long on AI consciousness and wellbeing

Eleos AI Research is a small nonprofit dedicated to studying AI sentience and wellbeing, a topic which until very recently has largely been ignored. Executive Director Robert Long goes on the 80,000 Hours podcast to discuss their work and some of the big open questions they’re tackling.

Good interviews answer the questions you wanted to learn about, but great interviews raise (and occasionally answer) questions you hadn’t realized you ought to be asking. I came out of this one with new questions about the ethics of creating sentient AI that wants to be subservient to humans and about AI consciousness that is as meaningful as ours but unrecognizably different.

Other interests

How to win a best paper award

(or, an opinionated take on how to do important research that matters)

As the subtitle implies, Nicholas Carlini has opinions about how to write papers good enough to win best paper awards—and more generally, how to do good research. It’s a dauntingly long piece but very good: even though I’m not a researcher, I found multiple insights that I’m excited to put to use in my own work.

Something frivolous?

A very short story

Sam Altman:

i always wanted to write a six-word story. here it is:

near the singularity; unclear which side.




The case for AI safety capacity-building work

2026-03-10 10:43:41

TL;DR:

  • I think many of the marginal hires at larger organizations doing AI safety technical or policy work right now (including e.g. Apollo, Redwood, METR, RAND TASP, GovAI, Epoch, UKAISI, and Anthropic’s safety teams) would be capable of founding (or being early employees of) organizations focused on building capacity in AI safety, and would have more impact by doing so.
  • I think the impact case for this kind of work is supported by first-principles arguments (the multiplier effect), larger-scale survey work my team at Coefficient Giving has done, and many individual conversations we’ve had with people who are working on AI risk, which suggest that past capacity-building work has had large and predictable effects on people working on AI safety now. 

Cross-posted from Multiplier

I work on the capacity-building team on the Global Catastrophic Risks half of Coefficient Giving (formerly known as Open Philanthropy). Our remit is, roughly, to increase the amount of talent aiming to prevent unprecedented, globally catastrophic events. These days, we’re mostly focused on AI, and we’ve funded a number of projects and grantees that readers of this post might be familiar with– including MATS, BlueDot Impact, Constellation, 80,000 Hours, CEA, the Curve, FAR.AI’s events, university groups, and many other workshops and projects.

The post aims to make the case that broadly, capacity-building work (including on AI risk) has been and continues to be extremely impactful, and to encourage people to consider pursuing relevant projects and careers.

This post is written from my personal perspective; that said, my sense is that a number of CG staff and others in the AI safety space share my views. I include some quotes from them at the end of this post.

I’m writing this post partly to correct what I perceive as an asymmetry between how excited I and others at Coefficient Giving are about this kind of work and how excited people in the EA and AI safety communities seem to be to work on it. The capacity-building team is one of three major teams working on AI risk at Coefficient; we currently have 11 staff, which is ⅓ of the total AI grantmaking capacity, and gave away over $150M in 2025. I started my stint at Coefficient Giving in 2021, working half-time on technical AI safety grantmaking and half-time on capacity-building grantmaking; among other reasons, I ultimately switched to working full-time on capacity-building because my sense was that the capacity-building work was several times (maybe an order of magnitude) more impactful. Things seem somewhat different to me now (I think the set of opportunities in technical AI safety grantmaking looks significantly better than it did in 2021), but my sense is that capacity-building as an area of work is still massively underrated relative to its impact.

The case for capacity-building work

The naive case for this kind of work (often called the multiplier effect argument) goes something like this: say you can spend a little time doing direct work yourself, or spend that same amount of time getting one of your equally talented friends into direct work for the rest of their life. Getting your friend into direct work is most likely the more impactful option, because you get to “multiply” your lifetime impact (in this case, by almost a factor of 2) by getting a whole additional person to spend their career on work you think is important.
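To put toy numbers on that (these are invented purely for illustration): say a career is 40 years of direct work, and persuading one equally talented friend into the field costs you a year.

```latex
% Illustrative arithmetic only; the 40-year career and 1-year cost are made up.
\underbrace{(40 - 1)}_{\text{your remaining direct work}}
  + \underbrace{40}_{\text{friend's direct work}}
  = 79 \text{ years} \;\approx\; 2 \times \underbrace{40}_{\text{going it alone}}
```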

In fact, whether this argument goes through depends on a few premises: namely, how good the direct work you would have done would be, and how tractable it is to convince others as talented as you. I’m going to skip over the first premise for now (and attempt to address it in a later section) and present evidence that our team has collected over the years that makes me think this work is very tractable– in particular, that there are easy-to-execute interventions that reliably influence people’s career trajectories in substantial ways. A priori, you might think that people’s career choices happen randomly and chaotically enough that it’s difficult to make a substantive impact by trying to change what people work on. But in fact, both anecdotal evidence we’ve observed and larger-scale data collection we’ve attempted (both presented below) suggest that intentional efforts make a big difference to individual career trajectories (including the trajectories of individuals who go on to do highly impactful work). I think that core stylized fact makes up the main case for why capacity-building work is worthwhile.

I will briefly note that while the case below focuses on successes from capacity-building, I do think this work has the potential for harm, though my overall view is that efforts in this space executed by thoughtful, high-context individuals will be very positive in expectation. I briefly discuss this in this appendix.

Surveys

In 2020 and 2023, our team ran two similar, in-depth surveys in which we asked a few hundred people currently working on (or relatively likely to work on) impactful GCR work what had influenced their career trajectories. Survey respondents included employees at AI labs, staff at key technical, policy, and capacity-building organizations in AI, and promising-seeming early-career individuals. The aim of the surveys was to provide some evaluation of the impacts of the grants our team had made, as well as to generate some evidence informing Coefficient Giving’s views on capacity-building work as a whole.

The survey used a variety of prompts to elicit evidence from respondents about what had influenced their career choices. One section asked respondents to list, unprompted, the top 4 influences they thought were most important to their current career trajectory (these included things like “my partner”, “inherent curiosity”, etc.).

In 2023, 60% of respondents listed a capacity-building program or organization that our team was funding in their top four influences, with the most common being university groups (listed by 25% of respondents), 80,000 Hours (listed by 20% of respondents), and EAG/EAGxes (listed by 12% of respondents). 

See the table below for a longer list of the commonly listed influences, sorted manually into (somewhat subjectively decided) buckets. Note that:

  • There are various reasons to think the self-reports that generated this table may be skewed or non-representative– survey respondents were sourced in an ad-hoc way, partially from organizations doing capacity-building work themselves, and respondents may have been primed to think about Coefficient Giving-funded programs or organizations, given that we were the ones administering the survey. (In our own use of this data, we try to correct for some of these effects.)
  • Crucially, this survey was conducted in 2023, and primarily captured effects from 2020–2022; i.e., it shouldn’t be taken as up-to-date evidence about these influences, or about which influences have the biggest effects now (though I think many of the influences in the table continue to have very sizeable effects).
Unprompted item               % listing as top-4 influence (2023)   Count (of 329)
University group              25%                                    82
80,000 Hours                  20%                                    66
EAGs/EAGxes                   12%                                    38
Eliezer's writing             11%                                    37
Broad group                   7%                                     22
Will MacAskill's writing      5%                                     17
Lightcone                     5%                                     15
 - LessWrong                  4%                                     12
Peter Singer's writing        4%                                     14
Open Philanthropy             4%                                     14
Bostrom's writing             4%                                     12
Toby Ord's writing            4%                                     12
EA Forum                      3%                                     11
Redwood                       3%                                     9
 - MLAB or REMIX              2%                                     7
FHI                           3%                                     9
Scott Alexander's writing     3%                                     9
FTXF                          2%                                     7
ESPR                          2%                                     7
GCP                           2%                                     7
CEA                           2%                                     6
SERI MATS                     2%                                     6
Atlas Fellowship              2%                                     6
AGISF online                  2%                                     5
Cold Takes                    2%                                     5
GPI                           2%                                     5
Rethink Priorities            2%                                     5

Testimonials

I’m not able to share the individual free-write responses from the survey above, but I recently personally asked some individuals who I think are doing high-impact work to tell me how they came to be doing that work, followed by what they thought the most important or counterfactual influences on their trajectories were.

Below, I include Claude summaries of their overall stories along with their description of the most important influences, lightly edited. Some notes on the testimonials I've included:

  • They're obviously to some extent cherry-picked by me, and are meant to give a flavor of the kind of data we've seen, rather than a faithful representation of all the ways people tend to come to be doing this work.
  • I chose to include individuals who started doing AI safety-relevant work relatively recently (within the last 5 years), but who I think are doing at least somewhat legibly impactful work now. This includes many people who got involved in 2022 or earlier, and similar to the survey data above, I would advise against directly extrapolating the effectiveness of the exact influences they discuss from that time period, though I think the broad classes of influences (university and local groups, certain content and works of writing, programs and events) continue to be very impactful today– see below.
    • Among other effects, I think prior to 2023, many more people doing impactful work in the AI safety space got involved via effective altruism rather than directly via AI safety– nowadays, I think it's more common that people encounter AI safety right away.

Neel Nanda (Senior Research Scientist at Google DeepMind)

“Here's a list of the salient influences on me:

  • When I was 14, I read Harry Potter and the Methods of Rationality, and via this came across LessWrong and Effective Altruism, and got generally curious about these spaces.
  • When I was 17-18, going to ESPR, absorbing a bunch of ideas about ambition, agency, thinking more clearly, and getting various in-person ties to the community.
  • At uni [Cambridge], hanging out with the Effective Altruism group, forming friends via it. Meeting the EA community, especially outside of Cambridge, and generally absorbing the culture more and getting more of a sense of "maybe I should do something about this career-wise."
    • By default, I was pretty sure I was going to go into finance while being highly uncertain, and considered AI safety kind of weird and vague.
  • I had a call with 80,000 Hours in my second year of uni (out of a three-year degree), where my main two updates were:
    • I was being way too perfectionist about AI safety careers, and I should just try to gather information rather than trying to figure out if I was highly confident I wanted to do this for the rest of my life.
    • Also, connections with various people actually working on this stuff, including empirical work at labs, which gave me a much crisper sense of what this could actually look like. (I actually already knew some of the people I was connected to, just lacked the agency/inspiration to reach out myself)
  • After I finished my undergrad, I was planning on doing a master's, but this was in 2020, so that was pretty unappealing because of COVID. I came close to accepting a Jane Street full-time offer, but instead decided I was being too risk-averse and I should instead gather information by doing a year of back-to-back AI safety internships.
  • I then interned at FHI doing AI safety theory stuff, which was kind of useless impact wise. Then DeepMind doing some empirical but not particularly impactful work - this was high value for just giving me a much more legible sense of "this is a career path" and feeling more like a tangible thing that I could learn and do, and where I was learning real skills. And CHAI, which was also a bit of a mess due to being remote + 8 hour time difference, and not being a great fit.
    • I think the key salient updates from this year were just thinking a lot more about AI safety, talking to real people in the space, and getting a much more visceral sense of "something big is happening here, I can be involved, I can help, and I have realistic job options."
  • Then I got a job offer from Anthropic, decided to accept, had a fantastic mentor in Chris Olah and discovered that mech interp was a great fit, and from then on it was pretty overdetermined that I was going to stay in the AI safety space.”

Max Nadeau (Associate Program Officer (Technical AI Safety) at Coefficient Giving)

Claude’s summary:

Max got it into his head in high school that human-level AI was coming during his lifetime and that it was important to make sure the process went well, but he had no idea anyone was working on it. In college, he got connected with Stephen Casper, from whom he learned practical ML skills, and with someone who connected him to the people running the Impact Generator retreat [Asya note: this was a small GCR-focused workshop series run in the Bay in 2022], to which he was later invited. He talked to Tao Lin at that retreat, and Tao offered him a TA position at the ML bootcamp Redwood was running, with three weeks to learn the material. He thought he'd be in the Bay for three days, but stayed six weeks. TA'ing turned into an internship at Redwood, which he took a semester off college to do. While interning he got to know Ajeya, and by the time he graduated she offered him a job.

Max on what was most important:

  • "Going to the Impact Generator workshop. That was, like, extremely random. And, like, resulted in a major acceleration to my career."
  • "I think, like, getting connected with the existing AI safety community [at Harvard] in the first place was really counterfactual. I went from, like, having some vague sense of, like, maybe this is something I want to do, to oh, there are already people working on this issue, and they have a whole way of thinking about it."

Rachel Weinberg (founder and former head of The Curve, currently at AI Futures Project)

Claude’s summary:

Rachel got into effective altruism in high school through friends, and started a group at her university. She spent some time interning as a retreat organizer and ended up helping with Future Forum, a futurism conference that required a last-minute venue switch. She took a semester off to study AI safety, but decided she wasn't interested in research, and did web dev for a while. After running Manifest 2024, she started The Curve, and is now working on other field-building projects.

Rachel on what was most important:

  • “(obviously) Getting into effective altruism in the first place via friends (Nick Gabrieli in particular).
  • Helping with Future Forum, which came from meeting Leilani at Impact Generator [Asya note: this was a small GCR-focused workshop series run in the Bay in 2022] and then from her agreeing to take on the project last minute (which I encouraged at the time, but some people were against). I at least benefited a lot from being on a small, messy team where it was easier to stand out by being eager to take on more responsibility and following through.
  • Deciding to run The Curve, which was largely dependent on my ~personality (tbh Austin gets counterfactual credit for pushing my confidence/conviction to the threshold of being willing to take that kind of risk), but was also very inspired by having seen Future Forum bootstrap. I also probably wouldn't have been/felt qualified to do this if I didn't live in the Bay, maybe SF in particular.”

Marius Hobbhahn (CEO and founder of Apollo Research)

Claude’s summary:

During his first week of university in 2015, someone handed him Superintelligence. He studied cognitive science, did a CS bachelor's in parallel, then a machine learning master's and PhD to prepare for AI safety work. In 2022 he started doing AI safety research on the side with a grant from the Long-Term Future Fund. He paused his PhD, did MATS in early 2023, concluded that deceptive alignment was the biggest problem and that no one was doing evals for it, and started Apollo, which he’s been running since.

Marius on what was most important:

  • "In the early days, I would say it was people who were already considering AI safety back then. These were the kind of people I  looked up to and who also nudged me towards doing AI safety because they also thought it would be super important.
  • The personal grants were super important for me too. Because it basically meant there was no excuse anymore. I really wanted to do work on AI safety, but there were always questions like:  Is this financially responsible? And what about having a stable career? And at the point where you have a grant, these concerns didn’t seem like a big deal anymore. And it also was a motivational boost, of course, because someone else thought I’m good enough to bet some money on.
  • MATS was very impactful. Apollo definitely wouldn't exist without MATS. That’s where I had the time to develop an agenda, do a lot of experimentation and find great starting members.
  • There were also just a bunch of people in the Bay who I talked to who suggested it would be a good idea to start Apollo and give it a try, even if there's a good chance that it doesn't work out.  For example, Evan Hubinger, my MATS mentor, was supportive and helpful.
  • Oh, and then the AI safety philanthropic ecosystem. That’s how we received our starting capital, which allowed us to try Apollo."

Adam Kaufman (member of technical staff at Redwood Research)

Claude’s summary:

Adam knew from an early age that superintelligence would be scary if someone built it, but assumed it wasn't going to happen in his lifetime. When he got to college, he joined the AI Safety Fundamentals reading group that the Harvard AI Safety group (HAIST) was running, thought the people were extremely cool, and made most of his close friends there. He became increasingly convinced the problem was urgent as language models kept getting smarter. He met Buck Shlegeris at a HAIST retreat, talked to him, and applied to MATS. He did MATS at Redwood, enjoyed it so much he took time off school, and has been working there since.

Adam on what was most important:

  • “Definitely I think that being around a community of smart people who were planning to do careers in AI safety, were convinced that this was a really important problem, was probably necessary for getting me to like seriously consider that I should work on this myself. [...] I think HAIST (the Harvard AI Safety Team) was probably fairly counterfactual for me. It's like if that club didn't exist, I imagine I would have just been more depressed and confused about what I should do."
  • "I think that talking to Buck in a hot tub once [at the HAIST/MAIA retreat] was probably counterfactual for getting my current job."
  • "Definitely having the opportunity to intern at Redwood [through MATS or otherwise] was necessary for me [taking a full-time job there]."

Gabriel Wu (member of technical staff (alignment) at OpenAI)

Claude’s summary:

Gabe was given a copy of The Precipice when he started as a freshman at Harvard. There was no formal AI safety team at the time, but a group of 7-10 people would gather weekly to talk about x-risk in a dining hall, so he joined, and ended up going to a long workshop in Orinda [California]. He did REMIX [Asya note: this was a mechanistic interpretability bootcamp] the following winter, which introduced him to the Constellation community, and then applied for a Redwood internship for the next summer. After others graduated, he became the new director of HAIST (the Harvard AI Safety Team). He worked with the Alignment Research Center, applied to labs, and was eventually convinced by several people to join OpenAI.

Gabe on what was most important:

  • "I think a huge part of it was being identified by [students at HAIST (the Harvard AI Safety Team)]. They made sure to bump me to apply to things. It really felt like they believed in me and wanted to make sure that I didn't get lost along the way. And I think that was pretty counterfactual because it made me a lot more likely to end up doing REMIX and so on. [...] The fact that HAIST itself existed was a big part of it."
  • "Another thing I mentioned was just like, having the opportunity to visit Constellation."

Catherine Brewer (Senior Program Associate (AI Governance) at Coefficient Giving)

Claude’s summary:

Catherine found 80,000 Hours before university through internet searching about careers, then read Doing Good Better. They engaged with the Oxford effective altruism university group, going to events and helping run programming. Through the group they made friends who were into AI safety and argued with them a bunch, which got them interested in AI safety. They applied for the ERA fellowship (then called CERI) after someone from the group told them to, and spent a summer thinking about AI safety with other people. Then they did the GovAI fellowship, which they found even more helpful, via meeting people and developing their own takes on relevant topics. After that they were interested in AI governance, and applied to Open Philanthropy when they were graduating.

Catherine on what was most important:

  • d"Maybe just having a bunch of people in Oxford who are already thinking about AI safety a bunch... that feels like contingent and could have easily not been the case and maybe like it sped me up by taking AI safety seriously by like six months or something. But then that led me to doing the summer fellowship."
  • "I think the GovAI summer fellowship was super helpful. I guess it's just a lot of time to spend with lots of other people doing work on the thing. And I think I had a better network then and also just had more time to be like, what are people actually doing? What are they working on? And maybe improve my thinking somewhat."

Aric Floyd (video host for AI in Context)

Claude’s summary:

Aric found GiveWell by Googling for the most effective charities in his late teens, but didn't find the broader effective altruism community until 2020, when a friend found an online student summit that CEA ran. He knew the people who led the Stanford effective altruism group, but never had time to get involved, and was then invited by those people to help with some community-building efforts at MIT. He was also invited to Icecone [Asya note: this was an AI-risk-focused workshop run in 2022], and came out of it persuaded that AI safety was a big deal, but less convinced that theoretical alignment work was the way to proceed. He did a bunch of short sprints of community-building work and met Chana Messinger while teaching at the Atlas Fellowship, and later the Apollo program in the UK. When 80K started thinking about video production, Chana brought him on because they'd worked well together before, and because Aric had prior experience in film & television acting. Aric had previously been encouraged by [experienced EA leaders / Will MacAskill, among others] to do public-facing content creation, and decided to give it a shot.

Aric on what was most important:

  • "Definitely specific reach outs from people who are much more embedded in the community. So, like... getting to do like a call with Will early on was cool and I think made me feel much more like, oh, I actually maybe have a niche where I could contribute significant value to this community because otherwise this person wouldn't be paying attention to me. People at Stanford asking me in particular to come help."
  • "Icecone was obviously, like, a bigger event, but the amount of resources put in per attendee was also kind of absurd, and so also felt like a big, costly signal of, like, it is actually worth it for you specifically to think about this stuff. After Icecone, I was pretty bought in on I should do something about this with my life."

Ryan Kidd (Director of MATS)

Claude’s summary:

Ryan read HPMOR and LessWrong in high school, but he didn't anticipate near-term AGI until rediscovering the idea through effective altruism around 2020. He co-organized the effective altruism group at the University of Queensland during his physics PhD, where his interest in catastrophic risk evolved from climate change activism to nuclear winter modeling to AI risk after reading The Precipice. He completed the first AI Safety Fundamentals course, applied unsuccessfully to FHI and CLR, then did the SERI MATS pilot program. He attended Icecone [Asya note: this was an AI-risk-focused workshop run in 2022] in Berkeley, where he met Holden Karnofsky, Ajeya Cotra, Buck Shlegeris, and many future colleagues. While completing the MATS research phase with John Wentworth as his mentor, he sent the co-organizer a document explaining how he would improve the program and got invited to join the organizing team. He's co-led MATS with Christian Smith since late 2022.

What Ryan says was most significant (in order of importance):

  1. University effective altruism group: introduced me to ITN framework, AI safety, and a community with values I endorse; gave me project management and field-building experience.
  2. The Precipice: convinced me that AI was the most pressing x-risk and I should work on it now.
  3. Icecone: brought me over from Australia; connected me with the top experts, funders in AI safety; empowered me to scale MATS, LISA.
  4. HPMOR: exposed me to the concept of 'heroic responsibility' and Eliezer Yudkowsky thought; introduced me to LessWrong, the Sequences, and later ACX.
  5. SERI MATS online reading group: exposed me to Paul Christiano, Evan Hubinger, and John Wentworth thought; empowered me to do the MATS research phase in Berkeley, which kicked off my career.
  6. CLR application: exposed me to Jesse Clifton thought and deepened my understanding of Nick Bostrom, Anders Sandberg thought, all of which have been very influential to my work at MATS, etc.
  7. SERI MATS research phase: gave me space to think deeply and read widely about AI safety, which was crucial to scaling MATS.

What tends to work?

While some of the interventions affecting people’s career trajectories are fairly idiosyncratic, we’ve noticed a few broad categories that tend to have an impact on people’s careers (many of which are featured in the testimonials above).

  • Content: Books, blogs, videos, or other content – e.g. the works of Yudkowsky, MacAskill, Singer, Bostrom, Ord, Rob Miles, Scott Alexander, Kelsey Piper, and 80,000 Hours.
    • Note that while the most successful content has clearly been extremely influential, in our experience content production is fairly heavy-tailed – i.e., most individuals producing content should expect that it won’t achieve a ton of reach.
    • Recent popular content includes a lot of AI safety-specific work focused on broad audiences, including Situational Awareness, AI 2027, AI in Context, and If Anyone Builds It, Everyone Dies.
  • Groups: University and local groups (at the national or city level), historically largely focused on AI safety or effective altruism, have been very impactful according to our data, and we suspect other group types (including those inside of companies or focused on specific professionals) would also do well.
  • Upskilling programs: Courses, fellowships, bootcamps (often in-person, but sometimes online) – e.g. BlueDot’s online programs, MATS, ARENA, Tarbell, and many other similar programs.
  • Events: Conferences, workshops, retreats – e.g. EAGs, FAR AI’s alignment workshops, The Curve, GCP’s workshops, ESPR.
    • I think one way these tend to have impact is through giving people who are newer to a relevant space the opportunity to interact (ideally one-on-one) with professionals or those with more expertise (see Testimonials above).

Notably, unlike content, in our experience programs and events can have a sizable impact even if they don’t meet an exceedingly high quality bar, making them a good bet for a wider range of people to work on. Generalizing from anecdotes, I speculate that programs and events (especially in-person ones with other participants at a similar point in their careers) often cause someone to take the possibility of changing their career more seriously, where previously they had been engaging (e.g. online) in a fairly abstract or detached way.

  • [Others:] While the above constitute a sizeable fraction of the kinds of effective work we often see, there are many other impactful interventions (e.g. LessWrong and other discussion platforms, career advising like 80,000 Hours’s, coworking spaces like Constellation) that don’t fit cleanly into the categories above.

What’s good to do now?

Our recent request for proposals gives some examples of the kinds of projects we’d be interested in seeing on the current margin. Briefly, here are some specific things that I or others on my team think would be good, based on our sense of both what’s worked in the past and the current AI risk landscape:

  • More high-quality written or video content about AI risk, especially the kind that might reach new audiences
  • Retreats connecting promising university students to professionals working in the AI safety space
  • Bridge-building events (similar to The Curve) bringing together thoughtful people in different camps around AI
  • Introductory AI risk workshops designed for elite audiences (policymakers, journalists, academics, etc.)
  • Larger-scale AI risk-specific events featuring newcomers, similar to EAG
  • Bay-Area based AI risk programming for mid-career professionals

Who should be doing this work?

The above makes the case for why you might think capacity-building work is valuable, but doesn’t in itself provide a point of comparison for what someone could be doing otherwise (namely direct work, which could itself have capacity-building benefits, e.g. by creating evidence that there’s important work to be done in an area).

I don’t have a rigorous method of comparing the value of potential direct vs. capacity-building interventions, and I think there’s room to make a variety of plausible cases. That said, I will share my intuitions, as well as the intuitions of some others at Coefficient.

I generally encourage people to think about their career choices at an individual level, but from an overall talent allocation perspective, my current take is that many of the marginal hires at larger organizations doing technical or policy work right now (including e.g. Apollo, Redwood, METR, RAND TASP, GovAI, Epoch, UKAISI, and Anthropic’s safety teams) would be capable of founding or being an early strategy-setting employee at a top capacity-building organization, and would have more impact by doing so.

I think individuals who are most well-suited to capacity-building work are those who are (some subset of) entrepreneurial, socially skilled, operationally strong, or strong communicators in the relevant subject areas. I think work running programs or events is particularly loaded on the first three of these, whereas e.g. producing content is much more loaded on the last.

What would doing this work look like?

If you think you might be someone who should plausibly be doing capacity-building work, here are some things you could consider:

Working at an organization doing good work in the space

There are a number of actively-hiring organizations that I think are doing impactful capacity-building work (see some of them in this filtered 80K job board), but here I’m going to plug some organizations where I feel a strong hire could be particularly impactful.

If you think you might be interested in any of the below but are on the fence, you can DM me or fill out this form and I’ll aim to take a call of at least 15 minutes with you (and longer if it seems useful; up to a limit of 20 such calls).

Constellation – CEO

Constellation is a research center and field-building organization located in Berkeley, California, that hosts a number of organizations and individuals doing impactful work in the AI safety space. In addition to running the space itself, it’s historically run programming through the space, including the Astra Fellowship, the Visiting Fellows Program, and a number of one-off workshops and events.

Given the dense concentration of high-context talent working there, I think Constellation has huge potential to be impactful both as a convening place for people doing this work, and as a host of a number of programs and events, including (potentially) ones aiming to engage policymakers, AI lab employees, and other high-stakes actors relevant to the AI space.

Constellation is looking for a new CEO who I expect to be the primary individual setting Constellation’s strategic direction. I think that position will be extremely impactful and I'd like them to get a strong hire.

Kairos – various early generalist positions

Kairos runs SPAR, a remote AI safety research mentorship program, provides advice and monetary support for AI safety university groups, and has taken on running workshops for promising young people. I think there’s a massive amount of evidence about the effectiveness of all three of these interventions (some of which you can see in the testimonials above), and I think university groups and workshops for young people in particular are (still) extremely neglected relative to their historic impact.

I think Kairos has a very strong leadership team and important, neglected priorities (plus, Agus is a great Tweeter), and I think it would be very impactful for them to have early hires who are strong generalists who could own priority areas – they plan to open multiple new hiring rounds very soon, and you can fill out their General Expression of Interest form to be added to their potential candidate pool for those roles.

Starting or running your own capacity-building project or organization

Our team is always accepting applications for funding. The section above, as well as our request for proposals, describes some kinds of projects in AI capacity-building that we might be particularly excited to fund, but I also encourage people to form their own views about what might be effective and not anchor too strongly to past work.

Working on a capacity-building project part-time

We’ve seen a lot of successful capacity-building work started or run entirely by people or organizations doing it on the side of their day-to-day work, including MATS (which was started by full-time Stanford students), a number of impactful workshops and events, and a lot of widely-read public communications.

Subscribing to Multiplier, a Substack with thoughts from our team (and other AI grantmaking staff at CG)

Letting our team know

If you think you might be interested in or a good fit for this kind of work, but aren’t sure where to start, we would love it if you let us know by filling out this very short expression of interest form. We’ll reach out if there are projects or opportunities on our radar that we think might be a particularly good fit for you. (Note that we don’t expect to reach out to most respondents.)

Social proof

This post is coming from my personal perspective, but my sense is that my position here is directionally shared by at least some at CG and elsewhere in the AI safety space. I asked a few people who were not working on capacity-building, but who I felt had substantial context on capacity-building efforts, to share their takes below:

Julian Hazell, AI governance and policy at Coefficient Giving

“As I've written about before, I'm really into capacity building.

Funny enough, a Coefficient Giving career development grant and the GovAI fellowship were very important inputs into my current career trajectory. I probably would've eventually found my way into AI governance work regardless, but these programs jumpstarted my career and turned me into a useful contributor much faster than I otherwise would've been.

On the grantmaking side, I funded a number of projects where capacity building was a core part of the theory of change, and I've seen results that have been genuinely exciting.

If I could wave a magic wand to reorganize talent allocation in the AI safety community at my whim, I'd move a decent number of people currently in research and policy roles into capacity building. I think it's that underrated.”

Trevor Levin, AI governance and policy at Coefficient Giving

“I co-sign this post. There's so much to do to make the world more ready for transformative AI, and the ecosystem is full of projects that need a founder or are a couple more great hires from being much more impactful. We desperately need more talented and motivated people to keep showing up. Also, for me and I think for many others, the work can be deeply rewarding -- it often has more social contact and shorter feedback loops than other types of work.”

Ryan Greenblatt, Chief Scientist at Redwood Research:

"I agree with Asya's post and think that capacity building work is underdone and underrated. One delta is that I would emphasize the importance of capacity building type work by people who are doing object level work in the field. Both that I think that doing object level work is complementary to capacity building but also that people doing object level work should spend a larger fraction of their time doing/helping with capacity building."

Buck Shlegeris, CEO of Redwood Research

Asya: I'd broadly be interested in you giving your take on the kind of work that my team funds.

Buck: I don’t know the current distribution.

Asya: Our biggest grantees are MATS, CEA, Constellation, BlueDot, LISA, Tarbell, 80K, FAR AI's events, a bunch of university groups, and a bunch of other stuff.

Buck: Many of those seem pretty good. I think that overall, trying to do capacity building where you try to cause people to think through a bunch of issues related to transformative AI, especially having people with scope-sensitive beliefs relate to it-- I think that kind of work has gone quite well historically and put us in probably a much better position than we'd be without it. I'm excited for that work happening on the margin and I feel like every year we're somewhat better off because of capacity-building that was done that year or the previous year. Or like projects done by those organizations. That all seems great.

Asya: A claim I make in my post is that ‘many of the marginal hires at larger organizations doing technical or policy work right now (including e.g. Apollo, Redwood, METR, RAND TASP, GovAI, Epoch, UKAISI, and Anthropic’s safety teams) would be capable of founding or being an early strategy-setting employee at a top capacity-building organization, and would have more impact by doing so.’ I'm curious for your immediate takes on that proposition.

Buck: I don't know how many of them have that capability. I think if they have that capability, they should strongly consider doing so.

Maybe something is like-- I think MATS and Redwood represented two different kinds of philosophies on how to increase the technical AI safety research done. And I think it's very unclear which one-- I think MATS looks at the very least competitive. It's been involved in the production of a huge amount of AI safety research that I'm happy exists. And a heuristic that would have suggested you shouldn't work on MATS early seems to have gotten wrecked by posterity.

Asya: Cool, those are the main questions I want to ask you. Any other commentary you'd want to include here?

Buck: Capacity-building work seems good. I encourage Redwood staff to participate in capacity-building work; I think it's worth their time on the margin. I'm going to be involved in a bunch of it myself.


Appendix

My post in large part focuses on the case for successes from capacity-building, but I do think there are a number of mechanisms through which work in the capacity-building category can do harm, e.g. by misrepresenting key ideas to broad audiences, alienating people who would otherwise have been sympathetic to this work, or empowering individuals who ultimately make the ecosystem worse. While I think these effects are real and material, my overall view is that the negative impacts in the space have likely been substantially outweighed by the positives. My expectation is that most efforts in this space executed by thoughtful, high-context individuals will be very positive in expectation, such that I feel good about publishing broad encouragement to pursue this work on the current margin.

Without going into detail, my intuitions here come from an overall assessment of the work done by global catastrophic-risk-focused groups over the years, which my personal best guess is has been very positive on net, even accounting for substantial negatives (e.g. the actions of Sam Bankman-Fried). That said, I’ve heard a number of arguments for why that may not be the case, or for why certain large classes of efforts may have been disproportionately harmful, which I largely won’t cover here – ultimately, addressing these is not the main focus of this post, and if this feels to you like a major crux around your views on this kind of work, I encourage you to come chat with me about it in person sometime.

I will briefly say that I think it makes sense to think about capacity-building work on the level of individual interventions affecting specific groups of people, and that being skeptical of certain work is compatible with being excited about other work – given that this work is (according to me) very high-leverage, I’d encourage even broadly skeptical individuals to think about whether there are specific interventions that it would make sense for them to pursue.




Chore Standards

2026-03-10 10:30:52

A common source of friction within couples or between housemates is differing quality standards. Perhaps I hate the feeling of grit under my feet but my housemate who is responsible for sweeping doesn't mind it so much. If you do chores when you notice they need doing and stop when they seem done, this works poorly: the more fastidious get frustrated, and often stew in silence or nag. Even if it's talked about kindly and openly, doing a chore before it bothers you is harder and less satisfying.

When people set out to divide chores they're usually weighing duration and discomfort. These matter, but I think people should put more weight on the standards each person has, and generally try to give tasks to the person with the highest standards in that area.
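To make that rule concrete, here's a minimal sketch in Python of what it amounts to, with entirely made-up people, chores, hours, and standards scores:

```python
# Toy model of "give each task to the person with the highest standards."
# All names and numbers below are invented for illustration.

CHORES = {
    # chore: (hours per week, {person: standards score out of 10})
    "sweeping":  (1.0, {"alex": 8, "sam": 3}),
    "counters":  (0.5, {"alex": 7, "sam": 4}),
    "recycling": (0.2, {"alex": 6, "sam": 2}),
    "laundry":   (1.5, {"alex": 4, "sam": 6}),
}

def assign(chores):
    """Give each chore to whoever holds the highest standard for it."""
    return {name: max(people, key=people.get)
            for name, (_, people) in chores.items()}

assignment = assign(CHORES)

# Tally weekly hours per person to see how even the split came out.
hours = {}
for name, person in assignment.items():
    hours[person] = hours.get(person, 0.0) + CHORES[name][0]

print(assignment)  # {'sweeping': 'alex', 'counters': 'alex', 'recycling': 'alex', 'laundry': 'sam'}
print(hours)       # {'alex': 1.7, 'sam': 1.5}
```

Even with these invented numbers, one person collects three of the four chores, which is exactly the problem the next paragraph gets at.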

If you divide everything this way, though, it will probably be pretty unfair: preferences are correlated, and someone who notices dirt on the floor probably also notices crumbs on the counter and that the recycling is overflowing. Some options:

  • Do chores on a schedule. We host a monthly event at our house, and there are things I clean as part of setting up. It doesn't matter whether the bathroom mirror looks dirty to me, I'll clean it because it's on my list. (But Julia will probably also clean it a few times over the course of the month.)

  • Bring your needs closer together. If one member of the couple does the laundry but the other always runs out of socks first, they could switch who does the laundry, or they could just buy more socks.

  • Decouple your needs. That same couple could instead switch to each doing their own laundry. Now if one person doesn't do it for a long time it doesn't impact the other.

  • Make the need more salient. If one person isn't noticing that something needs doing, you can address that directly. Empty the trash, but instead of taking it out, put it by the door they walk through to go to work. Accumulate dirty dishes on the counter (visible) and not in the sink (hidden). If you just start unilaterally increasing salience that's passive-aggressive and probably doesn't go well, but if it comes out of an open-ended "what are some strategies we could use to make our chore division more fair" conversation, I expect that's positive.

  • Lower your standards. I know a few people who internalized a high cleanliness target as children, and benefited as adults from deciding to focus less on it. Often when becoming a parent: higher demands on time, letting high standards slip, realizing that actually it's not a problem. I could also imagine a sloppier person intentionally raising their standards, but that seems a lot harder, or else it's just something people around me have been less likely to talk about.

  • Hire someone. If (1) one person cares a lot about having clean floors and the other person doesn't, (2) neither of them enjoys mopping, and (3) they have some money, they can apply (3) to solve (1) without running into issues with (2). I know couples and group houses who decided to pay for a cleaner to come every week or two, and found it massively reduced conflict.

This is an area where Julia and I used to have a substantial amount of conflict, and while things aren't perfect here I do think they're a lot better in part due to applying several of the above.




Ancient Theories On The Origins Of Life

2026-03-10 09:55:21



Inventing evolution was hard. No one but the ancient Greeks and a scant few of their intellectual descendants made any progress on explaining where life came from till Darwin. Before that, the closest we really got to a modern understanding of evolution was Epicurus, and it took nearly two thousand years to produce a theory that was wholly better than his.

We know that because that’s what the writings of the ancients implied. And I’ll show you that by comparing their writings on the origins of life to (roughly) what Darwin knew. I’m not going to require a full mechanistic explanation. Even just a conceptual understanding that species are formed by an ongoing selection on variation that is inherited and generated anew each generation through reproduction would be enough, along with a realization that life originated from raw matter.

Anaximander (c. 610–546 BCE): Anaximander had the idea that living beings differed in the past. Humans are frail when young, so the first humans could not have been unprotected babes. Instead, they developed inside fish-like creatures until they were capable of fending for themselves. The 3rd-century Roman writer Censorinus records:

“Anaximander of Miletus considered that from warmed up water and earth emerged either fish or entirely fishlike animals. Inside these animals, men took form and embryos were held prisoners until puberty; only then, after these animals burst open, could men and women come out, now able to feed themselves.”

Empedocles: Empedocles understood that species are selected for fitness, and that there must be variation for this selection to act on. He believed species are the result of a primordial, random combination of heads, bodies, eyes, and limbs. Living beings changed, and only those combinations which were fit for life survived and reproduced. From his poem On Nature:

“Here sprang up many faces without necks, arms wandered without shoulders, unattached, and eyes strayed alone, in need of foreheads. (Fragment B57)

Many creatures were born with faces and breasts on both sides, man-faced ox-progeny, while others again sprang forth as ox-headed offspring of man, creatures compounded partly of male, partly of the nature of female, and fitted with shadowy parts. (Fragment B59/B61)”

But he couldn’t conceptualize small enough variations or understand that variation came from sexual recombination and mutation. And where life came from went wholly unexplained.

Epicurus (341–270 BCE): Epicurus gestured at an explanation for the origins of life. Life arose from the random combinations of atoms. Those forms best suited to survival reproduced themselves.

“Nothing is created out of that which does not exist: if it were, everything would be created out of everything with no need of seeds.”

Given that we still aren’t sure how life originated from random combinations of atoms, I’d say he did remarkably well.

Lucretius (c. 99–55 BCE): Lucretius, unlike the rest of the ancient innovators on the origins of species, was not a Greek. Instead he was a Roman, and an intellectual descendant of Epicurus. He thought the young earth was so fertile that creatures spontaneously arose from it in random forms. Most forms of life could not eat, or reproduce, and so died out.

“Many monsters too the earth at that time essayed to produce, things coming up with strange face and limbs... some things deprived of feet, others again destitute of hands, others too proving dumb without mouth, or blind without eyes... Every other monster and portent of this kind she would produce, but all in vain, since nature set a ban on their increase and they could not reach the coveted flower of age nor find food nor be united in marriage.” (De Rerum Natura Book 5)

He emphasized the need to reproduce as core to a species thriving. Beyond that, those which had some strength, or cunning or utility to mankind would be better suited to life. So, there’s variety, selection, and an emphasis on reproduction.

“For we see that many conditions must meet together in things in order that they may beget and continue their kinds.”

However, Lucretius missed that variety is generated by reproduction and that selection is ongoing rather than an event which only occurred in the distant past. And no, we can't just assume he understood that, because other early proponents of the development of species explicitly claimed that wasn't the case!

Saint Augustine (354–430 CE): He argued that species of animals and plants, not individuals, emerge from water and earth and “develop in time... each according to its nature” — De Genesi ad Litteram (On the Literal Interpretation of Genesis). God set the potentiality of the development of species, and likewise for man. I.e., species “grow up” according to some fixed schema. So there’s potential for change, but no selection over variation, no explanation of variation, no life originating from raw matter, etc. Just changing species.





Immortality: A Beginner’s Guide (Part 2)

2026-03-10 08:11:29

This is the second post in my chain of reflections on immortality, where I will present counterarguments to existing objections or misconceptions about life extension. I recommend reading the first part.

***

Q3: What if progress stops? (addition)

A: New ideas do not require new corpses. That is not a humane approach. A new paradigm usually wins not because the old one dies out, but because it offers better explanations, better tools, and a better quality of life.

Imagine a composer with 180 years of practice, a philosopher with 220 years of dialogue between eras, a director who has witnessed five technological revolutions, a scientist who personally carries to completion longitudinal studies that were begun a century earlier.

That does not sound like stagnation. It sounds like the possibility of depths of understanding and mastery that we have never seen before.

O5: Does death give life meaning?

A: To me, this is one of the biggest misconceptions about immortality, life and death.

Do you really sometimes think: “Oh, soon I will start falling apart, experience chronic fatigue and pain, and then disappear forever. How inspiring!”?

Would life really lose its meaning if you knew that a thousand, or many thousands, of years of life and possibilities lay ahead of you? It seems to me the opposite is true.

The only thing death motivates me to do is fight against it. So that I can keep living, creating, enjoying life, so that all of this does not disappear. I do not believe I would stop striving toward other goals if I became immortal. Those goals are not connected to death, so why should death affect them?

If I want to play the guitar, then I want to play the guitar. If I want to write a book, then I want to write a book. A rose is a rose is a rose, that’s all. I do these things not because I will die, not because I have to “make sure I try them in time,” but because I simply want them.

“Death makes life valuable” is absurd.

If someone told us that our phone would always work well and never become obsolete, would we stop valuing it?

If the risk of dying gives life meaning, then does that mean the older or sicker a person is, the more valuable their life becomes? So if you had to choose between saving a 110-year-old man and a little boy, would you choose the 110-year-old man?

Is an infant’s life valuable only because he could easily die? I think it is valuable because he has many years of potentially happy life ahead of him. Death has nothing to do with that value.

In childhood we do not think about death, often we do not even know about it, and yet we still rejoice in life. In fact, we often rejoice in it far more intensely than in adulthood.

In reality, we value life not because it can be taken away, but because it contains love, beauty, knowledge, the possibility of joy, and creativity. Death does not create those qualities; it simply cuts them off.

If death really were what gave life meaning, then why do we consider murder or terminal illness to be bad?

Death can intensify a sense of scarcity. The feeling that you do not have much time left and need to accomplish a great deal.

But that is not meaning — it is anxiety.

Deadlines, as we know, do not protect against procrastination. If they mobilize our resources at all, it usually happens closer to the deadline itself, rather than making us productive throughout the entire time allotted for the task.

Finally, let me quote a random commenter from the internet:

“I mean, c’mon! Death as a motivator? Seriously? Death doesn't even motivate people to stop smoking! Do people actually believe that everyone would just sit around watching TV if it weren't for death? Oh, wait, most people do that now. Ha! Some motivator!” [1]

O6: Will there be inequality between the rich and the poor?

A: Injustice in distribution is a political problem, not a problem with the good itself.

By the same logic, one could say antibiotics are bad because at first they were not available to everyone; the internet is bad because it was initially elitist; organ transplantation is bad because waiting lists are unfair.

Technologies usually reach everyone over time. Computers and mobile phones were once inaccessible to ordinary people too. Today, a poor person in Europe lives better than a king did three hundred years ago.

Second, aging is unlikely to be solved by one single intervention. It is far too complex a problem for there to be one universal panacea. By the time intervention number two appears, intervention number one will already have every chance of being available to the wider public.

And in order to develop a hypothetical “vaccine against aging,” it would still be necessary to conduct preclinical studies and then three phases of clinical trials according to all the proper rules — something that cannot be done in complete secrecy. That is simply impossible if manufacturers want to sell their drugs legally.

Creating a treatment for aging really is profitable: billions of people would want to use it, a larger customer base than any medicine that has ever existed.

Finally, I want to quote a passage I took from another life-extension website:

“A large part of the world's population still live hand to mouth. They cant afford clean drinking water or basic sanitation.

Basic medical conditions that we all take for granted are not currently available to a large part of the world's population. This inequality has existed for thousands of years already. Why should the emergence of any new technology challenge this reality any more than the discovery of antibiotics, water treatment or basic sanitation did.

Children still starve to death or die of basic treatable diseases every day. Right wrong or indifferent this is reality. We have not as a race been able to solve this situation in the past. Why though should this stand as any form of impediment to the progress of medical science. Why should i die before I have to because an international inequality that has existed since the dawn of civilisation makes science morally bankrupt for seeking answers.

Any argument that cites the lack of global availability for life extension technology as an impediment to progress is, in my mind emotive and out of touch with reality.” [2]

O7: An eternal dictator?

A: I am not a political psychologist, of course, but I think that sometimes a short life may actually intensify greed, dynastic thinking, and the struggle for urgent accumulation. A long life may have the opposite effect.

But in any case: how many stories do you know in which a dictatorship ended because the dictator died from causes related to aging?

He simply died, everything ended, people started living happily, and democracy arrived. It seems to me that even if such stories exist, they are clearly not the dominant pattern.

Simply waiting for a dictator to die is a bad strategy for fighting authoritarian regimes.

O8: What about institutions, work, and retirement?

A: What if our familiar cycle of “school – university – work – retirement” breaks down? I would say: great!

Even now, that cycle fits reality poorly: people change professions, study in adulthood, and return to the labor market. You have probably experienced the difficulty of choosing a profession at a young age yourself, because according to that old model you were supposed to choose once and for all — and even now, people can still be shamed for trying to find themselves or for leaving a position.

And older people entering retirement do not always have it easy either. Some grew up within this linear model of development and devoted their lives to a single vocation, which they now can no longer practice. The meaning of life may simply disappear, and a person may find themselves alone and miserable in a rocking chair.

The linear model of life is a product of the industrial era. It was convenient when life was shorter, work was more standardized, and education was rare. A longer life would allow us to have repeated cycles of education and many career possibilities.

But that is only if you look at the world as a static picture. In reality, AI and robotics are not going anywhere, and it is obvious that the labor market will at the very least change radically in the coming years, if it does not disappear almost entirely.

UBI — universal basic income — may emerge. There would be no need for a separate category of “retirement”; income would always be there, and scarcity would recede into the past. This idea has both pros and cons, but since this is an FAQ on immortality, I will not go deeper into it here and will leave it for the FAQ on AI.

As this point and the previous two show, social problems are determined not by biological age as such, but by rules of access to power, property, education, career transitions, the structure of the economy, and so on.

Death today may function as a crude compensator for bad institutions, but the problem is not the length of life — it is the structure of our society.

***

That is all for today. If I made mistakes anywhere or offered weak counterarguments, I would be glad to hear your comments and suggestions on how to strengthen them. Wishing everyone an immortal future!

1. ^

  https://qr.ae/pCLZia

2. ^

  https://www.fightaging.org/archives/2006/02/death-for-everyone-before-inequality-for-anyone/


