
Political Violence Is Never Acceptable

2026-04-13 23:20:56

Nor is the threat or implication of violence. Period. Ever. No exceptions.

It is completely unacceptable. I condemn it in the strongest possible terms.

It is immoral, and also it is ineffective. It would be immoral even if it were effective. Nothing hurts your cause more.

Do not do this, and do not tolerate anyone who does.

The reason I need to say this now is that there has been at least one attempt at violence, and potentially two in quick succession, against OpenAI CEO Sam Altman.

My sympathies go out to him and I hope he is doing as okay as one could hope for.

Awful Events Amid Scary Times

Max Zeff: NEW: A suspect was arrested on Friday morning for allegedly throwing a Molotov cocktail at OpenAI CEO Sam Altman’s home. A person matching the suspect’s description was later seen making threats outside of OpenAI’s corporate HQ.

Nathan Calvin: This is beyond disturbing and awful. Whatever disagreements you have with Sam or OpenAI, this cannot be normalized or justified in any way. Everyone deserves to be able to be safe with their families at home. I feel ill and hope beyond hope this does not become a pattern.

Sam Altman wrote up his experience of the first attack here.

After that, there was a second incident.

Jonah Owen Lamb: OpenAI CEO Sam Altman’s home appears to have been the target of a second attack Sunday morning, a mere two days after a 20-year-old man allegedly threw a Molotov cocktail at the property, The Standard has learned.

The San Francisco Police Department announced (opens in new tab) the arrest of two suspects, Amanda Tom, 25, and Muhamad Tarik Hussein, 23, who were booked for negligent discharge.

Stephen Sorace (Fox News): An OpenAI spokesperson told Fox News Digital Monday morning that the incident was unrelated and had no connection to Altman, adding that there was no indication that Altman’s home was being targeted.

We have no idea what motivated the second incident, or even if it was targeted at Altman. I won’t comment further on the second incident until we know more.

Nor is this confined to those who are worried about AI; the flip side is, alas, there too:

Gary Marcus: One investor today called for violence against me. Another lied about me, in a pretty deep and fundamental way. They are feeling the heat.

It also is not confined to the AI issue at all.

As Santi Ruiz notes, there has been a large rise in the salience of potential political violence and violence against public figures in the past few years, across the board.

That holds true for violence and threats against both Republicans and Democrats.

This requires a non-AI explanation.

Things still mostly don’t spiral into violence; the vast, vast majority of even deeply angry people don’t commit violence, but the rare thing is now somewhat less rare. A few years ago I would have been able to say most people definitively oppose such violence, but polls indicate this is no longer true for large portions of the public. This is terrifying.

Indeed, the scariest reaction known so far has been a comments section on Instagram (click only if you must), a place as distinct from AI and AI safety spaces of all kinds as one can get. This is The Public, as in the general public, for reasons completely unrelated to any concerns about existential risk, basically cheering this on and encouraging what would become the second attack. It seems eerily similar to the reaction of many to the assassination of the CEO of United Healthcare.

The stakes of AI are existential. As in, it is likely that all humans will die. All value in the universe may be permanently lost. Others will be driven to desperation from loss of jobs or other concerns, both real and not. The situation is only going to get more tense, and keeping things peaceful is going to require more work over time. It will be increasingly difficult to both properly convey the situation and how dire it is, and avoid encouraging threats of violence, and even actual attempts at violence.

Then on the other side are those who see untold wonders within their grasp.

This goes hand in hand with what Altman calls the ‘Shakespearean drama’ going on inside OpenAI, and between the major labs.

Most Of Those Worried About AI Do As Well As One Can On This

The vast majority of major voices in Notkilleveryonism, those worried that we might all die from AI, have been and continue to be doing exactly the right thing here, and have over many years consistently warned against and condemned all violence other than that required by the state’s enforcement of the law.

Almost all of those who are worried about AI existential risk are very much passing this test, and making their positions against violence exceedingly clear, pushing back very hard against any and all extralegal violence and extralegal threats of violence.

Demands for impossible standards here are common, where someone who did not cause the problem is attacked for not condemning the thing sufficiently loudly, or in exactly the right way. This is a common political and especially culture war tactic.

Perhaps the worst argument of all is ‘you told people never to commit or threaten violence because it is ineffective, without explicitly also saying it was immoral, therefore you would totally do it if you thought it would work, you evil person.’

They will even say ‘oh you said it was immoral, and also you said it wouldn’t work, but you didn’t explicitly say you would still condemn it even if it would work, checkmate.’

The implicit standard here, that you must explicitly note that you would act a certain way purely for what someone thinks are the right reasons or else you are guilty of doing the thing, is completely crazy, as you can see in any other context. It is the AI version of saying ‘would you still love me if I was a worm?’ and getting mad that you had to ask the question to get reassurance, as opposed to being told unprompted.

The reason why people often focus on ‘it won’t work’ is because this is the non-obvious part of the equation. With notably rare exceptions, we all agree it is immoral.

Andy Masley offers thoughts, calling for caution when describing particular people. He draws a parallel to how people talk about abortion. Here is Nate Soares at length.

This is Eliezer Yudkowsky’s latest answer on violence in general, one of many over the years trying to make similar points.

Some Who Are Worried About AI Need To Address Their Rhetoric

‘Almost all’ and ‘vast majority’ are different from ‘all.’

There are notably rare exceptions, where people are at least flirting with the line, and one of these has some association to this attempt at violence, and a link to another past incident of worry about potential violence. Luckily no one has been hurt.

Speaking the truth as you see it is not a full free pass on this, nor is condemning violence, unless it is clear to all that you mean it. There are some characterizations and rhetorical choices that do not explicitly call for violence, but that bring far more heat than light, and carry far more risk than they bring in benefits.

Everyone involved needs to cut that right out.

In particular, I consider the following to be things that need to be cut right out, and I urge everyone to do so, even if you think that the statements involved are accurate:

  1. Calling people ‘murderers’ or ‘evil.’
  2. Especially calling them ‘mass murderer’ or ‘child murderer.’
  3. Various forms of ‘what did you expect.’
  4. Various forms of the labs ‘brought this on themselves.’
  5. Saying such violence is the ‘inevitable result’ of the labs ‘not being stopped.’

You can and should get your point across without using such words.

Also, no matter what words you are using, continuously yelling venom at those you disagree with, or telling those people they must be acting in bad faith and to curry lab favor, especially those like Dean Ball and even myself, or anyone and everyone who associates with or praises any of the AI labs at all, does not convince those people, does not convince most observers and does not help your cause.

Note, of course, that mainstream politicians, including prominent members of both parties, very often violate the above five rules on a wide variety of topics that are mostly not about AI. They, also, need to cut that right out, with of course an exception for people who are (e.g.) literally murderers as a matter of law.

Also: There are not zero times and places to say that someone does not believe the things they are saying, including telling that person to their face or in their replies. I will do that sometimes. But the bar for evidence gathered before doing this needs to be very high.

Please, everyone, accept that:

  1. Those who say they are worried that AI will kill everyone are, with no exceptions I know about, sincerely worried AI will kill everyone.
    1. Even if you think their arguments and reasons are stupid or motivated.
  2. Those who say they are not worried AI will kill everyone are, most of the time, not so worried that AI will kill everyone.
    1. Even if you think their arguments and reasons are stupid or motivated.
  3. A bunch of people have, in good faith, concerns and opinions you disagree with.

(Dean Ball there also notes the use of the term ‘traitor.’ That one is… complicated, but yes I have made a deliberate choice to avoid it and encourage others to also do so. It is also a good example of how so many in politics, on all sides, often use such rhetoric.)

My current understanding is that the first suspect was a participant in the PauseAI (Global) Discord server, posting 34 messages, none of which were explicit calls to violence. He was not a formal part of the organization, and participated in no formal campaigns.

We do not know how much of this is the rhetoric used by PauseAI or others shaping this person, versus how much is simply that someone like this was drawn to the server.

PauseAI has indeed unequivocally condemned this attack, which is good, and I believe those involved sincerely oppose violence and find it unacceptable, which is also good.

I think they still need to take this issue, and the potential consequences of their choices on rhetoric, more seriously than they have so far. Their statement here includes saying that PauseAI ‘is that peaceful path’ and that avoiding extreme situations like this is exactly why we need a thriving pause movement. This is an example of the style of talk that risks inflaming the situation further without much to gain.

There is one thing that they are clearly correct about: You are not responsible for the actions of everyone who has posted on your public discord server.

I would add: This also applies to anyone who has repeated your slogans or shares your policy preferences, and it does not even mean you causally contributed at all to this person’s actions. We don’t know.

For the second attack, for now, we know essentially nothing about the motivation.

But yes, if you find your rhetoric getting echoed by those who choose violence, that is a wake up call to take a hard look at your messaging strategy and whether you are doing enough to prevent such incidents, and avoid contributing to them.

Similarly, I think this statement from StopAI’s Guido Reichstadter was quite bad.

Speak The Truth Even If Your Voice Trembles

If one warns that some things are over the line or unwise to say, as I did above, one should also note which things one thinks are importantly not over that line.

Some rhetoric that I think is entirely acceptable and appropriate to use, if and only if you believe the statements you are making, includes, as examples:

  1. ‘Gambling with humanity’s future.’
  2. ‘If [X] then [Y]’ if your conditional probability is very high (e.g. >90%), or stating your probability estimate of [Y] given [X], including in the form of a p(doom).
  3. Calling Mythos or something else a ‘warning shot.’
  4. Calling Mythos or other similarly advanced AIs a ‘weapon of mass destruction.’
  5. Most of all: To call the act of creating minds more powerful than humans an existential threat to humanity. It obviously is one.

If you believe that If Anyone Builds It, Everyone Dies, then you should say that if anyone builds it, then everyone dies. Not moral blame. Cause and effect. Note that this is importantly different from ‘anyone who is trying to build it is a mass murderer.’

I could be convinced that I am wrong about one or more of these particular phrases. I am open to argument. But these seem very clear to me, to the point where someone challenging them should be presumed to either be in bad faith or be de facto acting from the assumption that the entire idea that creating new more powerful minds is risky is sufficiently Obvious Nonsense that the arguments are invalid.

Here is a document about how Pause AI views the situation surrounding Mythos. It lays out what they think are the key points and the important big picture narrative. It is a useful document. Do I agree with every interpretation and argument here? I very much do not. Indeed, I could use this document as a jumping off point to explain some key perspective and world model differences I have with Pause AI.

I consider the above an excellent portrayal of their good faith position on these questions, and on first reading I had no objection to any of the rhetoric.

False Accusations And False Attacks Are Also Unacceptable

There has been quite a lot of quite awful rhetoric in the other direction, both in general and in response to this situation. We should also call this out for what it is.

There are those who would use such incidents as opportunities to impose censorship, and tell people that they cannot speak the truth. They equate straightforward descriptions of the situation with dangerous calls for violence, or even attack any and all critics of AI as dangerous.

At least one person called for an end to ‘non-expert activism’ citing potential violence.

We have seen threats, taunting, deliberate misinterpretation, outright invention of statements and other bad faith towards some worried about AI, often including Eliezer Yudkowsky, including accusing people of threatening violence on the theory that of course if you believed we were all going to die you would threaten or use violence, despite the repeated clear statements to the contrary, and the obvious fact that such violence would both be immoral and ineffective.

This happened quite a bit around Eliezer’s op-ed in Time in particular, usually in highly bad faith, and this continues even now, equating calls for government to enforce rules to threats of violence, and there are a number of other past cases with similar sets of facts.

At other times, those in favor of AI accelerationism have engaged in threats of and calls for violence against those who oppose AI, on the theory that AI can cure disease, thus anyone who does anything to delay it is a murderer. The rhetoric is the same all around.

Some Examples Of Attempts To Create Broad Censorship

This is from someone at the White House, trying to equate talking about logical consequences with incitement to violence. This is a call to simply not discuss the fact that if anyone builds superintelligence, we believe that it is probable that everyone will die.

I consider that kind of attack completely unacceptable even from the public, and especially so from a senior official.

One asks what would happen if we applied even a far more generous version of this standard to many prominent people, including for example Elon Musk, or other people I will decline to name because I don’t need to.

Here is the Platonic form:

Shoshana Weissmann, Sloth Committee Chair: This is insane behavior. And those promoting the idea of AI ending humanity are contributing to this. It has to stop.

As in, you need to stop promoting the idea of AI ending humanity. Never mind how you present it, or whether or not your statement is true. No argument is offered on whether it is true.

This is the generalization of the position:

florence: It would appear that, according to many, one of the following are true:

1. It is a priori impossible for a new technology to be an existential threat.
2. If a new technology is an existential threat, you’re not allowed to say that.

Indeed, one of the arguments people often literally use is, and this is not a strawman:

  1. You straightforwardly say sufficiently advanced AI might kill everyone.
  2. But if someone did believe that, they might support using violence.
  3. Therefore you can’t say that, or we should be able to use violence against you.

While I don’t generally try to equate different actions, I will absolutely equate implicit calls for violence in one direction to other implicit calls for violence or throwing your political enemies in jail for crimes they obviously are not responsible for, indeed for the use of free speech, in the other direction, such as this by spor or Marc Andreessen.

Nate Soares (MIRI): “even talking about the extinction-level threat is incitement towards violence”

No. High stakes don’t transform bad strategies into good ones. Let’s all counter that misapprehension wherever we find it.

michael vassar: This is probably my number one complaint about the current culture. The false dichotomy between ‘not a big deal, ignore’ and ‘crisis, panic, centralize power and remove accountability’.

That’s the same thing or worse, especially in this particular case, where the accusation is essentially ‘you want government to pass and enforce a law, we don’t like that, therefore we want the government to arrest you.’

There is also the version, which I would not equate the same way, where #3 is merely something like ‘so therefore you have a moral responsibility to not say this so plainly.’ For sufficiently mid versions, as I discuss above, one can talk price.

A variation is when someone, often an accelerationist, will say:

  1. These people claim to be worried about AI killing everyone.
  2. But you keep condemning violence.
  3. Therefore, you must not care about these supposed beliefs.

Or here’s the way some of them worded it:

bone: Nice to see all the LessWrong people fold completely on their philosophy. Very good for humanity. They have no beliefs worth dying or killing for. It’s nonsense from a guy who never had the balls to stand up for his words once push came to shove.

Yudkowsky stands for nothing.

bone: remember: if they actually believe all this stuff and they are unwilling to be violent, it means they are cowards, that they refuse to measure up to their own words, that they will not do what they believe needs to be done to save mankind.

they are weak, they believe in nothing

Zy: AI doomers are like “attacking key researchers in the AI race is an ineffective strategy to prevent AI doom which pales in comparison to my strategy of paying them $200 a month to fund capabilities research”

Lewis: Rare Teno L. If you actually think Sam Altman is going to genocide children then it makes sense to try to hurt him. So you need to pick one. It’s either completely insane or it’s totally sensible. Which one is it?

L3 Tweet Engineer (replying to Holly Elmore): If you’re such a good person, and stopping AI is so important, why don’t you go bomb a data center? Why waste your breath tweeting about this stuff and writing grand narratives, go make it happen.

phrygian: You’ve already talked about how it would be moral to nuke other countries to stop asi. The only logical reasons you have for not engaging in smaller forms of violence to stop ASI is that they aren’t as effective. On a fundamental level, your views justify violence of any kind.

Ra: maybe this is just me and explains some things about me, but *personally* i would much rather be seen as a potentially dangerous radical than as a feckless and insincere grifter, especially if i believed the world was ending soon and was personally responsible for stopping that.

Trey Goff: Look do you people not realize how silly you look

“AI is going to literally kill your children and all future humans, but we strongly condemn any violence committed in order to stop that from happening”

Have the courage of your convictions or STFU

The Platonic version of this is the classic: ‘If you believed that, why wouldn’t you do [thing that makes no sense]?’

The trap or plan is clear. Either you support violence, and so you are horrible and must be stopped, or you don’t, in which case you can be ignored. The unworried mind cannot fathom, in remarkably many cases, the idea that one can want to do only moral things, or only effective things, and that the stakes being higher doesn’t change that.

Teortaxes: Uncritical support
this is a bad faith attempt to elicit a desirable mistake
essentially a false flag by proxy of stupidity
I think decels are holding up well btw

Eliezer started a thread to illustrate people using such tactics, from which I pulled the above examples, but there are many more.

João Camargo (replying to a very normal post by Andy Masley): No one believes you actually think this. If you think that Altman and other pivotal AI leaders/researchers will likely bring human extinction, assassinations are clearly justified. “This guy is gonna cause human extinction, but no one must prevent him by force” is not coherent.

Other times, they simply make fun of Eliezer’s hat.

Or they just lie.

taoki: i assume eliezer yudkowsky and his pause ai friends love this?

Oliver Habryka: False, they definitely hate it.

taoki (May 6, 2024): also, i LIE like ALL THE TIME. JUST FOR FUN.

Or they flat out assert ‘oh you people totally believe in violence and all the statements otherwise are just PR.’

Another tactic of those trying to shut down mention of the truth of our situation is to attack both any attempt to put a probability on existential risk, and also anyone who (in a way I disagree with, but view as reasonable) treats existential risk as highly likely if we build superintelligence soon on known principles. This includes dismissing any approach that takes any of it seriously as not serious, or saying that it is ‘using probability as a weapon’ to point out that the probability of everyone dying if we stay the current course is uncomfortably and unacceptably high.

I close this section by turning it over to Tenobrus:

Tenobrus: “stochastic terrorism” is, quite frankly, complete fucking bullshit. it’s a unfalsifiable term used to try to tie your political opponents speech to actions that have fucking nothing to do with them, attempting to weaponize tragedy and mental illness for debate points. it was bullshit when AOC tried to accuse the republicans of “stochastic terrorism” for criticizing her, it was bullshit when the right claimed the left was committing “stochastic terrorism” for engaging in anti-ICE protests, and it remains bullshit now when you assign responsibility for attacks against sam altman to AI safety advocates and journalists who wrote negative things about him.

fuck your garbage rhetorical device! that’s not how responsibility or blame works! you do not get to suppress any and all speech you disagree with and can find a way to vaguely deem “dangerous”!

Kitten: “who will rid me of this troublesome priest”

Tenobrus: yeah that’s an entirely different thing. that’s not “stochastic terrorism” dawg that’s just straightforward incitement of violence.

Grant us the wisdom to know the difference.

The Most Irresponsible Reaction Was From The Press

I really do not understand how you can be this stupid. I realize that yes, you could still get this information if you wanted it, but my lord this is nuts from the SF Standard.

The San Francisco Standard: Just in: Sam Altman’s home appears to have been the target of a second attack Sunday morning, a mere two days after a 20-year-old man allegedly threw a Molotov cocktail at the property: Jonah Owen Lamb.

spor: printed his home address and even added a picture of the exterior, for good measure… in an article about how his home is being targeted by psychos that want to kill him !!!

this reporter, their editor, and the entire Standard should be ashamed of this

Mckay Wrigley: this is absolutely disgusting and anyone involved in the publishing of this has absolutely zero morals.

Sam Altman Reacts

Sam Altman has my deepest sympathies in all of this. This must be terrifying. No one should have a Molotov Cocktail thrown at their house, let alone face two attacks in a week. I hope he is doing as well as one can when faced with something like this, and that he is staying safe.

I have no idea how I would respond to such a thing if it happened to me.

Sam Altman’s public reaction was to post this statement.

I very much appreciate that Sam Altman has explicitly said that he regrets the word choice in the passage below. ‘Tough day’ is absolutely a valid excuse here, and most of the statement is better than one can reasonably expect in such circumstances given Altman’s other public statements on all things AI.

But I do need to note that this importantly missed the mark and the unfortunate implication requires pushback.

Sam Altman (CEO OpenAI): Words have power too. There was an incendiary article about me a few days ago. Someone said to me yesterday they thought it was coming at a time of great anxiety about AI and that it made things more dangerous for me. I brushed it aside.

Now I am awake in the middle of the night and pissed, and thinking that I have underestimated the power of words and narratives. This seems like as good of a time as any to address a few things.

The article in question, presumably the piece in The New Yorker I discussed at length last week, was an extremely long, detailed and as far as I could tell fair and accurate retelling of the facts and history around Sam Altman and OpenAI. To the extent it was incendiary, the facts are incendiary.

Those who are not Sam Altman do not get the same grace, when they say things like this in reference to that article:

Kelly Sims: It turns out when you string a bunch of quarter-truths together exclusively from someone’s bitter competitors it has consequences.

Given what we know about who attacked Altman, and various details, I find it unlikely that the timing of these two events was meaningful for the first attack. My guess is the trigger for someone already ready to blow was anxiety around Mythos, but even if that article was the triggering event, it was not an example of irresponsible rhetoric.

For the second attack, unfortunately, we should worry that it was triggered in large part by coverage of the first attack, including publishing details about Altman’s home.

Sam Altman Reflects

The rest of the post is personal reflections and predictions about AI overall, so I’m going to respond to it the way I would any other week.

Sam Altman (CEO OpenAI): [AI] will not all go well. The fear and anxiety about AI is justified; we are in the process of witnessing the largest change to society in a long time, and perhaps ever. We have to get safety right, which is not just about aligning a model—we urgently need a society-wide response to be resilient to new threats. This includes things like new policy to help navigate through a difficult economic transition in order to get to a much better future.

AI has to be democratized. … I do not think it is right that a few AI labs would make the most consequential decisions about the shape of our future.

Adaptability is critical. We are all learning about something new very quickly; some of our beliefs will be right and some will be wrong, and sometimes we will need to change our mind quickly as the technology develops and society evolves. No one understands the impacts of superintelligence yet, but they will be immense.

Altman is essentially agreeing with his most severe critics, that he should not be allowed to develop and deploy superintelligence on his own. He tries to have it both ways, saying things like this while also trying to avoid any form of meaningful democratic control when the time comes to pass laws or regulations.

His call for adaptability is closely related to the idea of building the ability to control development and deployment of AI, and having the ability to pause in various ways, should we find that to be necessary.

His disagreement is that he thinks we collectively should want him to proceed. Which might or might not be either the decision we make, or a wise decision, or a fatal one.

He mentions that it ‘will not all go well’ but this framing rejects by omission the idea that there is existential risk in the room, and it might go badly in ways where we cannot recover. To me, that makes this cheap talk and an irresponsible statement.

The second section is personal reflections.

He believes OpenAI is delivering on their mission. I would say that it is not, as their mission was not to create AGI. The mission was to ensure AGI goes safely, and OpenAI is not doing that. Nor is Anthropic or anyone else, for the most part, so this is not only about OpenAI.

He calls himself conflict-averse, which seems difficult to believe, although if it is locally true to the point of telling people whatever they want to hear then this could perhaps explain a lot. I was happy to hear him admit he handled the situation with the previous board, in particular, badly in a way that led to a huge mess, which is as much admission as we were ever going to get.

His third section is broad thoughts.

​My personal takeaway from the last several years, and take on why there has been so much Shakespearean drama between the companies in our field, comes down to this: “Once you see AGI you can’t unsee it.” It has a real “ring of power” dynamic to it, and makes people do crazy things. I don’t mean that AGI is the ring itself, but instead the totalizing philosophy of “being the one to control AGI”.

We can all agree that we do not want any one person to be in control of superintelligence (ASI/AGI), or any small group to have such control. The obvious response to that is ‘democracy’ and to share and diffuse ASI, which is where he comes down here. But that too has its fatal problems, at least in its default form.

If you give everyone access to superintelligence, even if we solve all our technical and alignment problems, and find a way to implement this democratic process, then everyone ends up handing control to their own superintelligence, in fully unleashed form, lest they fall behind and lose out (or because the superintelligence convinces them of this), and we quickly become irrelevant. Humanity is disempowered, and likely soon dead.

Thus if you indeed want to do better you have to do Secret Third Thing, at least to some extent. And we don’t know what the Secret Third Thing is, yet we push ahead.

He concludes like this:

Sam Altman (CEO OpenAI): A lot of the criticism of our industry comes from sincere concern about the incredibly high stakes of this technology. This is quite valid, and we welcome good-faith criticism and debate. I empathize with anti-technology sentiments and clearly technology isn’t always good for everyone. But overall, I believe technological progress can make the future unbelievably good, for your family and mine.

While we have that debate, we should de-escalate the rhetoric and tactics and try to have fewer explosions in fewer homes, figuratively and literally.

It is easy to agree with that, and certainly we want fewer explosions. But it is easy for calls to ‘de-escalate’ to effectively become calls to disregard the downside risks that matter, or to not seriously grapple with the coming technical difficulties, dilemmas and value clashes, or to shut down criticism and calls to action of all kinds.

Violence Is Never The Answer

Once again: I condemn these attacks, and any and all such violence against anyone, in the strongest possible terms. I do this both because it is immoral, and also because it is illegal, and also because it wouldn’t work. Nothing hurts your cause more.

My sympathies go out to Sam Altman at this time, and I hope he comes through okay.

Most people worried about AI killing everyone have handled this situation well, both before and after it happened, and not only take strong stances against violence but also use appropriate language, at a standard vastly higher than that of any of:

  1. Those who are worried about those worried about AI killing everyone.
  2. Those who are worried about mundane AI concerns like data centers or job loss.
  3. Politicians and ordinary citizens of both major American political parties, and the media, on a wide variety of issues.

I call upon all three of those groups of people to do way better across the board. Over a several-year timeline, I predict that most concern about violence related to AI concerns will have nothing to do with concerns about existential risk.

But there are a small number of those worried about AI existential risks who have gone over where I see the line, as discussed above, and I urge those people to cut it right out. I have laid out my concerns on that above. We should point out what actions have what consequences, and urge that we choose better actions with better consequences, without having to call anyone murderers or evil.

Eliezer has an extensive two-post response on Twitter to the question of violence, Only Law Can Prevent Extinction, that echoes points he has made many times.

I also condemn those who would use this situation as an opportunity to call for censorship, to misrepresent people’s statements and viewpoints, and generally to blame and discredit people for the crime of pointing out that the world is rapidly entering existential danger. That, too, is completely unacceptable, especially when it rises to its own incitements to violence, which happens remarkably often if you hold them to the standards they themselves assert.


AI Safety's Biggest Talent Gap Isn't Researchers. It's Generalists.

2026-04-13 22:47:25

This post was cross-posted to the EA Forum.

TL;DR: One of the largest talent gaps in AI safety is competent generalists: program managers, fieldbuilders, operators, org leaders, chiefs of staff, founders. Ambitious, competent junior people could develop the skills to fill these roles, but there are no good pathways for them to gain skills, experience, and credentials. Instead, they're incentivized to pursue legible technical and policy fellowships and then become full-time researchers, even if that’s not a good fit for their skills. The ecosystem needs to make generalist careers more legible and accessible.

Kairos and Constellation are announcing the Generator Residency as a first step. Apply here by April 27.

Epistemic status: Fairly confident, based on 2 years running AI safety talent programs, direct hiring experience, and conversations with ~30 senior org leaders across the ecosystem in the past 6 months.

The problem

Over the past few years, AI safety has moved from niche concern toward a more mainstream issue, driven by pieces like Situational Awareness, AI 2027, If Anyone Builds It, Everyone Dies, and the rapidly increasing capabilities of the models themselves.

During this period, over 20 research fellowships have launched, collectively training thousands of fellows, with 2,000-2,500 fellows anticipated this year alone[1]. The talent situation for strong technical and policy researchers is far from solved, but meaningful progress has been made.

The story for non-research talent is very different. By our count, there are roughly 7 fellowships for non-research talent (producing around 300 fellows this year[2]), spread thin across an array of role types. As a result, many critical functions within AI safety remain acutely talent-constrained.

More broadly, the ecosystem has a lot of people who are great at thinking about ideas. We need more people who are great at thinking about people and projects. Read more about this here.

The consistent feedback we hear from senior people across the ecosystem is that the hardest roles to fill are not research roles. They are:

  • Generalists: operators, executors, fieldbuilders, people and program managers, grantmakers, recruiters. People who can ideate, manage, and execute a broad range of non-research projects.
  • Founders, both technical and non-technical, for new research and non-research organizations.
  • Communications professionals who can work on policy and research comms.
  • Chief-of-Staff types who can support senior leaders and multiply their impact.
  • Senior operational people with domain expertise in areas like cybersecurity, policy, or large-scale project management.

Based on our experience and anecdotes from organizations in our networks[3], many organizations trying to hire find that research postings attract dozens of qualified applicants, while non-research postings often surface only 0-5 applicants who meet the core requirements (strong mission alignment, meaningful AI safety context, and general competence) despite receiving hundreds of applications.

Why the pipeline is broken

The fellowship landscape is massively skewed toward research. 

Around 20 research fellowships together produce 2,000-2,500 fellows per year. For fieldbuilding, the current options are essentially Pathfinder (where the vast majority of fellows still intend to pursue research careers) and a few dedicated fieldbuilding spots at Astra. These produce an estimated 5-10 fieldbuilding generalists hired per year. This asymmetry signals that the primary route into full-time AI safety work runs through research. And, while research is a core part of safety, it is also necessary to find and develop people who can manage research projects, run organizations, and implement and communicate research ideas.

There is no clear career ladder for generalists. 

A research-oriented person has a well-worn trajectory: BlueDot → ARENA → SPAR → MATS → junior researcher → senior researcher. And while this path isn't perfect, nothing comparable exists for generalists. The typical route involves running a strong university group and then hoping to get hired directly at a fieldbuilding org, with no intermediate steps or clear progression path afterwards. The risk discourages people who might otherwise be excellent generalists from committing to the path.

There is no credentialing or proving ground. 

Unlike research, where fellowship participation provides a track record and hiring signal, aspiring generalists have no equivalent way to demonstrate competence. Organizations won't hire untested junior talent for critical operational roles, but there's nowhere for junior talent to get tested[4].

There is no routing infrastructure. 

Matching people to opportunities happens through ad hoc referrals and personal networks. This doesn't scale, and it means we regularly miss promising candidates. As the field has matured and institutional structure has grown, coordination overhead and established networks make it harder for aspiring generalists to self-start projects and stand out the way that was possible a few years ago. 

Why this matters now

We believe that there are now more good policy and technical ideas ready for implementation than there is coordination ability and political will to implement them in governments and AI companies. On the margin, we think we're receiving smaller returns from additional researchers entering the field, especially outside the top 10% of research talent. It’s also plausible that AI safety research will be automated more quickly during takeoff than most other types of work.

Many expect the funding landscape for AI safety will expand significantly over the next two to three years, which makes this bottleneck more urgent. More capital will be available, but without the people to deploy it effectively, that capital will stay inert. This already appears to be a bottleneck for current grantmakers, and it could get much worse.

Naively, we expect the world to get a lot weirder as capabilities progress. In a world where the demands on the AI safety ecosystem rapidly increase and evolve, training people with strong thinking, agency, and executional abilities, rather than narrow technical skills, seems highly leveraged.

This is particularly important because it enables us to diversify our bets and cover a large surface of opportunities for impact. There’s no shortage of project ideas for growing the field of AI safety, scaling up our policy efforts, or communicating to the public, but we simply don’t have enough talent to plan, design, and execute on all of them. Our bottleneck isn’t funding or ideas, it’s people.

Counter-Arguments

"You said hundreds of people are applying to these roles. Why can't some of them be good fits? Aren't there many people who could fill operations positions?"

We draw a distinction between "hard ops" and "soft ops." Hard ops roles (finance, legal, HR, etc.) benefit from expertise, and hiring experienced professionals without AI safety context is typically sufficient. Soft ops roles (program management, talent management, generalist positions, etc.) are different. Domain expertise matters less than having strong inside-view models of the field and generalist competency. Succeeding in these roles requires real mission alignment and enough context to spot high-EV opportunities that someone without that background would miss.

"I'm not sure I agree that research talent is less important than generalist talent."

We're deliberately not making a strong comparative claim about the impact of generalists versus technical and policy researchers. What we are saying is that generalist talent is currently the binding constraint. It is harder to source than research talent and, in our models, represents the tighter bottleneck for the ecosystem's ability to convert funding and ideas into impact.

"How important is generalist talent in shorter timelines worlds?"

Our sense is that generalist talent is crucial across all timelines. While shorter timelines do compress the window for upskilling, our experience is that motivated junior people can skill up relatively quickly and help add urgently needed capacity, making the counterfactual value of pipeline-building here quite high even in shorter timeline worlds (sub 3 years).

"You argue there are all these research fellowships and no programs for non-research talent. But couldn't those programs just produce generalists?"

The existing research fellowships are well-optimized and have a strong track record of producing researchers who get placed into AI safety roles. Some fellows have gone on to non-research roles, but anecdotally this is rare. These programs seem to have a much stronger track record of taking talent who are open to different career paths and funneling them toward research, than of producing researchers who are open to different career paths.

"Aren't there a lot of non-research roles currently in AI safety?"

A few hundred people do this work today versus a few thousand researchers. There used to be a steadier stream of talent aiming for these roles, but short-timelines anxiety, the expansion of research programs, and the disappearance of some entry points that used to exist have contracted the pipeline considerably.

The Generator Residency

As a first step toward addressing these problems, Constellation and Kairos are announcing the Generator Residency: a 15-30 person, 3-month program focused on training, upskilling, credentialing, and placing generalists. The program runs June 15 through August 28, 2026 and applications close April 27.

Learn more and apply here

How it works:

Residents will work out of Constellation and receive ideas, resources (funding, office space), and mentorship from successful generalists at organizations like Redwood, METR, AI Futures Project, and FAR.AI.

For the first few weeks, residents will write and refine their own project pitches while meeting the Constellation network and building context in the field. They will then create and execute roughly 3-month projects, individually or in groups, with generous project budgets. Throughout the program, we’ll provide seminars, 1:1s, and other opportunities for residents to deeply understand current technical and policy work, theories of change, and gaps in the ecosystem.

During and after the program, we’ll support residents in finding roles at impactful organizations, spinning their projects into new organizations, or having their projects acquired by existing ones. Selected residents can continue their projects for an additional three months (full-time in-person or part-time remote), with continued stipend, office access, and housing.

We hope to place a majority of job-seeking residents into full-time roles at impactful organizations within 12 months of the program ending.

Examples of projects we’d be excited about hosting include:

  • Workshops and conferences: Run a domain-specific conference like ControlConf or the AI Security Forum, or one that brings new talent into AI safety like GCP, targeting high-leverage new audiences or emerging subfields.
  • AI comms fellowship: Design and manage a short fellowship for skilled communicators to produce AI safety content. Draft a curriculum, identify mentors, secure funding, and prepare a pilot cohort.
  • Recruiting pipelines: Partner with 2-3 small AI safety orgs to build the systems they need to scale: work tests, candidate sourcing, referral pipelines.
  • Travel grants program: Fund visits to AI safety hubs like LISA and Constellation by promising students and professionals. Set criteria, build an application flow, line up partner referrals, and run a pilot round.
  • Shared compute fund: Scope a fund to cover compute needs of independent safety researchers, model whether a cluster is needed, and distribute a pilot round of grants.
  • Strategic awareness tools: Scale AI-powered superforecasting and scenario planning in safety infrastructure, build support among impactful stakeholders, and run a pilot.
  • AI policy career pipeline: Build workshops, practitioner talks, and handoffs into policy career programs to route talent toward the institutions shaping policy.
  1. ^

    This estimate draws on a separate analysis that projected the number of fellows using both publicly and privately available information, as well as extrapolations from actual data through late 2024. The fellowships included in this analysis were: AI Safety Camp, Algoverse (AI Safety Research Fellowship), Apart Fellowship, Astra Fellowship, Anthropic Fellows Program, CBAI (Summer/Winter Research Fellowship), GovAI (Summer/Winter Research Fellowship), CLR Summer Research Fellowship, ERA, FIG, IAPS AI Policy Fellowship, LASR Labs, PIBBSS, Pivotal, MARS, MATS, SPAR, XLab Summer Research Fellowship, MIRI Fellowship, and Dovetail Fellowship.

  2. ^

    The programs included in this analysis were: Tarbell (AI Journalism), Catalyze Impact Incubator (AI Safety Entrepreneurship), Seldon Lab (AI Resilience Entrepreneurship), Horizon Institute for Public Service Fellowship (US AI Policy/Politics), Talos Fellowship (EU AI Policy/Politics), Frame Fellowship (AI Communications), and The Pathfinder Fellowship. Fellow counts were derived primarily from publicly available data.

  3. ^

    We're deliberately vague about which organizations we're referring to here since we haven't asked permission to disclose the outcomes of recent hiring rounds. For research roles, we're mainly referring to technical AI safety nonprofits, policy nonprofits, and think tanks. For non-research roles, we're mainly referring to fieldbuilding nonprofits and technical and policy nonprofits that have recently tried hiring non-research talent requiring meaningful AI safety context beyond a BlueDot course.

  4. ^

    Several years ago, aspiring generalists could more easily test their fit by self-starting projects in an ecosystem with minimal infrastructure and ample white space. As the field has grown, more institutional structure exists, and with it, more coordination overhead. The blank slate is gone, and the ecosystem's complexity now deters people without strong inside-view models, reputations, or existing connections from trying ambitious projects. We're not sure this is net negative in most cases, but it does mean fewer people gain the experience needed to position themselves for these roles.




Clique, Guild, Cult

2026-04-13 22:32:06

This is the first in a sequence of articles on organizational cultures, inspired largely by my experiences with the LessWrong meetup community.

| Clique | Guild | Cult |
|---|---|---|
| Small | Medium | Any size |
| Exit | Voice | Loyalty |
| Consensus | Majority | Counsel |
| Deontology | Consequentialism | Virtue |
  1. "Let's talk this over"
  2. "This isn't working out"
  3. "Point of order, Mr. Chairman"
  4. "Verily I say unto you"
  5. "This isn't what our Founder would've wanted"
  6. "If you don't like it, you can leave"
  7. "Yeah, so if you could go ahead and get that done, that'd be great"
  8. "Carried u-... [nervous glances] ...-nanimously"

Clique

A clique is a small, intimate group of friends who all know each other very well. If you're in a clique, you might not know what kind of culture you're in because there might never have been any significant sources of conflict. But if there are, they will be addressed in one of two ways.

#1. "Let's talk this over"

An egalitarian clique will put great effort into resolving conflicts through interpersonal connections in order to keep the group together. This may involve long hours on the metaphorical therapist's couch - NVC, Authentic Relating, etc. - or perhaps, if two friends have a falling-out, a mutual friend of theirs might try to help smooth things over.

#2. "This isn't working out"

In a more authoritarian clique, people will be quicker to concede that their differences are irreconcilable and that the group (at least in its current form) should break up. However, this is seen by all parties as a fairly benign outcome, since there is not much investment in the clique "as such" (rather, the investment is in the individual 1:1 relationships) and it's not hard to start a new one. There is no sense that somebody needs to be "right" and somebody else "wrong".

Guild

#3. "Point of order, Mr. Chairman"

A guild is a medium-sized group where each member may have a few close connections, but will have a much larger number of "weak ties" that are connected to them only indirectly. However, the group is united (and distinguished from the wider society) by a shared institutional identity that makes it "a thing" and not merely a collection of individuals or cliques. This manifests in the use of bureaucratic procedures to resolve conflicts, since the group is too large to expect unanimity, and entrenched enough that schism is seen as more undesirable than having some disagreement over any particular decision.

In my opinion, the guild has become something of a lost art, which ought to be revived. (Future articles will go into this point further.)

Cult

A cult is a group based on personal authority. This authority derives from the inherent virtue of the leader (charisma, strength, wealth, etc.) and not any notion of popular support. A cult's size can exceed Dunbar's Limit because it is held together not by the members' relationships with each other, but by their loyalty to the leader. However, small- and medium-sized cults can also exist, and are perhaps more common than large cults. (Rare is the person who has what it takes to lead a large cult, but you may find yourself at the center of a small cult quite inadvertently.)

#4. "Verily I say unto you"

What the leader says, goes. Members are expected to subordinate their own will and desires to that of the leader. They may advise the leader one way or another, and may bring their disputes to him/her for resolution, but the leader has the ultimate authority and responsibility for the decision.

However, in addition to this "straightforward" kind of cult, there are also various kinds of dysfunctional cults, which (perhaps) give the rest a bad name.

Fractious cults

#5. "This isn't what our Founder would've wanted"

If a cult loses its leader, and if the leader has not raised up a worthy successor, the group will find itself in an unstable zone where its culture is too egalitarian to persist in its super-Dunbar size, because there was never any hierarchy amongst the rank-and-file, only in relation to the leader. Therefore, the group will decay into a more stable configuration (indicated by the dotted arrows), either by someone gaining sufficient personal virtue to become the new leader, or (more likely) splitting into several cliques or guilds, each of which will claim to be the legitimate heir of the original group.

Embarrassed cults

A cult is "embarrassed" when it doesn't want to admit that it's a cult, because the leadership lacks the personal virtue necessary to operate a straightforward cult but still wants to maintain control. They may do this through some combination of pretending that the group's culture is more egalitarian than it actually is, and/or pretending that its size is smaller than it actually is. (This is denoted on the diagram by an arrow with an open circle on its base - the arrowhead is what the group pretends to be, and the base is what the group really is.)

#6. "If you don't like it, you can leave"

"...but we know you're not going to."

The leader of such a group may pretend that they are not claiming any personal authority at all, but "just" observing that the current clique isn't working out (see #2). However, there is an obvious asymmetry in that it is one particular party who is taunting the other one to quit, and not vice-versa. Therefore the subordinate party stands to lose a lot more, and is thus likely to accept a considerable amount of dissatisfaction before they finally decide to leave.

#7. "Yeah, so if you could go ahead and get that done, that'd be great"

In the classic corporate dystopia, HR and management want you to think of your team as a small clique, so that your desire for personal connection will be redirected towards the company. They may ask for your opinion, but have no intention of listening to it. Critics rightly warn young professionals against getting sucked into environments like this, where one is prone to being manipulated into accepting substandard pay and working conditions. The warning usually given is: You should be as loyal to the company as they are to you, i.e. not at all.

#8. "Carried u-... [nervous glances] ...-nanimously"

(Clip 1, clip 2)

A group may put on the trappings of a guild to disguise the fact that it is still exercising top-down authority rather than being a bottom-up enterprise. For example, in a typical homeowner's association (HOA), there was never any point at which a group of homeowners got together and decided they wanted to form an HOA. Rather, what usually happens is that a developer buys a large plot of land, builds a bunch of houses on it, and creates an HOA whose membership attaches to each house, which are then sold one-by-one to buyers who otherwise have no connection to each other. Most of the homeowners thus have no real interest in participating in the HOA, but begrudgingly accede to the edicts of a handful of busybodies who have too much time on their hands.

Evolution of a growing clique

The culture of a clique may at first be ambiguous (A) because there is nothing really at stake. As it grows, however, if it does not simply break up, it will need to either follow the egalitarian path and become a guild (B), or the authoritarian path and become a cult (C). And in the latter case, the cult will inevitably be an embarrassed one, because if there had been someone with the requisite virtues to be a cult leader, the group would never have spent much time as a clique in the first place, but would have been a straightforward cult (D) from the beginning, and maintained its cultiness throughout its growth.

Therefore, as is probably clear by now, I think outcome C is bad and B is better. If a group has landed at C, then it may with great effort be pulled kicking-and-screaming to B - but this is likely to ruffle some feathers.

(I also suspect that there is a tendency for groups to get stuck at the "triple point" with around 30 members, in an uncomfortable equilibrium between all three types because the group cannot decide what it wants to be.)

What's so great about guilds? (Plan of the sequence)

Forthcoming articles in this sequence will lay out a case for why we should want more guild-like organizations to exist. (Links will be added as the articles are posted.)

  • A guild can grow larger than a clique (This article)
  • A guild makes it possible to improve things without schism (Fear of crowding out)
  • A lack of guilds leads to a general malaise and atrophy of democratic values (We live in a society)
  • A guild can contribute to the social fabric in a way more ambitious than cliques (Call for machers)
  • A guild can be more robust than a cult because it can better distribute important responsibilities ("Community organizer" is a double oxymoron)

Other articles (Society is a social construct, pace Arrow; Rubber stamp errors; Anti-civicality) will discuss various norms that are necessary for a guild to function well, but which may seem strange or unintuitive for people who are accustomed to cliques or cults. I will conclude with a reflection (So are you some kind of communist?) on the tension between social and individual moralities.



Discuss

We need Git for AI Timelines

2026-04-13 17:04:19

I was recently reading the AI Futures Project's Q1 2026 timelines update and noticed that their quarterly updates (the last one in December, alongside the release of the AI Futures Model) are struggling to keep pace with the thing they're trying to track.

The pace of AI development is incredibly fast and only hastening; Kokotajlo shortened his timeline for an AC by 18 months (late 2029 to mid-2028) in a single update, due to four specific parameter changes. Five days later, Anthropic announced Claude Mythos Preview, which arguably invalidated some of those parameters before the ink had time to dry.

This isn't a criticism of the AI Futures Project; they do commendable work, and Kokotajlo is arguably the best in the world at what he does. His track record is remarkable, and AI2027 has sparked immense conversation about the future of AI and timelines (it's what got me into LW). But when the field's pacing changes completely every two months, the community is more often than not navigating with an outdated map. And the problem is getting worse: Mythos hasn't yet been evaluated by METR, Spud hasn't released, and by the time the Q2 update drops, the field will have shifted to yet another focal point.


But the cadence itself is only the surface issue; updates aren't nearly granular enough to be traced back to each "step". When Kokotajlo updates his priors for an AC, we don't see the causal chain behind each decision that shortens his timelines by X amount. His rationale for setting the AC median at 1 year of autonomous work was that Opus 4.6 "impressed" him, yet what 1 year even means remains muddy; the original AI2027 scenario had the median set at 6 months for an SC before moving it back to 3 years. The SC definition shift from 3 years to 1 year accounted for around half of the 18-month shift in his Q1 update, and the stated justification is that Opus "impressed" him. Impressed how? At what point between December and April did he change his priors? The entire causal chain here collapses to a single word in a blog post.

In software engineering, this would be the equivalent of someone pushing a commit to main with the message "fixed stuff because it now works". You'd never accept that for code, so why accept it as the justification for forecasts about the most important technological revolution in human history?

There's no unified platform where forecasters can independently publish their timelines with substantial backing and integration from the platform itself. Sure, you can write a Substack article, spin up a short LessWrong post, or post a Twitter thread, but these are scattered all over and discontinuous for someone trying to get a concrete picture of what different forecasters think. One might say Metaculus is the solution; while it is a way of aggregating forecasts, it's still less than optimal. Conversation and rationale are walled behind "forecast and pay", with no communal space to discuss the reasoning behind those forecasts (yes, there is a comment feature, but it is scarcely used). There was an excellent post around Broad Timelines that highlighted this: Metaculus emphasizes medians rather than the full distributions that are more sought after in our space.

As neo noted in that post, we need to "design info-UI tools that facilitate that (the timeline formulation) process". Broad distributions need platforms that can track how they update over time. A quarterly blog post cannot do that. Forecasts updated granularly over time, with reasoning and deliberation behind them, can.

This is why I'm using Git as an analogy: software engineering solved this class of problem years ago. You have commits (changes in timeline predictions), diffs showing what changed, commit messages showing why they changed, branches for code (in this analogy, scenario) forks, blame for accountability (we need to be less wrong, after all), and merge conflicts that require resolution rather than dissolving into Twitter discourse.


The minimum viable version of this is, frankly, embarrassingly simple: a GitHub repo in which each forecaster maintains a YAML file with their distribution for an agreed-upon definition (whether it be an AC, SC, ASI, etc.). Commits are updates to those files/timelines, with the rationale in the commit message.
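As a sketch of what one such file might look like (the schema and every field name below are my own invention, not an existing standard), something like this would already be enough to start with:

```python
# A minimal sketch of a single forecaster's timeline file. Every field name here
# is hypothetical -- there is no agreed-upon schema yet; that's part of the point.
import yaml  # pyyaml

forecast = {
    "forecaster": "example_handle",
    "target": "AC",               # the agreed-upon milestone definition (AC, SC, ASI, ...)
    "updated": "2026-04-13",
    "distribution": {             # cumulative probability of arrival by end of year
        2027: 0.15,
        2028: 0.40,
        2030: 0.70,
        2035: 0.90,
    },
    "rationale": "Shortened after Mythos Preview; full reasoning in the commit message.",
}

# Writing this out is the "commit"; the diff against the previous version is the update.
print(yaml.safe_dump(forecast, sort_keys=False))
```

The numbers above are placeholders, not anyone's actual forecast; the point is that a diff of this file, together with its commit message, preserves exactly the causal chain that a quarterly blog post collapses away.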

Claude Opus 4.6 had an 80% time horizon of 70 minutes. Assuming Mythos has an 80% time horizon of ~240 minutes, the doubling time is ~34-40 days. Even if we're pessimistic and assume a time horizon of 180 minutes, the doubling time is still ~45 days. The doubling time of the thing we're forecasting is now shorter than our update cycle.
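To show the arithmetic behind those figures (a back-of-the-envelope sketch; the ~61-day gap between the two measurements is my own assumption, inferred from the numbers above rather than taken from any published release dates):

```python
from math import log2

def doubling_time(h_old_min, h_new_min, gap_days):
    """Doubling time implied by two 80% time-horizon measurements taken gap_days apart."""
    return gap_days / log2(h_new_min / h_old_min)

# 70 min for Opus 4.6; ~240 min (or a pessimistic 180 min) assumed for Mythos.
print(round(doubling_time(70, 240, 61)))  # ~34 days
print(round(doubling_time(70, 180, 61)))  # ~45 days
```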


The rationalist community, of all communities, should find that unacceptable.



Discuss

Treaties, Regulations, and Research can be Complements

2026-04-13 15:04:06

I think the debate over whether AI risk should be addressed via regulation or treaties is often oversimplified and confused. These are not substitutes. They rely on overlapping underlying capacities and address different classes of problems, and both can benefit from certain classes of research.

David Krueger, to pick on someone whose work I largely agree with, recently posted that “Stopping AI is easier than regulating it.” I largely agree with what he says. Unfortunately, I also think it is an example[1] of advocates for a cause creating fights where they're not needed, and in this case making the discussions around AI more rather than less contentious, and less rather than more effective.

And the reason the fights are not needed is that different risks live at different levels, and different tools are effective in different ways.

Clearly, many of the risks and harms of AI should not be addressed internationally. There is little reason or ability to harmonize domestic laws on fraud, discrimination, or liability, which would be a distraction from either reducing the harms or addressing other risks. Existing laws should be adapted and applied, and new regulations should be formulated where needed. International oversight would be unwieldy and ineffective for even most treaty compliance efforts - as other treaties show, there is a mix of national and international oversight. But domestic regulation can create liability incentives, require or standardize audits, clarify rules, and provide enforcement mechanisms and resources. All of those are at least sometimes useful for treaties as well. When Krueger says “the way I imagine stopping AI is actually a particular form of regulating AI,” he is not talking about the harms and risks regulation could address - though given what he has said elsewhere, he agrees that many of them are worth mitigating, even if they are not his highest priority. So it should be clear that treaties will not, cannot, and should not address most prosaic risks of AI systems and misuse.

By the converse argument, which he and others have made convincingly in the past, some harms of AI systems come from racing toward capability rather than prioritizing safety. These types of risk emerge from the dynamics of international markets and from great power competition. Obviously, these dynamics aren't well addressed by domestic regulation on the part of any single actor. It makes little sense to talk about regulation alone addressing those risks, just as it is tendentious to talk about using international treaties to mitigate other classes of risks and harms of AI systems.

Unfortunately, many discussions put “we need a global treaty to stop AI risks” in opposition to “domestic regulation is the only realistic path.” Not only do I think this is backwards, but I’ll argue that so is the related false dichotomy of industry self-regulation versus government rules. Industries that embrace safety welcome well-built regulation. Even in areas where they don’t have strict rules, airlines have national bodies that manage risk and accident reporting. (And the AI industry leaders often claim to be the same way, wanting national or international rules - just not any specific ones.)

So, to come to my unsurprising conclusion, we actually have several different plausibly positive and at least partially complementary approaches. 

  1. Certain classes of research produce techniques like evals, interpretability, human oversight approaches, control methods, and operationalizable definitions of specific risks. Some of these are dual use or net negative, but the parts that are useful are complementary to both regulation and treaties.
  2. Regulation needs operationalized definitions of risks, measurable standards, concrete goals, auditable procedures and oversight methods, and investigatory tools. Many of these are enabled by specific forms of technical or policy safety research. 
  3. Treaties need shared definitions, clear goals, regulatory oversight and enforcement, credible verification, and both technical and regulatory methods to distinguish compliance from defection. Some of these are enabled by regulation, some by relevant research.

So we end up with a sort of triad: research can enable measurement and definitions and provide tools; regulation can force adoption and enforce usage of those tools; and treaties can align incentives around defection dilemmas and provide common aims.

This doesn’t imply that most safety research is net risk-reducing, that most regulation is useful, or that most possible treaties will reduce risks. But it does say that they can be complementary. Some disagreements are substantive. But others are treating complementary approaches as mutually exclusive - and I think we should instead figure out common ground, which can make the fights about these issues both more concrete, and narrower.

  1. ^

    yet another example



Discuss

5 Hypotheses for Why Models Fail on Long Tasks

2026-04-13 14:54:19

Written extremely quickly for the InkHaven Residency.

Like humans, AI models do worse on tasks that take longer to do. Unlike humans, their performance seems to degrade with task length more steeply than human performance does.

This is a big part of why the METR time horizon results make sense: because longer tasks are also “harder” for models, and more capable models can do longer tasks, we can use the length of tasks that the models can perform as a metric of model capability.

There’s a clear etiological or causal-historical explanation of why models do worse at long tasks: they’re probably trained on more short tasks and fewer long tasks. This is both because it’s easier to make shorter tasks, and because you can train models on more short tasks than longer tasks with a fixed compute budget.

But from the perspective of AI evaluations, it’s also worth considering mechanistic explanations that make reference only to how properties of long tasks interact with the AI system in deployment. Whatever the training story may be, the AI models as they currently exist have some property that makes long tasks genuinely harder for them in a way that tracks capability. Understanding what this property is could matter a lot for interpreting the METR time horizon and even for forecasting AI capabilities over time.

So here are five such possible hypotheses that explain why longer tasks seem consistently harder for current models, based in large part on my experience at METR.

Long tasks are less well defined, and require judgment or taste (which models are bad at). For a software engineer, a 1-minute coding task might involve composing a single 10-line function or running a relatively simple SQL query. By their very nature, these tasks tend to be easy to define and easy to score, with relatively objective success criteria and little human judgment involved. A 15-minute task may be implementing a relatively simple data processing script or fixing a simple bug: more complicated, but still relatively easy to score. In contrast, an 8-hour task likely involves substantial amounts of design taste (in ways that are harder to score), and month-long tasks likely involve communicating with a stakeholder or building code with properties that are hard to algorithmically verify (e.g. maintainability). (This is also related to why algorithmically scorable longer tasks are harder to make.)

While the longer METR tasks are still algorithmically scored, they tend to require models to build sophisticated software artifacts or iteratively improve on experiment design, where taste plays a larger role in success. Since models seem to lack ‘taste’ of some sort, relative to humans of comparable execution ability (hence the complaints about AI Slop), this could explain why they do worse on longer tasks.

Long tasks require more narrow expertise (which models may not have). An important property of the METR task suite is that longer tasks should not be trivially decomposable into shorter tasks. That is, a 10-hour task should not trivially decompose into ten 1-hour tasks, and 10 short math problems do not become a single longer math problem. Perhaps as an artifact of this property, many of METR's longer tasks (and perhaps longer tasks in people's day-to-day work in general) rely on more specialized procedural knowledge that is hard to acquire via Google. For example, many of METR's long tasks are cryptographic or machine learning challenges that require some amount of procedural knowledge in the relevant fields to approach. Insofar as the long tasks are more likely to require procedural knowledge outside the AI models' area of expertise, they may struggle.

Personally, I find this relatively unlikely as an explanation for the METR time horizon tasks (since AI models seem to have a lot of expertise in the relevant areas), but it might be a large explanation for the inability of AIs to autonomously complete large tasks in general.

Long tasks take models longer, leading to more stochastic failures (which models exhibit). A popular explanation is that tasks that take humans longer also take AI agents more steps to complete, and AI agents are not fully reliable, failing with some small probability on each step. For example, Toby Ord raises this as a hypothesis in a response to our Time Horizon paper.

I think this is definitely part of the explanation (and why longer tasks are harder for humans as well), with some caveats: first, I caution against naively interpreting human time as proportional to AI steps and applying a constant hazard model. For example, it turns out that if you fit the failure rate model for AI agents over time, the failure rate goes down as the task goes on! Second, AI models seem to have different time horizons across different domains, and simple versions of this hypothesis cannot explain that phenomenon.
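To make the constant-hazard intuition concrete, here is a minimal sketch (my own illustration, not METR's actual fitting procedure): if an agent fails independently at a constant rate per unit of human task time, success probability decays exponentially with task length, and the time horizon falls straight out of the per-minute failure rate.

```python
import math

def success_probability(task_minutes, failure_rate_per_min):
    """Constant-hazard sketch: independent chance of failure per human-minute of task length."""
    return math.exp(-failure_rate_per_min * task_minutes)

def time_horizon(failure_rate_per_min, success_threshold=0.5):
    """Task length at which success probability drops to the given threshold."""
    return -math.log(success_threshold) / failure_rate_per_min

# Hypothetical numbers: a 0.1% failure chance per human-minute of task time.
rate = 0.001
print(success_probability(60, rate))   # ~0.94 for a 1-hour task
print(success_probability(480, rate))  # ~0.62 for an 8-hour task
print(time_horizon(rate))              # ~693 minutes at the 50% threshold
```

Under this toy model, doubling the time horizon is exactly equivalent to halving the per-minute failure rate; the caveats above are about whether the "constant" part actually holds in practice.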

Long tasks take models longer, causing failures due to distribution shift or self conditioning (which models may suffer from). A related explanation is that longer tasks take models more off-distribution: base models (at least earlier on) were not trained to predict long sequences of model-generated outputs, and even RLVR’ed models were probably trained with short tasks, far shorter than the 16 hour, tens of millions of token tasks that we might ask them to do. This increases both the chance that the models are simply off distribution (and thus may be less competent in general), and the chance that they accumulate errors by chance and start conditioning on being the type of agent that makes such mistakes (and thus becoming more prone to make such mistakes). In the same way that naive versions of the constant hazard model seem contradicted by evidence, I suspect that naive versions of this hypothesis are also likely to fail. But it’s possible that more sophisticated versions may play a key role in explaining the phenomenon.

Long tasks require better time and resource management (which models struggle with). Finally, an explanation that I often think is neglected is that longer tasks tend to require meta-cognition and explicit strategy, which current models seem to struggle with. A 5-minute task such as writing a simple function or script can be done in one go, without much planning, but getting the best score in a machine learning experiment over 8 hours requires allocating scarce resources including remaining time and compute. It’s been observed that models understandably struggle a lot with understanding how much (wall clock) time they take to do particular tasks, or often double down on failing approaches instead of switching strategies.

I welcome more thinking on this topic, as well as more empirical work to distinguish between these hypotheses.



Discuss