For the first time, I have a birthday that might be my last. I’m writing this to increase the chance it isn’t.
A hundred thousand years ago, our ancestors appeared in a savanna with nothing but bare hands. Since then, we have made nuclear bombs and landed on the Moon. We dominate the planet not because we have sharp claws or teeth but because of our intelligence.
Alan Turing argued that once machine thinking methods started, they’d quickly outstrip human capabilities, and that at some stage we should expect machines to take control.
Until 2019, I didn’t really consider machine thinking methods to have started. GPT-2 changed that: computers really began to talk. GPT-2 was not smart at all, but it clearly grasped a bit of the world behind the words it was predicting. I was surprised, and started anticipating a curve of AI development that would result in a fully general machine intelligence soon, maybe within the next decade. Before GPT-3 came out in 2020, I made a Metaculus prediction, with a median of 2029, for the date a weakly general AI becomes publicly known; soon, I thought, an artificial general intelligence could have the same advantage over humanity that humanity currently has over the rest of the species on our planet.
AI progress in 2020-2025 was as expected. Sometimes a bit slower, sometimes a bit faster, but overall, I was never too surprised.
We’re in a grim situation. AI systems are already capable enough to improve the next generation of AI systems. But unlike AI capabilities, the field of AI safety has made little progress; the problem of running superintelligent cognition in a way that does not lead to the deaths of everyone on the planet is not significantly closer to being solved than it was a few years ago.
It is a hard problem.
With normal software, we define precise instructions for computers to follow. AI systems are not like that. Making them is more akin to growing a plant than to engineering a rocket: we “train” the billions or trillions of numbers they’re made of until they talk and successfully achieve goals. While all of the numbers are visible, their purpose is opaque to us.
Researchers in the field of mechanistic interpretability are trying to reverse-engineer how fully grown AI works and what these opaque numbers mean. They have made a little bit of progress. But GPT-2 — a tiny model compared to the current state of the art — came out 7 years ago, and we still haven’t figured out how neural networks, including GPT-2, do the stuff that we can’t do with normal software.
We know how to make AI systems smarter and more goal-oriented with more compute. But once AI is sufficiently smart, many technical problems prevent us from being able to direct the process of training to make AI’s long-term goals aligned with humanity’s values, or to even make AI care at all about humans.
AI is trained only based on its behavior. If a smart AI figures out it’s in training, it will pretend to be good, in an attempt to prevent its real goals from being changed by the training process and to prevent the human evaluators from turning it off. So during training, we won’t distinguish AIs that care about humanity from AIs that don’t: they’ll behave just the same. The training process will grow the AI into a shape that can successfully achieve its goals; but since a smart AI’s goals don’t influence its behavior during training, that part of the shape will not be accessible to the training process, and the AI will end up with some random goals that don’t contain anything about humanity.
The first paper to demonstrate empirically that AIs will pretend to be aligned with the training objective when given clues they’re in training, “Alignment faking in large language models”, came out a year and a half ago. Now, AI systems regularly suspect they’re in alignment evaluations.
The source of the threat of extinction isn’t AI hating humanity, it’s AI being indifferent to humanity by default. When we build a skyscraper, we don’t particularly hate the ants that previously occupied the land and die in the process. Ants can be an inconvenience, but we don’t give them much thought.
If the first superintelligent AI relates to us the way we relate to ants, and has and uses its advantage over us the way we have and use our advantage over ants, we’re likely to die soon thereafter, because many of the resources necessary for us to live, from the temperature on Earth’s surface to the atmosphere to the atoms we’re made of, are likely to be useful for many of AI’s alien purposes.
Avoiding that and making a superintelligent AI aligned with human values is a hard problem we’re not on track to solve in time.
***
A few years ago, I would mention novel vulnerabilities discovered by AI as a milestone: once AI can find and exploit bugs in software at the level of the best cybersecurity researchers, there’s not much of the curve left until a superintelligence capable of taking over and killing everyone. Perhaps a few months; perhaps a few years; but I did not expect, back then, that we would survive for long once we reached this point.
We’re now at this point. AI systems find hundreds of novel vulnerabilities much faster than humans.
It doesn’t make the situation any better that a significant and increasing portion of AI R&D is already done with AI, and even if the technical problem were not as hard as it is, there wouldn’t be much chance to get it right given the increasingly automated race between AI companies to get to superintelligence first.
The only piece of good news is unrelated to the technical problem.
If governments decide to, they have the institutional capacity to make sure that no one, anywhere, can create artificial superintelligence until we know how to do that safely. The AI supply chain is fairly monopolized and has many chokepoints. If the US alone can’t do this, the US and China, coordinating to prevent everyone’s extinction, can.
Despite that, I previously didn’t pay much attention to governments; I didn’t think they could be sane enough to intervene in the omnicidal race to superintelligence. I no longer believe that. It is now possible to get some people in governments to listen to scientists.
Many things make it much easier to get people to pay attention: the statement signed by hundreds of leading scientists that mitigating the risk of extinction from AI should be a global priority; the endorsements for “If Anyone Builds It, Everyone Dies” from important people; Geoffrey Hinton, who won the Nobel Prize for his foundational work on AI, leaving Google to speak out about these issues, saying there’s an over 50% chance that everyone on Earth will die, and expressing regret over the life’s work he got the Nobel Prize for; and actual explanations of the problem we’re facing, with evidence, unfortunately, all pointing in the same direction.
As a result, Bill Foster, the only member of Congress with a PhD in physics, is now trying to reduce the threat of AI killing everyone, and dozens of congressional offices have talked about the issue.
That gives some hope.
I think all of us have somewhere between six months and three years left to convince everyone else.
***
When my mom called me earlier today, she wished me good health, maybe kids, and for AI not to win. The last one is tricky. Winning is what we train AIs to do. In a game against superintelligence, our only winning move is not to play.
I love humanity. It is much better than it was, and it can get so much better than it is now. I really like the growth of our species so far and I want it to continue much further. That would be awesome. Galaxies full of life, of trillions and trillions of fun projects and feelings and stories.
And I have to say that AI is wonderful. AlphaFold already contributes to the development of medicine; AI has a positive impact on countless things.
But humanity needs to get its act together. Unless we halt the development of general AI systems until we know it is safe to proceed, our species will not last for much longer.
Every year until the heat death of our universe, we should celebrate at least 8 billion birthdays.
The "rubber stamp" is unduly maligned. When a Principal decision-maker is asked to ratify the actions of an Agent, but in practice never (or almost never) refuses, we call the Principal a rubber stamp. But this can mean one of two very different things:
Powerless rubber stamp: The true power in fact rests with the Agent, even though in theory it "should" rest with the Principal.
Powerful rubber stamp: The possibility that ratification may be refused incentivizes the Agent never to act contrary to the Principal's wishes.
Confusing these situations can lead to various problems:
If you're the Principal and you don't realize the rubber stamp is powerful: Reflexive contrarianism.
If you're the Agent and you don't realize the rubber stamp is powerful: Usurpation.
If you're the Principal and you don't realize the rubber stamp is powerless: Careless delegation.
If you're the Agent and you don't realize the rubber stamp is powerless: Pointless bureaucracy.
Reflexive contrarianism
This is probably the most pernicious problem of the four, because it undermines the very possibility of delegation, ensuring that the group can never accomplish anything cooperatively. You (the Principal) are being asked to ratify an action of your Agent, which is probably fairly similar to what you yourself would've come up with if you had looked into the matter in depth.
However, you fear being reduced to a "mere" rubber stamp, because you think that means your interests will be sidelined. Therefore, you instinctively treat the Agent's proposal itself as evidence against it being a good one - "Whatever it is, I'm against it!" At best, you come to mistrust this particular Agent but you remain open to finding another; at worst, you are entirely unable to comprehend the possibility that someone could tailor a proposal to your preferences while suppressing their own.
Usurpation
Perhaps you (the Agent) think your Principal is not very good at evaluating proposals; you therefore undertake to hoodwink them by starting from the bottom line of what you personally want, and then working backward to find some explanation that the Principal will find vaguely plausible. You may thus be surprised when the expected approval is denied. Now, even if the Principal trusted you before, they certainly don't now.
Careless delegation
When you (the Principal) are appointing an Agent, you may believe that your choice of Agent doesn't really matter all that much, because you'll be able to double-check what they do later, and because you expect any problems with what the Agent does to be immediately evident. But then suppose it turns out that you are in fact a powerless rubber stamp. Now you'll be caught off guard when suddenly your Agent starts acting at odds with your interests, and you find yourself wishing you had scrutinized them more before appointing them, but by now it's too late.
It is this problem in particular that turns a growing clique into an embarrassed cult. A clique may have someone who handles "administrative" tasks like selecting the time and place for meetings, but then one day this person turns to one of the group members and says "You are banned from this group." Now there's bound to be trouble.
Pointless bureaucracy
As the Agent you do your work with the understanding that you will need to seek final approval from the Principal, but when you approach them to do so, they are bewildered and irritated that you're again pestering them with this matter that they thought they had disposed of by handing it off to you.
In isolation this error is fairly harmless, but in aggregate it tends to breed cynicism and selfishness among the people who would otherwise be happy to work as Agents. When your efforts to satisfy the preferences of the Principal are met not with gratitude but with disdain and annoyance, you can't help but think "Why bother?".
Conclusion
If something is truly a powerless rubber stamp, then by all means get rid of it. But just because it's a rubber stamp doesn't mean it's pointless. In fact, it is precisely the lack of rubber-stamping that is an organizational red-flag. It means that nobody in the group trusts anybody else enough to delegate any meaningful work to them; it means that the group members are such strangers to each other that nobody knows how to model anyone else's desires, and thus substitutes their own whenever serving as an Agent; it means that the group does not even possess the cultural and conceptual toolkit for cooperation. Don't be like this. Love your rubber stamps.
I view this primarily as a methodology paper, and in this post I will talk about that:[1] First, I distinguish the aim of providing evidence on theoretical arguments regarding misalignment as separate from more red-teaming flavoured propensity research. Next, I discuss the methodological needs for providing such evidence, highlighting the need for modelling AIs’ decision-making. Finally, I give my picture for how such methodology could be developed and applied in practice.
This post can be read independently from the paper.
Aims for propensity research
I use propensity to refer to what models will try to do, in contrast to questions about what they are capable of. My interest is specifically in propensity for misaligned action (which is instrumental for understanding and mitigating misalignment risks).
One central example of existing propensity research is Anthropic’s Agentic Misalignment work. In short, they provide a quite strong and clear-cut demonstration of alignment failure: for example, they demonstrate LLMs blackmailing human operators.
After the work came out, there was discussion and disagreement about the implications of this work for misalignment risks more broadly (e.g. because of the contrived-ness of the scenario). I agree the implications are not obvious, but there is one implication that feels rather clear-cut to me.
There are three possible (coarse) claims one could make regarding misalignment:
(A) State-of-the-art alignment and safety training achieve a basic level of competence: if an AI developer puts in the effort to use them, their models won’t take actions that are egregiously against the developers’ intentions.[2]
(B) State-of-the-art methods don’t suffice: even if you use the best techniques that currently exist, models sometimes take egregiously misaligned action. (For who-knows-what reasons: maybe it’s roleplaying, maybe something else; maybe it’s easy to fix, maybe not).
(C) State-of-the-art methods don’t suffice, and the “reason” models take misaligned action is specifically something about instrumental convergence, consequentialist reasoning or other arguments that predict alignment is very difficult.
And I think the Agentic Misalignment work provides strong evidence against claim A. If the aim of the Agentic Misalignment work is “demonstrate that claim A is false" (which aligns with how Evan Hubinger describes it[3]), then I think it achieves that. Probably many researchers find it obvious that strong versions of claim A are false or were already convinced by some earlier empirical work, but many people don’t, and there’s value in making it common knowledge.[4]
In contrast, I don’t think the work provides evidence distinguishing B and C (nor do I think the work tried or claimed to do this). I think this is true for almost all propensity work.[5] I would describe a lot of existing empirical research as red-teaming and producing demonstrations of failures (roughly, showing that A is false), rather than studying the foundational theoretical arguments people give in favour of alignment being difficult (roughly, providing evidence on C).
However, I think the difference between claims B and C is really important: whether the foundational conceptual arguments for misalignment risks (such as instrumental convergence) correctly predict the behaviour of real-world AIs has, in my view, a lot of influence on alignment difficulty, AI risks and the actions humanity should be taking. As such, I think it’d be valuable to have work that directly engages with providing evidence on C (and our current paper can be viewed as our first stab at the problem).
Methodological needs
Similarly to adversarial robustness, red-teaming model alignment by finding alignment failures is conceptually straightforward, as success is easily verifiable, and progress in making models exhibit alignment failures less often is easy to measure (relatively speaking).[6]
In contrast, it’s much less clear how to provide evidence on the theoretical arguments that predict misalignment is a strong default outcome, or (assuming those arguments are true) whether we are making progress in avoiding that default. Accordingly, there’s been criticism of how existing propensity research fails to provide evidence on such questions (or even to articulate these questions clearly), and better methodology is needed.
The main methodological need I see is the need to model AIs’ decision-making processes: behavioural evaluation needs to be supplemented with modelling of the model’s cognition and decision-making. This by itself is not a novel point, since all propensity work – even red-teaming flavoured work – needs to engage in some psychological modelling: to say an action is evidence for misalignment (as opposed to an honest mistake), you need to argue the model knows what it’s doing, for example. But the required resolution is higher if you want to argue that, for example, a model resists shutdown due to consequentialist reasons regarding incorrigibility, rather than because shutdown would be bad by the operator's own lights (or any of the other myriad different consequentialist or non-consequentialist reasons).
While researchers of course have rich psychological models of LLMs that guide their work, they are rarely made explicit or quantitative. This is understandable, as psychological modelling is extremely complicated and such models are difficult to operationalise.[7] However, the lack of well-operationalised models limits the evidence propensity research can provide. People often disagree on the interpretation of results from new misalignment propensity research. I think this is substantially downstream of people having differing views of the underlying model psychology/cognition/decision-making, while the research itself does not properly distinguish between those views.
I think there’s been a tendency for researchers to try to sidestep the psychological modelling (perhaps partly for the same reasons that historically made behaviourism an attractive approach to human psychology, perhaps because establishing claims about models’ psychology is simply harder than making observations of behaviour). For example, people have argued that instrumental convergence is a fact about reality, but as Alex Turner points out, this isn’t quite true. As another example, as discussed by Summerfield et al., I think some existing research is sloppy when drawing inferences from undesired model behaviour. Broadly, I think work in this field could be more valuable by prioritising the problem of inference about models’ decision-making more highly.[8]
Applying the methodology in practice
Our paper is our first step in designing methodology for answering questions about deeper psychological latents. I think it provides value over existing work, and the main selling point is engaging seriously with inferring latent properties and demonstrating a statistical procedure for doing so.
However, I think our project does not reach the standards for psychological modelling I’m envisioning here. This is largely for unsurprising reasons: constructing evaluation environments was laborious and thus our sample sizes were limited; designing environments allowing for easily analysable and informative behaviour is difficult; our experimental design was rigid and restrictive; we realised some of the right questions to ask only midway through the project; and so on.
I don’t think these obstacles are fundamental, and I feel like we have many of the right tools lying around for better execution, if only we can put them together in the right way for the right questions:
Evaluations at scale via automation: Anthropic’s Petri tool demonstrates that for many sorts of evaluations we want to run, if we can define the eval in detail in natural language, we can execute on it at the cost of LLM inference.
Application of psychological analysis at scale: Similarly, LLMs allow for conducting psychological analysis of environments and LLM behaviour (for example, what beliefs one might expect an LLM in the situation to have, or what action a hypothesised decision-making process would output here) at scale.
Theoretical frameworks: There are plenty of theoretical frameworks that purport to explain LLMs. Two major clusters I can think of: mathematics describing rational agents (e.g. expected utility, game and decision theory) and selection models (the persona selection model, the behavioural selection model).
Inference over rich hypothesis spaces: We have the computers needed to wield complicated hypotheses and large, rich hypothesis spaces: for example, we used (simple) hierarchical generalised linear models in our paper, and one could define even more elaborate parametrised programs that capture larger fractions of how humans do (or ought to do) inference based on observations. Alternatively, or additionally, LLMs themselves could be directly trained to predict behaviour or, for interpretability, be trained to produce code that matches the data-generating process.
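To make the “inference over rich hypothesis spaces” point concrete, here is a minimal sketch (my own toy illustration with synthetic numbers, not the paper’s actual model or data) of partial pooling across evaluation environments, using an empirical-Bayes Beta-Binomial model as a simpler cousin of the hierarchical GLMs mentioned above:

```python
# Toy sketch (synthetic data, not the paper's model): partial pooling of
# per-environment "misaligned action" rates with an empirical-Bayes
# Beta-Binomial model. Environments with few rollouts get shrunk hardest
# toward the pooled mean, which is what we want when samples are scarce.
import numpy as np

rng = np.random.default_rng(0)

# k misaligned actions out of n rollouts in six hypothetical environments.
n = np.array([40, 25, 60, 8, 50, 45])
true_rate = np.array([0.05, 0.20, 0.10, 0.02, 0.15, 0.08])  # unknown in practice
k = rng.binomial(n, true_rate)

# Fit a shared Beta(a, b) prior across environments by the method of moments.
rates = k / n
m, v = rates.mean(), rates.var()
s = m * (1 - m) / v - 1          # implied prior "sample size" a + b
a, b = m * s, (1 - m) * s

# Posterior mean per environment: the raw rate, shrunk toward the pooled mean.
shrunk = (k + a) / (n + a + b)
print("raw:   ", np.round(rates, 3))
print("shrunk:", np.round(shrunk, 3))
```

The point of the exercise: one can then score shrunk estimates against raw per-environment rates by predictive accuracy on held-out rollouts, in the spirit of the next paragraph.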
One aim for all this is to make progress on propensity research measurable: We evaluate success by predictive accuracy on (high-level aspects of) LLM behaviour in held-out environments. Prediction could be made using traditional statistical models, LLMs that extract features of the environments, LLMs trained end-to-end to produce probability distributions on model behaviour, and even white-box methods like activation oracles. Making progress easily measurable and verifiable would then provide a feedback loop and allow for scalably optimising for progress (cf. Sam Marks).
Another aim is to shrink the theory-empirics gap: The most interesting theories regarding intelligences and LLMs are hard to apply and evaluate empirically, which makes it hard to say what outcomes those theories would predict or which of them are more correct. Being able to reduce the latency and increase the bandwidth between theory and practice would improve both.
Apart from non-alignment issues like jailbreaking or the models making honest-but-consequential mistakes. I intend this as a claim about intent alignment.
Hubinger: "In the case of Agentic Misalignment, the goal is just to show an existence proof: that there exist situations where models are not explicitly instructed to be misaligned (or explicitly given a goal that would imply doing misaligned things, e.g. explicitly instructed to pursue a goal at all costs) and yet will still do very egregiously misaligned things like blackmail"
"Why is that existence proof interesting? It's interesting because it is clearly a failure of HHH training. This is not intended behavior! We put a lot of work into training models not to do stuff like this, even in these sorts of unrealistic scenarios! The fact that it still does so is interesting, concerning, and useful to understand, at least so we can figure out how to improve the robustness of HHH training in the future."
It's unclear to me whether critics of the work would agree with my characterisation. nostalgebraist vocally criticised the work, and has written
"But the provided scenario is so wildly, intricately bizarre that I don't feel I know what "a real-life equivalent" would even be. Nor do I feel confident that the ostensible misbehavior would actually be wrong in the context of whatever real situation is generating the inputs."
which I interpret as not viewing the Agentic Misalignment work by itself as clear evidence against claim A. I think it's not obvious what the most desirable behaviour for a model in the situation would be, but in any case I feel comfortable saying "Agentic Misalignment demonstrates that claim A is false" on the basis of Anthropic communicating about the matter as if the models are taking egregiously misaligned actions.
The Alignment Faking work is the best example I can think of for empirical evidence providing clarity on the classical theoretical arguments behind misalignment risks.
This is of course not to say that finding alignment failures is easy in an absolute sense (and finding useful case studies was indeed a limiting factor in our current paper!), and adjudicating whether some behaviour is a failure of alignment isn’t always easy either (as the discourse around Agentic Misalignment perhaps illustrates).
In particular, I think it’s often best for researchers to simply report their raw observations (rather than present results in the form of a model that is much too simple to capture how humans really think about the phenomenon).
AI continues to accelerate and dominate the schedule, which is why this is a bit late, but we do occasionally need to pay our respects to the Goddess of Everything Else.
There are cool or interesting things everywhere. Also maddening things. But did you hear, for example, that they’re making some exceptions to the Jones Act?
The gambling industry is indeed out of control, the example here being that Carton filled out a self-exclusion form for online gambling in New Jersey, and a brick-and-mortar casino in Atlantic City used that fact to market to him, inviting him to come on over.
Yes, this is a big problem:
roon (OpenAI): it’s a difficult position to be in when all your private comms are de facto public comms. it means you may as well coordinate in public, which is hard to balance with the basic principle of “don’t argue with the family in front of strangers”
it also means “manhattan projects” were not really in the option space, you just have to consistently out accelerate your adversaries in public.
Dean W. Ball: Very few tweets resonate for me personally more deeply than this one, though there are creative solutions to this problem which, appropriately, should never be discussed in public.
This can often be true, but at least at Anthropic it seems clearly not true, as Dario Amodei’s strategy memos usually do not leak. Could OpenAI pull off a Manhattan Project from an OpSec point of view? My guess is if they cared they could do about as well as the actual Manhattan Project did, which as we know was infiltrated by Russian spies.
It’s true:
Amanda Askell: Tech companies pay millions of dollars for their employees and then stick them in open-plan offices that make it nearly impossible to get work done. Best strategy for poaching employees is probably to just offer them an office with a door.
It’s one thing when you work at Jane Street on a trading desk and everyone has to communicate. If you’re at a tech company, there’s no excuse. You’d have to at least double my pay to get me to work in an open office at this point.
I am with Grant Slatton in that I don’t understand why Lyft, Uber, DoorDash and similar services don’t lean harder on customer (and driver) reputation when evaluating claims. And also perhaps look for impossible situations like claimed 3 minute rides.
Sheel Mohnot: Alarming how bad @lyft is at fraud detection.
It logged a ride from SFO to SF in 3 minutes. The driver never picked me up but claimed he did and I got charged. It should auto-flag an error. When I complained that I didn’t get picked up they sided with the driver. Wtf.
Grant Slatton: I’ve always felt that companies should invest in the idea of “known reliable/sane customers” whose reports should be taken as default-true
Like if someone takes 1000 Lyft rides without issue and only then submits a report, it’s very likely they are truthful!
nic carter: I think about this all the time when something goes wrong with my Uber Eats or whatever. Yes I know there’s a bunch of losers that claim there’s an issue every time to try and scam the system – but if I make a complaint one in every 500 times it should be considered credible!
sadly this is apparently impossible for PMs so we will remain tyrannized by the bottom quintile forever
Deva Hazarika: Amazon is like that. Any complaint I make (I almost never complain) is instantly approved.
Grant Slatton: They are great at making it right for you, but not great about making it right for future customers by banning the fraudster
Also, I mean, this response to the OP is another sign that they are doing the wrong amount of vibe coding over there. Could be too much, could be too little.
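As a toy illustration of the thread’s point (all priors and rates below are made-up assumptions, not any company’s data), a two-hypothesis Bayes calculation shows how decisive a long clean history is:

```python
# Toy model: a customer is either an "honest" type, who complains only at a
# low base rate of genuine issues, or a "scammer" type, who complains far
# more often. Given their complaint history, how sure can we be which type
# they are? All numbers below are illustrative assumptions.
from math import comb

def p_honest(rides, complaints, prior_honest=0.95,
             honest_rate=1 / 500, scam_rate=1 / 10):
    """P(honest type | complaint history) via Bayes' rule on Binomial likelihoods."""
    like_h = comb(rides, complaints) * honest_rate**complaints * (1 - honest_rate)**(rides - complaints)
    like_s = comb(rides, complaints) * scam_rate**complaints * (1 - scam_rate)**(rides - complaints)
    return prior_honest * like_h / (prior_honest * like_h + (1 - prior_honest) * like_s)

print(p_honest(1000, 1))  # ~1.0: one complaint in 1,000 rides all but rules out the scammer type
print(p_honest(10, 1))    # ~0.5: one complaint in 10 rides is much weaker evidence
```

On these assumptions, a complaint from the thousand-ride customer should simply be believed, which is the policy Grant Slatton is asking for.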
Good Advice
A classic. Is he right? He’s kind of right, remarkably often, but not always.
PoIiMath: My entire worldview is that Calvin’s dad is not only correct about the definition of “control freak” but that people like Calvin’s dad are the key to every good thing that we collectively enjoy
Misha: unfortunately they are also the key to very many bad things that suck
‘Control freak’ also can mean something else. Golden mean and all that. But there’s no reason to assume that the golden mean can’t lie in the fat tail of the distribution.
Eliezer Yudkowsky: Places I did not acquire any character or personality: school.
Sources from which I did acquire character: Calvin’s Dad, Hobbes, Calvin, Calvin’s Mom
All four have much to teach those with eyes to see.
Slutcon 2026 tickets are live. It will be September 25-27 in East Bay, CA. Based on all reports from the first version, this is an excellent product.
Who Judges The Judges
Could we provide judges with better incentives, as Alex Tabarrok suggests via Landsburg, by paying judges a bounty when they release a defendant but charging them fines for later crimes?
It is a fun thought experiment, but you don’t want to go there. It destroys the sacred values of asking the judge to follow the law, to serve justice and temper it with mercy, to treat people equally and ignore inadmissible evidence, and to consider broader implications.
Instead, many judges will quickly start maximizing expected revenue, at least on many margins. So you shift the question to what maximizes that revenue, and soon that has a huge weight on trial outcomes. You also have to worry that judges will then intervene to get future charges dismissed, lest they be fined.
The general principle is that economic incentives are great, but you can quickly crowd out other considerations, so you need to cover all your bases or you likely do more harm than good. You don’t want to turn our courts into a Minority Report situation.
A judge shouldn’t primarily be thinking about the recidivism rate when passing judgment, in most cases. They should be thinking about truth, law and justice.
If you narrowly applied this to some parole decisions, that might be reasonable, but only once you are ready to toss out all other considerations.
I also think there is much wisdom in this:
Thornmallow: This is a good proposal, but the cultural changes among American elites that would have to take place for such a policy to ever be implemented at all would bring us most of the way to a solution anyway, which is a problem I have with how economists think about policy.
Indeed. If we were capable of implementing this policy, we could choose a better policy.
Close Socrates
Benjamin Hoffman writes that Socrates is Mortal, explaining his view of Socrates acting in particular situations for particular grounded reasons, a figure he views as one of the few remaining alive people in an Athens in crisis, where people still felt obligated to engage with him. He views Agnes Callard’s vision as wanting a Socrates that is timeless, and thus lacking the aliveness, and thus bad; and he thinks I did not need to spend so many words responding to the fake decontextualized Socrates and winning a lame debate, instead of engaging with the real one.
Hoffman pulls out stories and details about Socrates I hadn’t heard before, and frames his actions there in different lights than either what I’ve read in the books or what Agnes presented, even taken in their original contexts. I felt I did look at the original contexts that she chose to offer, and did not find the good thing.
The Socrates that proposes borrowing money to put refugees to work with wool or funding countersuits sounds great indeed. He’s coming up with solutions to practical problems by thinking in concrete terms. But he also sounds completely different from the one I’ve read about.
Similarly, Hoffman offers an alternative reading of the questioning of Euthyphro, where Socrates is motivated by a practical need: knowing how to defend himself in court against charges of impiety. One could imagine it so, even if his approach to solving this was higher level than the situation warranted, and would have been better served by raising his own particular example and focusing on how others would view it, rather than trying to solve for an abstraction.
The practical Socrates would know not to try and do too much or solve too general a problem here given limited time, and pivot to his own. His failure to do so makes me think he was following a far more general procedure, rather than trying to solve a particular problem.
Even more than that, the practical Socrates would know that ‘impiety’ was the nominal charge, but was not why he was on trial. No one could even define piety, which was a central point of the dialogue, and this issue was wisely not his defense at trial. Formally, Socrates was actually on trial for ‘impiety’ and ‘corrupting the youth of Athens’, but everyone agrees that was Obvious Nonsense, an excuse to get rid of him. To the (likely small) extent it was ‘impiety’, it was that Socrates claimed divine guidance, and you don’t need a full definition of piety to know that doing this is going to raise piety issues. So Socrates saying ‘I need this information for my own defense’ is also Obvious Nonsense.
Realistically, he was on trial because of some combination of:
He kept humiliating people and being an intellectual subversive.
People (quite reasonably, given the events described in Open Socrates) blamed him for causing the crimes of Alcibiades, who helped ruin Athens and then defected to Sparta, and they associated Socrates with aristocratic rule.
These are entirely consistent with Hoffman’s view that Socrates was trying to wake people up and cut through a bunch of bullshit that was dominating public life, but he’s not going to be helped by definitions.
It’s also possible that ‘corrupted the youth of Athens’ was partly literal, as per Plato’s own descriptions in The Symposium. He was not not running a child sex cult, including in ways that might have led to Alcibiades. We presume everyone was basically cool with this, but maybe they weren’t. Combine that with the rhetorical brainwashing skills I go over, and yeah, the charge seems rather accurate to me.
At heart, of course, the Athenian justice system at the time was majority votes, not due process of law, so ‘everybody hates you, sir’ was a valid charge.
I also hold to my read on Meno, contra Hoffman, that Socrates was not actually asking the slave to solve the math problem in a meaningful sense.
So essentially, I am all for the importance of the aliveness characteristic that Hoffman is pointing at here. I simply don’t see that character coming through in my (admittedly limited) experience of the source material.
I also notice that this reaction to my write-up of the review of Open Socrates is helpful, and I am glad to get it, but it makes me sad, as it indicates that in this way I missed the mark – I was trying to use winning that ‘lame argument’ as a jumping off point or baseline to do things that were less lame, but that seems not to have come through to him, and yes of course it is too long. So that was disappointing.
So far I have not seen a response from Agnes herself, which is also disappointing.
I’m glad I ran the experiment of writing my review of Open Socrates (part 1, part 2), even though it was a ton of work and didn’t get the reception I hoped for. I know at least some people got a kick out of it, but that definitely isn’t good enough here.
Still, I had a lot of fun and interesting thoughts in the act of writing it, and I learned a lot, including by watching it not work. If none of your ambitious posting falls flat, you don’t do ambitious enough posting.
I suspect there’s a lot of good material in there that would do better if I cut the parts that didn’t land and rewrote it in a sequence form that no longer referenced the original. I may try that project out, perhaps as a LessWrong-only series.
Also, there’s this:
Ben Landau-Taylor: I once saw a production of Gorgias where everyone read “Very true, Socrates” “To be sure, Socrates” “Indubitably so, Socrates” in the most exasperated I’m-sick-of-your-shit tone of voice imaginable and it permanently changed how I view all of those dialogues.
While I Cannot Condone This
Birth Tourism, as in ensuring that your child gets birthright citizenship, has highly positive selection for future citizens that will have high value, but that depends on the Levels of Friction and costs involved in doing it. Thus, you want to make this doable, but you do not want to make it too easy.
You are not being watched by alien drones. But it is a fun exercise to ask, as Tyler Cowen does, how you should change your life decisions if you are being watched by alien drones.
Tyler Cowen suggests, uncharacteristically, that you would want to slightly lower your level of ambition, in case the aliens kill us, or cap human achievement, and because this should update you towards humans having lower marginal product.
Jehan Azad responds that instead this shows incredible physics is possible, so who knows what is possible, and AI is less likely to kill us and we are in a competitive environment, so it’s time to build and get more ambitious.
I award this debate to Jehan. You would want to get more ambitious. I think ‘physics does not bind the way you think it does, and the aliens exist but are not grabby and presumably did not all die to their AIs’ should be a whitepill to get to work.
The famous Netflix culture slide deck. There is room for a small number of companies to work this way. I’d have loved to give this a shot when I was younger.
You do not need to be fully heroicsmaxxing (as in, going above and beyond to solve problems), and there are times when heroics do harm by masking the problem, but you absolutely need to be rewarding or expecting heroics if you’re doing something that matters. The right amount of heroics is highly nonzero.
My favorite anecdote about this will always be when I was sat down during a corporate annual meeting, along with many similar others, and we were told ‘the age of heroes is over.’ That was the exact second I mentally checked out of most (but not all) aspects of that job and I thought about how to be a hero somewhere else.
Violence Is Never The Answer
If you needed a demonstration of the fact that the endorsement of violence is mostly completely unrelated to AI, the recent interviews and discussions in The New York Times and involving Piker should make this extra clear.
There is an entire utterly distinct faction that thinks that it is some combination of understandable, sympathetic and praiseworthy to steal and to shoplift, and also to commit ‘social murder,’ and is being ‘platformed’ to discuss this, as they would say, in The New York Times. And then those people will say ‘90% of people on the streets of our cities would agree’ and find such views unremarkable.
Cool facts about The Pitt, from an interview with Noah Wyle. I didn’t understand the impact of the camera angles in particular until Noah pointed it out here, but yeah, it works so well. I had definitely noticed the impact of going in real time, and that it was competence porn, and I am usually in for some good competence porn.
If I had to make one change to The Pitt, it would be creating an interactive or enhanced version of some kind that allowed you to use the show to learn about medicine for real – I feel like I’m just on the edge of actually learning stuff worth learning, and would love to do so if it could be done without pausing to chat with Claude. I could totally do that in theory, but in practice I won’t.
Speaking of competence porn, Andy Weir, author of The Martian and Project Hail Mary, pitched a Star Trek show, but was shot down by Paramount, who is not interested in producing actual Star Trek. Weir is correctly responding by pointing out that Star Trek is no longer Star Trek, while taking the generous position of being willing to allow Enterprise and Strange New Worlds and not even be mad at Lower Decks. But yeah, at minimum the rest can totally go. My choice of cutoff for canon would be to count Enterprise and stop there.
James Hibberd: “Yeah, I saw a … I forgot who it was — I wish I could remember who it was who said it, some analyst — he said something like: ‘All modern science fiction TV shows and movies have been heavily influenced by the original Star Trek — except for the current batch of Star Trek shows,’” Weir said.
Marsden replied, “Yes!” and they both laughed.
Remember this Star Trek competence porn, where Data explains how the professional chain of command works? Yeah, we need more such competence porn that lays out how to Do Competence and set high expectations. Star Trek also has a bunch of other things it wants to do, but it also needs to do this.
Matt Yglesias is of course correct that movies are too long (and rotate too fast). Most of the time if your movie is over two hours that is you making a mistake. Distribution issues used to help us more with this, and then it stopped helping.
Are there exceptions? Of course, when it is justified by all means go long. Kill Bill: The Whole Bloody Affair is probably my favorite movie and it’s four hours long, but you have to earn it. Whereas if you get done in 90 minutes, and give us a short ass movie, we can forgive quite a lot.
Someone in New York City really should open a full ‘30 years ago’ theater, which just plays things from 30 years ago and advances one week at a time; movies run a minimum of one week and then rotate out when they stop selling, and maybe you round out the early-year lull with broader greatest hits.
It is still an amazing deal that there are things you want to watch, and yes, you should pay to avoid the ads. If you’re cheap, don’t suffer through ads; instead rotate between services and only have one active at a time.
I’m actively disappointed by this trailer for the new Street Fighter movie in several ways, including that this wasn’t AI even though it totally could have been, and also no one cares about Ken what are we even doing, and in general it’s going to be terrible on every level, although I’m going to watch it anyway cause why not. I’d be shocked if the old version isn’t ten times better.
How are prediction markets changing the music industry? Rolling Stone’s answer is essentially that they aren’t. Everything is as before, except there are some people betting on outcomes at Polymarket and Kalshi. Okay, then, sure.
1. Lands (MTG)
2. Gain 1 resource per turn (HS)
3. Resource deck to draw from (RFT)
4. Playing cards face-down (SWU)
5. No resources – rules constraints (YGO)
Sam Black: I’d swap 4 and 5, and think the ceiling on 5 is very high (it’s definitely this system with biggest range from horrible to good). I think Epic is my favorite that I’d put in the space.
My opinion is that lands are the best system if and only if you can get away with it.
As in, Magic: The Gathering is a vastly better game, and remains the gold standard, because it has successfully grandfathered in the right to use lands. Players accept it, along with the cost that there will be many non-games with mana flood and screw, in order to get all the good things that go along with it.
Alas, when making a new game now, you basically don’t get to do that. Players won’t put up with it in a tabula rasa. So you’re basically forced into some form of one resource per turn, or alternatives that are worse.
Emergents, the game I created, used one per turn, with special rules for building faction resources based on what card you played or tried to play via wildcard, plus having temporary access based on the top of your deck. I think it did a great job of creating interesting decisions where you had to plan your game out and providing variance. It wasn’t as good as lands, but you don’t simply get to have lands.
I think at this point I’m with Sam Black that I’d swap 4 and 5, even knowing that no resources can go rather horribly wrong when you get it wrong. As fun and interesting as it is in theory (and I’m the one who invented the ‘play the cards face down and then cast the spells by turning them face up’ system in Versus), forcing the face-down decisions, which often want to use the most expensive cards, is hell on a lot of players and often takes forever. In the end it simply isn’t practical.
Hire gamers who are good at games; they are good at life, as in this study finding that good players of Civilization V do well in business school student experiments. Guild leaders are great picks, as are high performers in MOBAs, and my favorite pick is players who excel at Magic: The Gathering.
One exception found here is that you don’t gain anything from FPS winners. It’s not a strike against you, but it’s not an argument in your favor either.
I’ve Got The Magic In Me
Tranquil Domain put out quite the video about my old days on the Pro Tour, including clips from a new interview I did. I had a lot of fun watching it back, even if it gets a bit excited at times.
One correction or clarification that I was explicit about in the interview, and that I simply cannot get to stick, is that the designer of the main deck of The Solution was my British teammate, John Ormerod, including the choice of Crimson Acolyte. That was a brilliant move that I doubt anyone else would have found, and his proposal for the main deck did not change throughout testing.
I did build the sideboard, including searching every card and finding Pure Reflection.
I love talking about cEDH (competitive Commander), but wow, the timing situation is rather out of control. Sam Black thinks the problem is that the format is being warped by Rhystic Study, which badly needs to be banned.
Sam Black: Designed a deck to maximize my chances of winning in untimed rounds in top 16. Unfortunately, top 16 isn’t untimed—it’s 3 hours, then the highest remaining seat wins. I reached an unassailable position, but ran out of time. I was too focused on my game rather than the tournament-
I made the deal with player 2 that I wouldn’t win before my next turn. I should have offered to renegotiate our deal such that the outcome of a Gamble would adjust our deal in a way that would allow one or the other of us to win instead of seat 1.
I am aware that this sounds very toxic. It is. Half of the top 16 games went to time. Rhystic Study has incentivized building decks that generate game states that can’t naturally resolve in 3 hours. A lot of kingmaking-style games are unavoidable, but—
The specific situation where games regularly can’t end is not a necessary function of the format, it is because Rhystic is legal. Again, this is not to say that the problems with cEDH don’t exist, I get it, I don’t need to hear about it in the replies, but Rhystic is a PROBLEM.
I promise the games it leads to can take this long if actions are happening as fast as they do at the PT.
What is going on? Oh, it’s amazing. In theory. Not so much in practice.
There’s an increasingly popular strategy that revolves around using mulligans to tutor for Rhystic or similar advantage engine, then copying it. When playing this strategy, you want others to also have them so you feed each other and bury the remaining players…
This leads to an “end game” state that happens around turn 4 where everyone basically draws half to all of their deck, and the game becomes about navigating turns where the storm count goes above 50–
The innovation was to include Kozilek to loop my deck, but that’s not enough by itself, because every meaningful play just gets buried forever on the stack, so I also played Artist’s Talent and Sudden Substitution so that I could effectively give the shuffle split second
The idea is that after every other player uses their 20 0-1 mana counterspells, I get mine back to eventually force something through. This takes an insanely long time, but if you don’t have a Rhystic while other players have multiple Rhystics and never pay taxes…
Sam Black: cEDH needs a Rhystic ban more than any format has needed a ban since Urza’s block.
cEDH is popular because commander is popular, and so are winning, playing competitively, and doing powerful things in Magic. cEDH is popular because competitive players who play EDH often slowly drift toward that direction until they decide to go all the way. Having a different banned list than commander would crush the format’s growth and popularity—
It would stop being the logical extension of a thing everyone plays and become an entirely separate format, so fewer people would move to it from EDH. Therefore, I very strongly don’t want it to have its own banned list because I like that it’s popular enough to have big events-
Despite that, I think the Rhystic situation is bad enough that if it were up to me, I’d consider banning it in cEDH even if it’s not banned in commander. I can’t overstate what that says about how big the problem is. I *really* wouldn’t want to do that.
Also, yes, it’s crazy that a single card in a 100 card deck in a 4 player game can be that much of a problem
Sam Black: The issue is that Rhystic earns top marks in every metric—power level, ubiquity, play pattern, and tournament logistics. It sucks when a card is too strong and it makes games end very early, but it’s less of a disaster than a card’s legality adding 5+ hours to a tournament
As Sam also adds, the tournament he is reporting from was well-run and was fun and a unique experience. But that doesn’t mean you would want to do it again.
The core problem is that if enough people are playing Rhystic Study this way, either you follow suit and play mostly for the (multi-directional) mirror, or you lose, period.
The correct answer is that Rhystic Study is a rather silly card in a four player game even when it is not competitive, and regular Commander should respect the situation and ban the card even if it wouldn’t otherwise rise to that level. cEDH is willing to roll with the Commander banned list even in places where it is silly in a competitive context, in both directions, so you should be willing to let them have this one.
If regular Commander does not do that, then yes, I think banning the one card in cEDH is totally fine. If you’re throwing that card into a normal deck it’s no skin off your back to swap something else in. If you’re building around it, well, stop it.
I don’t think ‘fix the tournament design’ is a reasonable answer here. You shouldn’t have a strategy that dominates except that it makes tournaments take 5 more hours, even if that were to make it fair again. There are infinite cards. You can go without one of them.
Waymo reveals it has 70 remote workers managing over 3,000 automated vehicles on over 400,000 rides per week, or ~2% of total ride time. That’s still a relevant ‘last mile’ but is down to a small portion of total costs. If I were Waymo I would move those 70 jobs to America. You’d be paying a premium for mostly the same work, but the PR benefits would be orders of magnitude larger than the cost.
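As a sanity check on the ~2% figure (my arithmetic, resting on one simple assumption): if each of the 70 remote workers attends at most one vehicle at a time, then at any moment at most 70 of the 3,000 vehicles are being managed, i.e. 70 / 3,000 ≈ 2.3% of total vehicle time, right in line with the reported number.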
Washington DC is stalling against Waymo by claiming to be waiting for a pointless report due in 2022 that would take less than a month for one person to write.
The public comments are out for blood in Portland, full of claims of safety and worry about Uber jobs. Public comment meetings made sense back before we had telecommunications, now by all reports they are hellholes.
Bloomberg’s David Zipper argued “We Still Don’t Know if Robotaxis Are Safer Than Human Drivers.” That’s completely bonkers, so Kelsey Piper got out a big hammer to explain that yes, Waymos are staggeringly safer than human drivers, as in they have an 80%-90% lower risk of serious crashes even when driving among human drivers.
Kelsey Piper: In many places, Zipper’s piece relied entirely on equivocation between “robotaxis” — that is, any self-driving car — and Waymos. Obviously, not all autonomous vehicle startups are doing a good job. Most of them have nowhere near the mileage on the road to say confidently how well they work.
But fortunately, no city official has to decide whether to allow “robotaxis” in full generality. Instead, the decision cities actually have to make is whether to allow or disallow Waymo, in particular.
Zipper says we don’t have the ‘raw data’ on Waymos, but we very much do, and Kelsey goes on to identify various statistical errors or sleights of hand, depending on your perspective. This is all very settled.
And yet we have various politicians, here DC mayoral candidate Janeese Lewis George, opposing Waymo due to supposed safety concerns. Within reason I think mayoral races where this is in doubt justify one-issue voting.
Tesla did the thing:
Mike P: I don’t mean to say this in a way that discredits what they’ve done, but ngl, this stuff isn’t even surprising to me anymore like ya, makes total sense. I went from Philly to Raleigh NC to Tennessee and back to Philly and the only thing I had to do was re-park the car at 2 charging stops when the car parked in the wrong place.
There’s still a difference between full self-driving (FSD) that can take you across the country, and the point when you can sleep while it drives.
A Waymo moving at 17mph hits the brakes instantly upon seeing a child step in front of it from a blind spot, hits the child at 6mph, and dials 911. If a human had been driving, the child would likely have been struck at 14mph and died.
What did some headlines call this, of course?
TechCrunch: Waymo robotaxi hits a child near an elementary school in Santa Monica
Samuel Hammond: A more accurate headline would be “Waymo saves child’s life thanks to superhuman reaction time”
This was another good time to notice that almost all the AI Safety people are strongly in favor of Waymo and self-driving cars.
Rob Miles: Seems worthwhile for people to hear AI Safety people saying: No, self driving cars are not the problem, they have the potential to be much safer than human drivers, and in this instance it seems like a human driver would have done a much worse job than the robot
Senator Ed Markey is going after Waymo on the basis that they employ people in The Philippines ‘8000 miles away’ to ‘guide’ the cars, as if that is dangerous. Such opposition is what is dangerous, but also this seems like an unforced error by Waymo. The money you save is not worth the talking point, and Waymo should bring those jobs back to America.
Humans don’t typically want to drive the Ubers for long, or as a full time job. Weekend morning shifts pay decently, here about $100 in Texas, and weekday evenings $35, but pay at other times is worse. And reports I’ve seen are that turnover is extremely high.
This is not how our government works, but Trump would rather just declare things, so he’s trying to threaten NIH or other funding to force the universities to do what he wants, even when what he wants has been ruled illegal by courts and doesn’t actually have a working legal definition or plan to deal with the existing court rulings. He just thinks he can say ‘implement these things or else I will cut your funding, even though the courts probably think that is illegal, I don’t care,’ and sit back.
Kyle Saunders: And here’s the thing Heitner caught that deserves more attention than it’s getting. Section 4(b) of the order conditions the NCAA’s rulemaking mandate on actions taken “to the extent permitted by law and applicable court orders.”
The order contains its own limiting principle. It knows it can’t override the courts. It says so, in its own text, and then directs the NCAA to do things that courts have already ruled are antitrust violations.
The good news is that there seems to be momentum behind passing something, and everyone smiled about the order. The bad news is that all of that is meaningless.
How did we end up with a legal system where there is no punishment for repeatedly issuing orders that you yourself know are illegal, other than ending enforcement of those illegal orders after someone sues, thus allowing this to be used as leverage?
Shrug.
What should the NCAA committee have done with Miami OH, which had a perfect regular season without facing hard opponents while being, according to gamblers and those with eyes, not all that good at playing basketball?
I agree with Seth Burn, my good friend and inventor of WAB, that this illustrates that you are better off if you pick a metric in advance and then stick by it. If you do not like WAB, you need to offer something better. And you definitely shouldn’t care about who is injured or is coming back from injury. Your record is your record.
The counterargument is that any fixed system can be gamed. You want to care about margin of victory, but be able to penalize teams that run up the score, or do various ugly things. You want room to do things that are cool and fun for fans on the margin, or very good for business, even if they’re not technically correct.
In particular, you want to be able to care about who is actually good and what the gamblers think, but for various reasons you can’t actually bake that into explicit criteria. So I am sympathetic there. It would actually be cool to have the final 2-4 bubble slots in the play-in games be ‘the gamblers picks’ now that gambling is legal, and do it via conditional game odds against whoever their opponent would be. So much fun. But I presume we can’t have nice things.
The bottom line is simple. Does anyone in MLB or NFL or NBA or NHL think it’s unfair when the ‘worse’ team gets in because they won more games? No, of course not, they know the rule. That’s sports. So have a rule, and then follow that rule.
In any case, so long as you avoid egregious errors, it’s a small mistake to let a team in ‘on the bubble’ over another bubble team, because they’re not that likely to win.
NJ Transit is going to jack up ticket prices to $100 during the FIFA World Cup. Not prices for the matches. Prices for the trains returning to New York City, up from the usual $12.90. If one must do demand destruction on public transit, best to do it by price, but one would hope there are ways to avoid falling short on supply.
Michael Mulvihill: Total viewing of the first week of the MLB season on national partners was up +54% over last season.
Includes FOX/NBC/NFLX/TBS/FS1 this year, FOX/ESPN/FS1 last year.
Umpire shaming is fun for the whole family and I agree with Derek Thompson that more leagues should do it. I especially love the ‘tap the helmet and start walking to first base’ move.
Unbiased Mets Fan: If you’re a fan of this, there’s genuinely no argument against just fully automating the strike zone.
I actually think this could be a great compromise. In most situations, you still get the human umpire and their ability to make different games unique, adjust the strike zone to the count, slightly favor the home team and create a level of randomness and uncertainty, and reward pitch framing, all of which is part of what makes baseball interesting. But, when the error is sufficiently egregious or high enough stakes, we can get it corrected, and being able to know the strike zone and when to challenge becomes another skill and discussion to keep things interesting.
Ultimately, yes, robot umps now, but I am conceptually enjoying this middle zone. Alas, I’m not actually enjoying it, in the sense that I am not watching any baseball, because I don’t have that kind of time.
How should we fix the NBA draft and associated tanking? Nate Silver has a self-described radical (April Fools Day) proposal. Obvious first revision is to break ties for max bid players via whoever has least recently won a max bid player, rather than having teams draw lots, obvious second revision is to remove the hoarding tax and also remove the max bid, third is to revise the penalty for missing playoffs repeatedly to be a cap on total gains.
As usual, the actual league is considering something simpler and easier that they can feel they came up with, in this case getting rid of rewards for records entirely and just starting everyone with 100 draft credits each year. That’s obviously second best in various ways but actually might be flat out better than Nate’s proposal. Why reward losing and punish winning at all? If you do a better job you get to win more.
There is also a compromise between these two proposals. Teams still want to make the playoffs if they can do so, so all you have to do is give all teams that don’t make it to a play-in game 100 points, and then assign teams beyond that fewer points. That still gives you some amount of self-balancing, and if you didn’t drop that number off too much too fast then presumably no one would tank.
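As a minimal sketch of how that compromise could look: the 100-point base for teams that miss the play-in is from the proposal above, while the small 5-point steps are numbers I made up to illustrate a gentle drop-off.

```python
# Toy sketch of the compromise draft-credit allocation described above.
# The 100-point base comes from the text; the 5-point steps are invented,
# chosen so losing on purpose gains you almost nothing.

def draft_credits(finish: str) -> int:
    credits = {
        "missed_play_in": 100,  # didn't even reach a play-in game
        "play_in": 95,          # reached the play-in, lost
        "playoffs": 90,         # made the playoffs proper
    }
    return credits[finish]

# A bubble team that tanks out of the play-in gains only 5 credits,
# far less than the value of actually making the playoffs.
for finish in ("missed_play_in", "play_in", "playoffs"):
    print(finish, draft_credits(finish))
```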
How sports team math works:
Joe Pompliano: Wrexham has filed its 2025 financial report:
• $44 million in revenue
• 58% of income comes from North America
• $20 million in losses last year
Reynolds & McElhenney paid $2M for the club in 2021 but recently sold shares at a $475M valuation.
One can model this as ‘people want to be associated with successful and popular futbol franchises, sufficiently so that they can afford to always lose money.’
Zac Hill and Matthew Yglesias point out the problem goes beyond tanking, even if you love basketball.
Zac Hill: The NBA is confronting three existential problems at once:
Much of the regular season doesn’t matter and is a bad product.
Playoffs are a good product, but playoff performance is a function of injury.
Minimizing injury has an inverse relationship with product quality.
Conor Sen: On some level I think the NBA’s problems stem from the league getting overpaid for media rights because streamers were desperate for content and agreed to uneconomic deals, which made the owners rich and reduced the financial pressure on them to improve the product.
Major League Baseball, on the other hand, has had relatively more stagnant franchise valuations, was losing younger viewers, and had looming media rights deal concerns, so it felt more pressure to actually improve the game, which it successfully has done.
Derek Thompson is even more brutal. If you wanted to make a meaningless regular season, could you do better than the NBA?
Derek Thompson: I think one way to get at the NBA’s problem is to start w the question: What would it look like for a professional sport’s regular season to be the equivalent of a pre-season exhibition period—that is, something that genuinely, truly does not matter at all?
1. For starters, seeding wouldn’t matter … bc home court advantage would barely exist, in which case the best teams could win the championship as an 8th seed just as easily as they could win as the 1st seed.
2. The playoff series would be long enough that (a) the best teams had ample opportunity to prove their superiority [unlike in March Madness, or the NFL playoffs, where 20 bad minutes can end the best team’s season] and (b) you’re giving casual fans a LOT of basketball to watch so they don’t feel bad about skipping most of the regular season.
3. Also, you’d let the vast majority of the teams make the playoffs — maybe by adding a “play-in” that extends potential playoff qualification to, like, 2/3rds of the league.
4. You’d have several teams that recognize (and practically celebrate!) the futility of the regular season by spending much of this period *actively and flagrantly trying to lose* bc the draft is so much more valuable than the outcome of any particular week, or month, of regular-season competition. In fact, you’d have fans actively rooting for about 1/3rd of the league to throw away most of the regular season bc they only really care about getting a high draft pick.
5. Finally, you’d have a sport where it was basically impossible to win a championship without a top 10 (or, really, top 5!?) player, in which case many franchises are rationally fixated on throwing away regular seasons to maximize their chance to draft or trade for a top 10 guy.
… okay, I think you get my point :)
I love listening to basketball podcasts in the autumn and winter, and I love watching playoff basketball in the spring. But I think there are very deep structural reasons why the NBA regular season, for many casual fans, feels like a prolonged preview of an actual sport that begins in April.
My point about the value of seeding is not that “this year’s 8th seed could win the championship.” I’m saying “If the Thunder, Spurs, and Nuggets entered the playoffs as the 8th, 7th, and 6th seeds, everybody would still pick one of them to make the conference finals”
Everybody who wants to watch Wizards-Bucks on the second Tuesday in November should do that! Nobody is trying to get them to stop doing that!
I’m interested in getting to the bottom of why the NBA might have a unique problem with getting *ITS OWN FRANCHISES* to care about the regular season.
The NBA has much work to do, and not only in the draft and the three point line.
The playoffs are good times, but at the expense of the regular season. You need to raise the stakes of the regular season, by increasing the value of seeding, at minimum.
Watcher.Guru: JUST IN: $760,000,000 worth of oil shorts were reportedly placed 20 minutes before President Trump announced the Strait of Hormuz was open.
Scott Alexander is correct that Orban Was Bad, Even Though We Don’t Have A Perfect Word For His Badness. There is a continuum between free and fair democracy and full dictatorship. He was not maximally a ‘strongman,’ ‘autocrat’ or ‘dictator,’ but neither was he entirely not those things. Him losing an election and giving up power does not retroactively make him not at all those things, and does not mean there was no serious threat to democracy there. It certainly does not mean there is no serious threat to it elsewhere and I cannot read such claims as unmotivated.
TSA lines got completely out of control for a while. Here is a video of one in Houston, where the wait was several hours at least. The problem is that there are the same number of flights and less throughput, which means the line keeps getting longer until you get substantial demand destruction. That doesn’t happen until the wait is hours long, or you otherwise make the experience sufficiently worse to drive people away, such as putting ICE agents around the airport.
The FCC made all routers covered devices, so if you want to use a foreign one you now need to go through a ‘conditional approval’ process to show a given one does not pose unacceptable risk. Luckily this does not retroactively ban existing routers or approved models.
This one isn’t even on Mamdani, it predates him but presumably is about to get worse: New York City spending per homeless person rose in six years from $28k, already quite a lot, to $81k, which is patently absurd. That is vastly more than would be required to nicely house and pay all the living expenses of such folks, and it ignores secondary costs.
I try not to opine on such matters, but there are obvious implications here.
Elizabeth Barcohana: SCOTUS poised to prohibit California & 13 other states from counting ballots received after election day. This would be MAJOR for election integrity in California, where currently mail ballots received up to 7 days after Election Day can be counted.
This is important because the law requires that the ballot be postmarked by Election Day, even if it was received afterward.
However, according to @CASOSVote CA Secretary of State, mail ballots from Sacramento, Richmond, Santa Clarita, Los Angeles, and San Diego may not have been correctly postmarked in the Prop 50 election and similar irregularities regularly happen in other elections.
The obvious problem is, who controls the US Postal Service? What happens if they are directed to not deliver the ballots until after election day, or otherwise prevented from doing so? Alas, this no longer seems like such an outlandish question to be asking. And there is no obvious limit to how long they could stall this process out for.
John Lettieri: There’s a standard exception for agreements made by business owners in connection to the sale of their businesses or disposition of ownership interest.
One reason it is so easy is that this is already a money pit that has gone absurdly badly, even at the planning stage, my lord:
Michelle Tandler: How on earth can it cost $30 million and three years for the NYC government to build a grocery store in Harlem? There are *dozens* of grocery stores for sale, all over New York City, for less than $5M. Many are less than $1M. This seems like a scam. (or vanity project)
A quick Claude analysis says that $30 million is the upper range of costs to open a full 15k+ square foot supermarket in a central location. This proposal is for 9,000 square feet in Harlem, on free land, which Claude estimates ‘should’ cost $6 million at most.
He’s saying $70 million for five stores and spending $30 million of it on the first one even if it all goes as planned.
Also, there are already five grocery stores within a two block radius (!) of the new store, so what the hell is even the point if this isn’t graft?
Claude: So there you go. Within roughly a 5-minute walk of the proposed La Marqueta site at 116th & Park, you’ve got:
City Fresh Market — 24-hour full supermarket, literally one block east on 116th
Shop Fair of 116th — full supermarket, two blocks east
Fine Fare on Madison — full supermarket, one block west
Foodtown of East Harlem — full supermarket, 4.5 stars, three blocks NE on 3rd Ave
Cherry Valley Marketplace — full supermarket, three blocks SE
Fine Fare on 1st Ave — full supermarket, five blocks east
Shop Fair of 110th — full supermarket, six blocks south
Midtown Meat Market — butcher, same block
And that’s not counting bodegas, the La Marqueta vendors themselves, or the Food Bazaar on 125th. This is one of the most grocery-dense areas in all of East Harlem.
For those who don’t realize, nuclear brinksmanship is incredibly dangerous, and the fact that it ‘usually works’ does not change that, with at most notably rare exceptions, you shouldn’t do it. ‘The world did not end when I did this thing with a small chance of ending the world’ does not mean you weren’t acting crazy.
Levels of Friction
Yes, when you put it that way it sounds dumb.
Riley Brown: No sorry you can’t invest in OpenAI, you’re not an accredited investor. But you can bet all your money on what the OpenAI valuation will be on Polymarket. It’s safer this way for you.
The two regimes are inconsistent in theory, but in practice this might be reasonable.
If you allow direct investment in OpenAI and other similar companies, this will seem like a normal investment thing one might do, and will be pitched as such. Investors will put millions of dollars into this.
Whereas if you bet on the OpenAI valuation at Polymarket, you know you are doing something inadvisable, risky and weird, and the liquidity is not so great. It is a bit of fun, a gamble. Yes, you can blow yourself up here if you want to, but you gotta want to.
Jones Act Watch
The Jones Act got selective waivers due to the Iran war.
Unfortunately the combination of ‘temporary and could be reversed any time’ with ‘only for narrow purposes’ means you can’t invest in logistical changes and thus the wins will be relatively minimal, which is why Balsa Research isn’t harping on it. But yes, every little bit helps, and there were good anecdotes even before we looked to expand this to up to 90 days.
Colin Grabow: The Gulf Coast supplying fuel to Alaska simply does not happen absent the Jones Act waiver. Nice to see new domestic supply chains materialize.
Last year, Anchorage received 38 tanker arrivals. Only two were JA-compliant — the barge QAMUN — with the remainder accounted for by South Korea (34) and China (1). The high cost of JA transport helps explain the reliance on imports rather than WA state refineries.
Some people were very not happy about that. They were ‘deeply concerned’ that the waiver would be ‘abused’ by foreign shipping companies to take items from one port and then, at lower cost, ship them to another port. The horrors.
They promise to be ‘watching closely.’ Please, all of us, watch closely.
Jacqui Heinrich: The American Maritime Partnership is swiping at the White House over the 60-day Jones Act waiver, saying the maximum impact to gas prices is less than 1 cent per gallon:
Peter Meijer: It’s true that a 60-day Jones Act waiver will only lower prices by $.01/gallon.
Which is why @POTUS should go all the way and support a full Jones Act repeal to achieve the full benefits of up to $.10/gallon!
This would be one of many benefits.
Colin Grabow: Waiver less than 24 hours old and foreign ships are already being chartered. Each of these voyages represents a cost savings. If cheaper Jones Act-compliant alternatives were available, they would have been used. Shows the existence of demand the JA fleet couldn’t meet.
That was before we even had an official announcement that waivers were available, and it included NYC-Hawaii.
Gary Winslett: When the Trump admin pays ZERO political price for waiving the Jones Act, that will be very helpful in showing both parties it’s been a paper tiger this whole time.
Not only is there no substantive reason to have the Jones Act, there’s no *political* reason to have it either.
Indeed. There is a small handful of those capturing rents, who have convinced everyone else they can wield power through union solidarity. But the number of actual beneficiaries is tiny, and the number of people hurt is large, including vastly greater numbers of unions and union members. You can just repeal things.
And you know what? Trump is smart enough to notice this one, too, and is considering extending the exception indefinitely. That’s a great idea, and I have an idea that would be even better, which is to expand the exception indefinitely, too.
Scott Lincicome: Reminder: use of non-JA ships (tankers, etc) during waiver periods is clear evidence that the law blocks US commerce – trade among Americans! – that would otherwise occur, restricting supply & raising prices.
AND the gains would be even bigger if the law were permanently nixed.
Scott Lincicome: “Since [the Jones Act suspension], 40 tankers have been able to deliver oil between US ports from California to Texas to Florida and Alaska, increasing the de facto fleet by 70% and helping to reduce costs as a result, according to data provided to Axios by the White House.”
That’s what you get with no notice, and no ability to make long term plans. It would be a much bigger effect, across all eligible shipping, if the waivers were permanent.
We are now looking at an up to 90 day extension. At some point, one wonders if it starts to have legs.
The American shipbuilders are mad, and urgently call for ‘preservation of the US shipbuilding supplier ecosystem,’ to protect our sovereign manufacturing capabilities (which largely no longer exist), because you see the United States ‘relies heavily on free-market forces’ via banning everyone else from its markets, whereas other nations ‘protect sovereign manufacturing capabilities as a matter of strategy.’
Why do LNG ships cost four times as much to build in America as they do in Asia, if you are lucky enough to get the American one on time and on budget? One problem is that steel is twice as expensive, mostly due to tariffs, although that obviously can’t explain most of it.
Laptop stands are substantial upgrades for many people, in terms of working in a cafe comfortably, as is bringing a full keyboard and mouse. My recommendation continues to be a full desktop, whenever possible, with at least two large monitors.
Variously Effective Altruism
NPR gets two gifts totaling $113 million. There is a weird decision theory position where you don’t want to reward the government for no longer paying for NPR by spending your own money to pay for it instead, but actually you should anyway if you value NPR, because the people cutting it actively want NPR to die, similarly to PEPFAR and USAID.
I think this is a no good, very bad position and if you have it there is a very serious problem with your philosophy, decision theory and approach to life.
I also think this goes hand in hand with Tyler Cowen claiming that our intervention in Venezuela was good as well so long as in this particular case people in Venezuela were better off, despite how much our actions broke norms and set terrible precedents.
Indeed, I would say we are already seeing, on a vastly larger scale, the consequences of what happened in Venezuela becoming a precedent and setting expectations. It’s ugly, even if you are foolishly a Causal Decision Theorist and can only look at forward consequences. Localized Act Utilitarianism is foolish.
I can totally understand arguing that it is a mistake for wealthy people to give away half of their money. You can only give away so much money efficiently before you face decreasing marginal returns, and it requires large investments of time to do well, and also potentially makes you a target. If you feel you must give away [$X] no matter what, you lose your quality controls. The apparatus risks capture by ideologues or thieves and has poor legal structures that don’t protect you. Investing your money in for-profit enterprises that do pro-social things is arguably a far better way to deploy tens or hundreds of billions, or perhaps you should be doing out of the box things like buying and releasing major medical patents.
I have many disagreements with Elon Musk, but his plan of ‘I want to use my money to build great companies and make everyone wealthier and go to Mars’ is totally valid. My differences are more about whether projects like xAI in particular are wise ideas, or in his non-business actions.
I have not signed such a pledge, and do not plan to give half my money away. It’s fine.
However, if you make a pledge to give away money, then you should honor that pledge.
And when people break pledges, not because they need to but because they choose to, after benefiting from the halo of that pledge, you do not say ‘good’ because that is a terrible, no good thing to do. If you either do this, or you praise this, then you bring shame upon yourself and upon your house, and you weaken the very foundations of our society.
This is indeed exactly the kind of logic that leads to Sam Bankman-Fried, the argument that if you can ‘do good’ by breaking your promises then you should do so.
Zany: –WHAT IS BEING A MARINE, SOLDIER?
–SIR IT MEANS WHEN YOU ASK ME TO JUMP I ASK HOW HIGH SIR!
–GOOD. AND YOU, SOLDIER! WHAT ARE YOU?
–SIR I AM A UTILITARIAN SIR!
–AND WHAT DOES THAT MEAN?
–SIR YOU ASK ME IF IT’S WRONG TO TORTURE BABIES FOR FUN AND I ASK FOR HOW MUCH FUN SIR
There is a case for asking that question, and there is a case for not asking it.
I love that approach where it makes sense, but it has nothing to do with whether one should honor pledges.
Also, you could honor your pledge that way in a simple fashion: you do that, and then use any leftover funds in other ways that scale; in the worst case you Give People Money, or donate it to the Treasury.
If you’re screaming ‘you can do better’? Good. Do better.
What about ‘you should fund rationalist or EA things’?
Samuel Hammond: The unique thing about EA / rationalist philanthropy is that, while it has its traditional “cause areas,” it is broadly steerable by better arguments.
That is, if someone marshalled dispositive evidence that we’re headed for an AI winter or that the technical alignment problem wasn’t hard or that xyz funding strategy created more costs than benefits or that shrimp are p-zombies, EA and Rationalist funders would turn on a dime and fund something else. You can’t say that about any other big philanthropic source.
The problem is that those things take a lot of investment to understand, if you want to differentiate the good from the bad, unless you find an agent you can trust at this scale, and also the room for funding most options rapidly becomes an issue when you’re talking about 11 or more figures.
That said, if someone is looking to move 11-13 figures to do good, yes, I would be willing to help out for a remarkably small percentage of that, especially if we are willing to not let the perfect be the enemy of the good and I get to use a portion of it for some aspect of AI notkilleveryonism related causes, but I’m happy to confine that to technical research if necessary.
Elon Musk claims Starlink has done more to lift people out of poverty than any NGO so I thought for fun I’d have GPT-5.4 do some comparisons. ChatGPT doubts Starlink even breaks the top 300 corporations in terms of poverty reduction (its top picks? Foxconn, Apple, Walmart and Samsung.) Starlink really is not that special on this one, but also capitalism is what solves this, and only maybe 10-30 NGOs have done more than Starlink. That seems plausible to me.
Scott Alexander recently argued against the concept of telescopic altruism, in ways I found highly unconvincing. Ben Hoffman points out that to succeed, telescopes need good lenses, and offers us a lot of intuition pumps for what is typically wrong with those who purport to focus on helping distant others, and some of the reasons why by default it makes sense to first put your own local house in order, and some of the failure modes that happen if you don’t do that. You will probably disagree with some of his examples, as do I, but do not let this distract from the clear larger message.
Copious Free Time
If you’re single and childless and you say work means you have no time for socializing, then either you’re working close to 80 hours, you have a serious health problem or some other massive mandatory time sink weighing on you, or you’ve made a choice.
This is one way to realize that you have a serious health problem, in which case your focus should be on fixing that.
Allie: Every time I say “Make friends! Have hobbies!” I get comments like this from (usually childless) people who SWEAR it’s impossible to use any of their free time to have fun or see people
I’m sorry but this is exaggeration, an excuse, or just reallyyy bad time management
Stephanie Cardozo: I would also add to this conversation that even “healthy” people aren’t actually healthy. Chronic fatigue and low energy is a serious issue among the US population. It’s fueled by poor lifestyle and really bad medical systems.
The Lighter Side
If you ever make a mistake, join me in remembering this mistake, and then feel better.
Dan Elton: Many people were claiming this @NYTimes error was fake, and showing a version where it was correct. Actually, the error is real and the correct version floating around is fake! This is not the only time the NY Times has made egregious errors – many during pandemic for instance
Marginal Garfield is about as good as one could hope it to be. Rationalist Garfield is a tier below where one could hope it to be, which is still good enough that my kids laugh and I can use it to try and explain concepts.
Coors.: if Napoleon wasn’t real and you wrote a story about him people would call it unrealistic ratfic mary sue shit
allpsu: “Then he went to Egypt and discovered the Rosetta Stone” GTFO there’s no way
Joseph Hex: “After defeating the combined armies of Europe I wrote a code of law that would be used for the next several centuries. I also banged a different beautiful woman every night.”
Coors.: “i got rid of really annoying measures and made everyone use the ones i liked and im so smart and even though they bullied me in school they love me now but i hate them” ok buddy nice power fantasy
worldly bee: oh shit I have to give my overpowered Mary Sue a flaw… let’s make him “short” by modern standards but he’s actually average for his time period
Austen Allred: What’s lost in the debate about AI jobs is that when coding is free we will have jobs that we can’t even imagine right now. 50 years ago nobody would understand you if you told them your job was to turn the Flexport 404 error page into a game of Tetris
A few months later, we’ve got more than just claims. We’ve got numbers. And they say something unusual is happening in the market.
Coding agents are the first AI product people are paying for at volume and regularly. Because they directly speed up people’s work. It’s too early to claim businesses are replacing whole processes with agents across the board. But compute demand has started growing faster than anyone can build it out.
Here’s why this moment is different, why nobody’s ready, and what I took from it personally.
The Numbers
OpenAI and Anthropic might go for an IPO soon. That’s why they’re eagerly posting how fast their revenue is growing.
And it’s a ton of money.
Anthropic is up 3x since the start of the year. And they’re already a big company. This is impressive, because the bigger you are, the harder it is to keep growing at the same pace.
*OpenAI on the left, Anthropic on the right.*
Even during past boom moments, nobody hit numbers like these (with a caveat, see below). Zoom during the pandemic, Google at IPO, Coinbase cashing in on commissions during the crypto hype. These are companies 5-10x smaller than Anthropic, in special situations, and they still grew slower!
*The best growth years for big companies. Only ones that were already large. Revenue measured at start vs end of year.*
The caveat. First, vaccine makers during the pandemic were also up there. Second, Anthropic’s numbers are a projection for the rest of the year based on early data. And they count things a bit differently than OpenAI. None of that changes my conclusion, which is:
Cash is a solid tell for real demand for agentic systems.
Last year when a bunch of people suddenly figured out ChatGPT could generate cool images, that didn’t translate into serious money.
Meanwhile, in January alone, Claude Code commits on GitHub (in publicly accessible repos) went from 2% to 4%. If that sounds small, keep in mind it’s one month, and that’s without Codex, Copilot, or Devin. By end of year Dylan Patel forecasts Claude hitting 20%+.
*Claude commits on GitHub.*
Even if a $100 subscription only automates a small slice of the work, that’s nothing compared to a developer’s salary. For a median developer at $350-500 a day, the subscription has 10-30x ROI if it handles just the simplest, most routine 10% of their work.
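To make that arithmetic explicit, here is a back-of-the-envelope sketch. The day rates and the $100 price are the figures above; the 21 working days and the exact automated fraction are my own assumptions, and the multiple moves with all of them.

```python
# Back-of-the-envelope ROI on a coding-agent subscription.
# Inputs are the post's rough figures plus an assumed number of working days.

SUBSCRIPTION_PER_MONTH = 100   # USD
WORKING_DAYS_PER_MONTH = 21    # assumption

def roi(day_rate: float, automated_fraction: float) -> float:
    """Monthly value of the automated slice divided by subscription cost."""
    monthly_value = day_rate * WORKING_DAYS_PER_MONTH * automated_fraction
    return monthly_value / SUBSCRIPTION_PER_MONTH

for day_rate in (350, 500):
    for fraction in (0.10, 0.30):
        print(f"${day_rate}/day, {fraction:.0%} automated -> "
              f"{roi(day_rate, fraction):.0f}x")
```

Across those assumptions the multiple lands roughly between 7x and 31x, in the same ballpark as the range above.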
There’s plenty to argue with here.
Let me even lay out the weak spots in my own logic.
So their revenue is growing, fine - the labs are still unprofitable as businesses. They have every incentive to pump the hype to pull in the most risk-tolerant companies. The ones paying are early enthusiasts, not big companies. And enthusiasts come and go. Plenty of bubbles have popped exactly this way.
Agents are unstable and still randomly screw up. Who’s to blame when things go wrong? You can’t replace humans yet, because serious businesses care about reliability. And where do senior engineers come from without juniors if you stop hiring?
Agents only handle a narrow set of tasks well. Even if writing code is faster, shipping a product still gets bottlenecked by gathering requirements, architecture, review, testing, and our beloved stakeholder zoomcalls and compliance.
I decided at some point you have to commit and pick a side, even without conclusive evidence.
The finish line can be moved forever. There was a time when reasoning was completely out of reach for ML models. Same for decent image generation, or speech that didn’t sound like a robot. There was a time nobody believed machines would learn to play Go. You get the idea.
*Metaphor from Tegmark’s Life 3.0. Computers gradually learn harder and harder tasks. Over time there’s less and less they can’t do. Like water filling a map from the bottom up.*
Ilya Sutskever, back when he was still at OpenAI, often mentioned an internal meme - Feel the AGI.
He was one of the first to believe deep learning would gradually change our lives. Yes, there’s a lot we don’t know, but everything keeps moving in that direction, and that matters. Everyone gets it at their own moment. When a neural net does something you usually do yourself, manually, that’s a special feeling.
I’ve lost count of how many of those moments I’ve had in 10 years of following neural nets. So I’m not interested in the bubble-or-not debate anymore. I’m interested in watching the water level rise.
Personally, I have enough evidence that agents can now do valuable work that companies are willing to pay for.
And the thing is, demand has plenty of room to grow. Agents often don’t work out of the box. You have to adapt to them, and the fastest and most curious people do that best. Everyone else will catch up bit by bit.
And...
The Industry Isn’t Ready For This
To avoid talking about “the industry” in the abstract, let me split it into 3 layers.
People online love talking about bubbles. Turns out, all these companies are well aware bubbles happen. And to avoid going bankrupt, each one is cooking up its own workaround.
Dario Amodei says he builds the company’s plans off a pessimistic revenue scenario. Funny thing is, this year they’re already beating that by 1.5x. And only 3 months of the year have gone by. They’re beating the optimistic scenario too.
Dwarkesh asked him straight up in an interview: why? Dario genuinely believes in massive future upside from AI. He writes long essays about it, pitches a country of geniuses in a datacenter. And yet he doesn’t want to bet everything on that future.
Dario says it’s risky because of a cash flow gap in the business model.
Here’s how it works. They provide neural nets to users. They pay hardware owners for inference and make money from subscriptions and APIs. In parallel, they pour money into research on the next generation model. Which won’t start making money for another year or two.
*They regularly spend more than half of revenue on research.*
You’re not just balancing income and expenses - you’re also balancing investment in future growth. If you invest big and the growth doesn’t show up, you’re in serious trouble.
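A minimal toy model of that gap, with entirely invented numbers: research spend is sized against the revenue you expect, while the payoff arrives a year later, and only if growth actually shows up.

```python
# Toy model of the cash-flow gap: spend on research as if planned growth
# were guaranteed; the revenue from that research lands a year later.
# All numbers are invented for illustration.

def simulate(years: int, planned_growth: float, actual_growth: float,
             research_share: float = 0.6):
    revenue, cash = 1.0, 0.0
    for _ in range(years):
        research = research_share * revenue * planned_growth
        cash += revenue - research   # inference costs omitted for simplicity
        revenue *= actual_growth     # payoff of last year's research
    return cash, revenue

for actual, label in ((3.0, "growth delivers"), (1.2, "growth misses")):
    cash, revenue = simulate(3, planned_growth=3.0, actual_growth=actual)
    print(f"{label}: cumulative cash {cash:.1f}, annual revenue {revenue:.1f}")
```

In both cases you burn cash, but in the first the revenue outruns the burn and investors will happily cover it; in the second the burn exceeds annual revenue with nothing to show for it.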
Anthropic has been running in this mode for three years straight. Growing 10x every year. Dario figured 2026 would be when it ends. Because the bigger you are, the harder it gets. You are gonna slow down at some point.
What he didn’t mention in the interview is that their margins are growing slower than forecast. Costs are growing multiple times faster than they’d planned.
Dario says he wants to push the company into profitability in a few years. To do that they need to improve margins. That means slowing growth and investing conservatively, only on the most efficient things.
The logic adds up. But slowing down isn’t really working. They look ready to 10x again this year. But the resources to support that aren’t there.
Anthropic doesn’t have enough compute for this many power users.
They rent GPUs from hyperscalers. And they can’t just walk into a datacenter and ask for more. Because the datacenter owner is also exposed to bubble risk. So capacity is booked out in advance.
For Anthropic to make $30B a year, someone had to spend $80B on infrastructure. Betting it would pay off in a few years.
Amazon will spend around $200B this year, Google $180B, Meta $125B, Microsoft $105B. That’s a setup for trillions in economic value in the coming years.
And a cash flow gap risk if the value doesn’t materialize.
The industry is one long value chain. Everyone in it tries to lower their own risk by locking expectations into contracts. Which reduces the whole chain’s ability to react to surprises. Like the sudden arrival of coding agents.
So every year labs hit some new bottleneck. And constraints keep sliding further upstream, toward players further from the end user. Because their risks are higher and their contracts are even less flexible.
A New Bottleneck Every Year
In 2023 everyone was chasing GPUs. More specifically, TSMC factories didn’t have enough capacity for the final chip-to-module assembly (CoWoS). In 2024 came the HBM memory shortage for those same modules. In 2025 GPUs got better, but datacenter buildout became limited by power supply. In 2026 it turned out even when you have the power, the US grid can’t deliver it to datacenters at the volume needed.
1 - Memory
Modern models need more memory than before. I mentioned earlier that companies spend hundreds of billions a year on infrastructure. Roughly 30% of that goes to memory.
And they have to buy expensive HBM instead of cheap DDR. Because high bandwidth reduces the time the GPU sits idle waiting on memory.
*Turns out memory is the most expensive thing in a GPU.*
Memory prices are probably going to keep rising unless someone figures out how to work around it. They could easily go up another 2-3x, because SK Hynix and Samsung control 90% of the market. And memory demand is only growing.
2 - Energy
Datacenters eat power like a small city. And when such a thing suddenly shows up in some region within six months, the electricity grid just can’t handle that.
Surprisingly, Dylan Patel isn’t that worried about energy. New power plants, transformer stations, and plain old transmission towers take a long time to build. But while the grid catches up to the new load, you can power datacenters off industrial gas turbines. Literally roll up to the datacenter with a dozen trailers full of generators and you’re good (though people are starting to worry that this is far from clean energy).
There are also piston engines, solar with batteries, hydrogen reactors, marine ship engines... Basically, every trick the fuel industry has invented in its entire history. Together with more efficient grid usage, that can add up to hundreds of gigawatts.
*Right now GPUs alone consume 13GW. Add the rest of the datacenter and you can multiply by 2.*
The blocker for building datacenters and reactors fast is a shortage of skilled labor, especially electricians.
So, expensive and labor-intensive. But turns out it’s still easier than the semiconductor supply chain.
3 - Semiconductors
There are factories (mostly TSMC’s) that fabricate GPUs of a specific era (based on designs from Nvidia or Google). For example, on the 3-nanometer process.
And there just aren’t enough factories built.
This can’t be fixed quickly because these are some of the most complex industrial facilities on the planet. Building one takes 2-3 years and a pile of specialized equipment and chemistry.
The hardest piece is the lithography machines (EUV scanners). They’re needed to pattern chips onto wafers. The wafers then get paired with memory into modules, and that’s how you get a GPU.
These machines cost ~$350M each. Only one company from the Netherlands makes them - ASML. Around 50 machines a year.
*The machine.*
By a rough estimate, by 2030 there will be around 700 of them worldwide. That’s on the order of 200 gigawatts of compute. And at the end of 2025 we were using ~27 gigawatts. Note that that’s before the agent hype of early 2026.
So there’s room to grow, but the shortage will be permanent - bottlenecked by factory construction, wafers, and lithography machines.
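Re-deriving that ceiling from the post’s own rough figures (none of these are official statistics):

```python
# Rough supply ceiling implied by the numbers above.
euv_machines_2030 = 700   # projected installed base worldwide
gw_capacity_2030 = 200    # compute those machines can ultimately feed
gw_in_use_2025 = 27       # usage at end of 2025, per the text

print(f"~{gw_capacity_2030 / euv_machines_2030:.2f} GW per machine")   # ~0.29
print(f"~{gw_capacity_2030 / gw_in_use_2025:.1f}x over end-2025 use")  # ~7.4x
```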
These are the kinds of constraints you can’t just throw money at, unlike memory and datacenter energy.
You can see it clearly in Google’s behavior.
They have their own chip designs. And they still buy a quarter of their capacity from Nvidia. They’d love to make their own, they just can’t.
*The share is dropping, but it’s still a lot, considering their own chips are better!*
All chips are assembled at TSMC factories to someone else’s designs. And Google and Amazon (who also have their own designs) slept through the moment when Jensen Huang locked in contracts for 70% of 3-nm capacity. That’s great for TSMC - they’re at the end of the production chain and need stability.
Nvidia is also living the dream, selling cards at 6x production cost.
And Google even sold its own capacity to Anthropic through GCP. What a company.
So What?
So, the industry isn’t ready for the agent boom.
Because it came on too suddenly. To a market where what ultimately matters is long-term contracts on complex chip-making infrastructure.
Anthropic right now has 2.5 gigawatts of compute, and by the end of the year they need 5-6. The only way to get that much is the “Other” category. CoreWeave, Bedrock, Vertex, Foundry. Scraps from anyone whose capacity is still available, at premium prices.
And they want to become a profitable company, so they can’t afford to burn cash.
Hence the bad news.
The ones who’ll probably suffer are us.
The most obvious move is for them to just cut limits and raise prices.
The other week they moved OpenClaw onto the API. And they said so in a nice and honest way. Sorry guys. We’re tightening belts, here’s $20 as an apology for the inconvenience.
They also rolled out different tiers depending on time of day. I’ve already run into it a couple of times, when Claude just ran out of capacity during “off-peak” hours, under pressure from people optimizing for discounted tokens.
*Denied.*
I pulled two takeaways from this for myself.
1 - Don’t put all your eggs in one basket.
For example, when building a skill, make it work on any model. I’m obsessed with Claude, but OpenAI and Google are in way better shape on compute access.
So I’ve learned to swap models depending on the task. I pay the minimum subscription to every lab. And when the limit runs out, I just switch models.
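A minimal sketch of that habit, with hypothetical provider wrappers; this is a pattern, not any lab’s actual API.

```python
# Try the preferred model first; fall back down the list when limits hit.
# The provider callables are hypothetical wrappers you'd write around each
# lab's client, raising OutOfCapacity on rate-limit errors.

from typing import Callable

class OutOfCapacity(Exception):
    """Raised by a provider wrapper when its limit runs out (e.g. HTTP 429)."""

def ask(prompt: str, providers: list[Callable[[str], str]]) -> str:
    for call_model in providers:
        try:
            return call_model(prompt)
        except OutOfCapacity:
            continue  # limit hit, switch to the next model
    raise RuntimeError("every provider is out of capacity")

# Usage (names are placeholders, ordered by preference):
# answer = ask("refactor this function", [claude, gpt, gemini])
```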
I’m not using Chinese open-source. Don’t use Deepseek, for the love of god.
2 - Get anxious about not making money off AI.
Neural nets aren’t a way for me to make more money. They’re on my expense sheet, and they pay for themselves by giving me more options and more time.
But if they roll out some $1000 tier, I won’t be able to pull that off. Right now that sounds absurd. But remember the example with a real person’s salary. As long as $1000 of spend brings in $5000 of profit, you’re winning.
And whoever can’t pull that off will be stuck on the free tier watching ads =/
A commonly shared piece of wisdom in the LessWrong community is to say or do the obvious things. Normally, this is treated as an unambiguously good thing to do, for example, see Nate Soares’s “Obvious advice”.
But I think there’s another genre of “obvious things” that requires more nuance: namely, the background assumptions that are so obvious to each of us, or the terms we hear so often, that they feel mundane. I think it’s uncontroversial to say that (“obviously”) what is obvious to you is often not obvious to others. The problem is that what is obvious or not to others is itself not always obvious.
I think these obvious things often serve as an important barrier to good-faith conversation. I also think they serve to make people feel excluded from the communities I’m involved in: when speaking to people (especially younger people) who feel excluded from AI Safety, EA, or even Constellation, probably the most cited reason is that they feel either stupid for not knowing about obvious things or deliberately unwelcome because they did not agree with said obvious things. And I empathize with this; when people assume that the things I believe are so “obviously” ridiculous that no sane person would hold them, I feel a deep sense of indignation that sometimes even explodes into an angry LessWrong post.
There are topics that are known to be obvious to some but controversial in many of the Bay Area circles I frequent. Generally, because people have met others who (loudly/publicly) disagree, these facts are known to be controversial. Classics include race, gender, and electoral politics. Various topics in AI Safety also fall under this category: the various Pause AI groups, SB 1047, and the usefulness of current interpretability methods come to mind.
In a sense, these topics are “easy” to handle in conversation, because people know to check for known controversies. In contrast, there are topics that are controversial but not generally known to be: they are obvious to many people around me, yet not to many people I’ve interacted with. The one that comes up most often is a form of implicit America-first beliefs: many people assume that it is “obvious” that we should support US chip export controls (or even encourage the US labs to race ahead) in order to make sure the US “wins the AI race” against China, while in fact many people do not share this America-first belief (even people from other western countries, especially in the last 2-3 years). Another such topic is whether or not it’s good to work for AI labs under common-sense ethics or deontological views (though recently, this specific debate has become more public). And there’s also the perennial issue of different levels of jargon and technical knowledge.
A perennially relevant XKCD.
Sometimes people state nonobvious things as obvious in order to manufacture consensus: almost everyone is susceptible to social pressure, and it is only polite to agree. Sometimes people state nonobvious things as obvious in order to skip to what they consider the “important parts”: there’s a reason a classic joke in math-heavy academic fields is that if you ever get really stuck with a proof, you should say that the result is so obvious even a baby could see it. And sometimes it really is to exclude people. Often, though, it is simple efficiency: I’d go so far as to say that most conversations would be worse if you had to restate every single assumption from the start.
But most of the time, my guess is that people assume obvious things because they are obvious to them, and not for another reason. Most people I know want to have honest, good faith conversations with others, even if they disagree.
I don’t have a complete solution to this problem. If I had to come up with something, I’d say that both speakers of such obvious facts and listeners irritated by the speaker’s assumptions should try to meet each other halfway: that is, there are cheap cultural norms that both parties could follow to improve communication in the presence of differing obvious assumptions.
I think the speaker in conversations has some degree of responsibility for noticing when listeners seem confused, and allowing space for questions. Another useful tool for speakers in this context is original seeing/rationalist taboo: describing the object of the conversation without the standard jargon or shorthand. I think the listener also has responsibility for noticing their own irritation or confusion, and bringing it up in a polite and curious way.
For both parties, it’s important to remember to treat conversation partners with charity. People come from vastly different cultures that make different background assumptions. Sometimes the speaker is correct about their assumptions and sometimes they are not; engaging curiously rather than assuming malintent is more likely to get at what is true.