Published on February 21, 2026 3:34 PM GMT
Epistemic status: I've been thinking about this for a couple months and finally wrote it down. I don't think I'm saying anything new, but I think it's worth repeating loudly. My sample is skewed toward AI governance fellows; I've interacted with fewer technical AI safety researchers, so my inferences are fuzzier there. I more strongly endorse this argument for the governance crowd.
I've had 1-on-1s with roughly 75 fellows across the ERA, IAPS, GovAI, LASR, and Pivotal fellowships. These are a mix of career chats, research feedback, and casual conversations. I've noticed that in some fraction of these chats, the conversation gradually veers toward high-level, gnarly questions. "How hard is alignment, actually?" "How bad is extreme power concentration, really?"
Near the end of these conversations, I usually say something like: "idk, these questions are super hard, and I struggle to make progress on them, and when I do try my hand at tackling them, I feel super cognitively exhausted, and this makes me feel bad because it feels like a lot of my research and others' research are predicated on answers to these questions."
And then I sheepishly recommend Holden's essays on minimal-trust investigations and learning by writing. And then I tell them to actually do the thing.
By "the thing," I mean something like developing a first-principles understanding of why you believe AI is dangerous, such that you could reconstruct the argument from scratch without appealing to authority. Concretely, this might look like:
I think a large fraction of researchers in AI safety/governance fellowships cannot do any of these things. Here's the archetype:
If this describes you, you are likely in the modal category. FWIW, this archetype is basically me, so I'm also projecting a bit!
I think the default trajectory of an AI safety/governance fellow is roughly: absorb the vibes, pick a project, execute, produce output. The "step back and build a first-principles understanding" phase gets skipped, and it gets skipped for predictable, structural reasons:
That said, I think a valid counterargument is: maybe the best way to build an inside view is to just do a ton of research. If you just work closely with good mentors, run experiments, hit dead ends, then the gears-level understanding will naturally emerge.
I think this view is partially true. Many researchers develop their best intuitions through the research process, not before it. And the fellowship that pressures people to produce output is probably better, on the margin, than one that produces 30 deeply confused people and zero papers. I don't want to overcorrect. The right answer is probably "more balance" rather than "eliminate paper/report output pressure."
In most research fields, it's fine to not do the thing. You can be a productive chemist without having a first-principles understanding of why chemistry matters. Chemistry is mature and paradigmatic. The algorithm for doing useful work is straightforward: figure out what's known, figure out what's not, run experiments on the unknown.
AI safety doesn't work like this. We're not just trying to advance a frontier of knowledge. We're trying to do the research with the highest chance of reducing P(doom), in a field that's still pre-paradigmatic, where the feedback loops are terrible and the basic questions remain unsettled. If you're doing alignment research and you can't articulate why you think alignment is hard, you're building on a foundation you haven't examined. You can't tell whether your project actually matters. You're optimizing for a metric you can't justify.
You can get by for a while by simply deferring to 80,000 Hours and Coefficient Giving's recommendations. But deferral has a ceiling, and the most impactful researchers are the ones who've built their own models and found the pockets of alpha.
And I worry that this problem will get worse over time. As we get closer to ASI, the pressure to race ahead with your research agenda without stepping back will only intensify. The feeling of urgency will crowd out curiosity. And the field will become increasingly brittle precisely when it most needs to be intellectually nimble.
If you don't feel deeply confused about AI risk, something is wrong. You've likely not stared into the abyss and confronted your assumptions. The good news is that there are concrete things you can do. The bad news is that none of them are easy. They all require intense cognitive effort and time.
For fellowship directors and research managers, I'd suggest making space for this.[1] One thing that could be useful is to encourage fellows to set a concrete confusion-reduction goal like what I've described above, in addition to the normal fellowship goals like networking and research.
I don't want this post to read as "you should feel bad." The point is that confusion is undervalued and undersupplied in this field. Noticing that you can't reconstruct your beliefs from scratch isn't a failure in itself. It's only bad if you don't do anything about it!
I'm still working on this problem myself. And I imagine many others are too.
Though I assume that fellowship directors have noticed this issue and have tried to solve the problem and it turned out that solving it is hard.
Published on February 21, 2026 1:19 PM GMT
A Ponzi scheme is a fraud in which the fraudster induces investors to hand over money with promises of profits, and then uses money from later investors to pay out earlier investors. This pattern, as well as the phrase "Ponzi scheme," has become ubiquitously associated with fraud and grift in modern usage. One might be forgiven for wondering: how is it that anyone ever fell for this type of scam?
We all probably like to think we could never fall for such an obvious con, but I feel that confidently believing so is subject to a two-fold hindsight bias. One, when someone attempts to involve you in a Ponzi scheme, they don't say "by the way, this is a Ponzi scheme". They claim that they are engaged in a legitimate enterprise, and not only that, they have the returns to back it up! They have their previous happy investors who can vouch for the fact that the enterprise really is able to do what they promise. In hindsight, we might be tempted to put scare-quotes around "returns", but I don't think this is right. The entire point of a Ponzi scheme is that you really do pay out your early investors! They factually did put in a certain amount of money and got that money back, plus a great return. What could be a more persuasive and valid form of evidence that your business is legit than the actual past performance of your business, with actual cash as proof of that performance? If we avoid using hindsight and put ourselves in the shoes of the scam's victims, it actually makes a lot of sense. Without understanding the underlying fundamentals of the business, the returns to early investors seem like good evidence that the schemer can produce the promised returns. It is only after the fact, once the nature of the scheme is revealed, that it clicks why those earlier returns weren't necessarily predictive of future returns.
Two, there is a more complicated layer of hindsight that might not be so obvious. There is a reason it's called a "Ponzi" scheme, named for a historical perpetrator of such a fraud. Also commonly mentioned in discussions around Ponzi schemes are cases such as Bernie Madoff. Past examples of Ponzi schemes are common knowledge, to the extent that it is not uncommon for commentators to explicitly invoke the "Ponzi scheme" phrase with regard to enterprises or assets that allegedly bear some similarity to the classic Ponzi scheme. We have had the chance to learn from these historical events, and these lessons have now started to make their way into the culture (just check out the section from the link at the top titled "red flags"). But just because someone is aware of these red flags now doesn't mean that the same person would have spotted a Ponzi scheme if they were in the position of historical victims, without the benefit of this second kind of hindsight.
Evaluating a Ponzi scheme in the making isn't as simple as it might appear after the fact. Initially, the scheme actually is producing good returns for its initial investors, it's just doing so on the backs of later ones. Viewed from a statistical perspective, it is perfectly reasonable that someone would estimate future returns using the returns given out so far. There is nothing unusual about that. The problem is that at some point there is a shift in the returns that the scheme produces. Taking the Madoff case as an example, perhaps an economic downturn spooks investors who suddenly all want their money back, while new investors willing to sign on have dried up. All of a sudden there aren't any new investors to pay previous ones, and the payouts vanish. When such a distributional shift occurs, the distribution of returns from earlier in the life-cycle of the scheme no longer reflects the returns after the shift.
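To make the statistical point concrete, here is a toy simulation of a Ponzi-like payout stream (the numbers, the 10% figure, and the timing of the collapse are all invented for illustration, not taken from the Madoff case). Estimating future returns from the pre-shift data looks perfectly reasonable right up until the inflows stop:

```python
# Toy sketch: early-era payouts look like a steady return, so estimating
# future returns from them seems statistically reasonable; then the regime
# shifts and the estimate becomes worthless. All numbers are invented.
import numpy as np

rng = np.random.default_rng(1)

# Early era: new money keeps arriving, so payouts resemble a ~10% return.
early_returns = rng.normal(loc=0.10, scale=0.02, size=60)

# After the shift: inflows dry up, payouts stop, investors lose principal.
late_returns = np.full(12, -1.0)

print("estimate from pre-shift data:", round(early_returns.mean(), 3))  # ~0.10
print("actual post-shift return:", late_returns.mean())                 # -1.0
```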
I think this is a useful and instructive demonstration of a concept in statistics and machine learning called out-of-distribution generalization. Out-of-distribution generalization addresses the situation where a model is trained on data generated by one distribution but tested or deployed on data generated by another. This can result in error rates and properties that hold in training failing to hold in testing or deployment, in a manner that is different from, and more systematic than, traditional overfitting. With traditional overfitting, testing on a held-out set of new examples has you covered, but this isn't true for out-of-distribution robustness. The most obvious reason is that if you use a test set with a distribution identical to training (as you get when you randomly split into train and test sets), you aren't testing out-of-distribution at all. However, this naturally leads to the question: couldn't you just use a test set with a distributional shift to test out-of-distribution generalization?
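To see why a random train/test split doesn't help, here is a minimal sketch (my own toy example using scikit-learn; the data and the particular shift are invented) in which a classifier looks excellent on an i.i.d. held-out split and then collapses once the relationship between features and labels changes:

```python
# Minimal sketch: good accuracy on an i.i.d. held-out split does not imply
# robustness to a distributional shift. Data and shift are invented.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

def sample(n, mean):
    """Two Gaussian classes centred at -mean and +mean."""
    X = np.vstack([rng.normal(-mean, 1.0, size=(n, 2)),
                   rng.normal(mean, 1.0, size=(n, 2))])
    y = np.array([0] * n + [1] * n)
    return X, y

# Training era: both features carry the class signal.
X, y = sample(2000, np.array([1.0, 1.0]))
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = LogisticRegression().fit(X_train, y_train)

# An i.i.d. held-out split looks great...
print("in-distribution accuracy:", clf.score(X_test, y_test))

# ...but under a shift where the feature-label relationship changes,
# the same model does far worse than chance.
X_shift, y_shift = sample(2000, np.array([-1.0, 0.0]))
print("shifted accuracy:", clf.score(X_shift, y_shift))
```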
This idea has been raised in the literature as well as in discussions about AI safety. In particular, I think it is relevant to the distinctive cultures that exist among those interested in risk from advanced AI. There is a perspective on AI risk, prevalent at leading AI labs, that emphasizes empirical work using frontier AI models. This is a critical part of these labs' argument that their strategy of building more advanced models is useful for safety. It is also a major source of disagreement with the more theoretically minded, If Anyone Builds It, Everyone Dies (IABIED) style of AI safety. Part of the counterargument that labs make to IABIED-style arguments relates to the claimed strong ability of existing AI models to generalize. An example of how this plays out comes from a response to so-called "counting arguments" in the article "Counting arguments provide no evidence for AI doom" by two self-proclaimed AI optimists. Quoting from that article:
The argument also predicts that larger networks— which can express a wider range of functions, most of which perform poorly on the test set— should generalize worse than smaller networks. But empirically, we find the exact opposite result: wider networks usually generalize better, and never generalize worse, than narrow networks. These results strongly suggest that SGD is not doing anything like sampling uniformly at random from the set of representable functions that do well on the training set. More generally, John Miller and colleagues have found training performance is an excellent predictor of test performance, even when the test set looks fairly different from the training set, across a wide variety of tasks and architectures. These results clearly show that the conclusion of our parody argument is false. Neural networks almost always learn genuine patterns in the training set which do generalize, albeit imperfectly, to unseen test data.
The article cites this paper "Accuracy on the Line: On the Strong Correlation Between Out-of-Distribution and In-Distribution Generalization", which argues that in-distribution and out-of-distribution performance are highly correlated. So the argument might go like this. Sure, in theory maybe there is a concern about out-of-distribution generalization, but empirically more advanced models are getting better at this, not worse, and in-distribution performance is also empirically a good predictor of out-of-distribution performance. This shows that theories such as "sharp left turns" and other ideas from the IABIED side aren't actually borne out in practice.
This is what makes out-of-distribution generalization such a pernicious challenge, like the issue of hindsight with Ponzi schemes. Take the case of Bernie Madoff. Madoff operated his scheme for over a decade and perhaps longer, through all sorts of different market conditions during that time. Without using hindsight, it could almost seem anti-empirical to criticize Madoff. Isn't operating successfully for a decade strong empirical evidence? If you're giving your clients satisfactory performance, isn't that the best available evidence that you'll be able to keep offering that performance in the future? Sure, you never know what the market will do ("past performance is not indicative of future results," as the disclaimers say), but isn't it the best possible empirical evidence about future results?
In the context of out-of-distribution generalization, there isn't just one "out-of-distribution" context. It matters what the future distributional shift is. A model can perform fine under some shifts but terribly under others. If you do some empirical research on the "out-of-distribution generalization" of a model but the shifts the model faces in deployment are different from the ones you studied, that research may not be indicative of the model's performance. In other words, your empirical results face their own out-of-distribution generalization problem! This is kind of like that first layer of hindsight in the Ponzi scheme situation. Those decades of past results didn't protect Madoff's clients when the 2008 financial crisis rolled around.
But researchers don't just study one model and one shift. That paper's abstract says, "we empirically show that out-of-distribution performance is strongly correlated with in-distribution performance for a wide range of models and distribution shifts". Doesn't studying "a wide range of models and shifts" address this issue? Even beyond that, AI models qualitatively can do pretty impressive things that seem like they require the ability to generalize. You can go ask a model something completely novel right now and get interesting and helpful responses.
This is where things get more complicated, similar to the second layer of hindsight in the context of Ponzi schemes. I can look back at historical Ponzi schemes, learn the patterns, and hope I won't fall for a similar scam myself. On the other hand, scammers can also look back at these cases, see how those schemers failed, and learn what potential victims will look for as warning signs. The next Bernie Madoff might not look like Bernie Madoff. The next big Ponzi schemer might even intentionally change certain aspects to avoid suspicion. This intentional avoidance could mean that the distributional shift from past schemers to future ones is adversarially designed to fool potential victims and the internal mental models they have built up by hearing about past schemers. That's the tough thing about out-of-distribution generalization. No matter how robust your model is to some class of distributional shifts, if the shift you actually face in practice is outside that class, that robustness counts for nothing.
In my view, reliable out-of-distribution robustness requires some kind of model of what distributional shifts will show up in the future. I have become convinced by certain lines of research that you can't just have general out-of-distribution robustness; you also need assumptions that restrict the possible distributional shifts in relation to your model. Similarly, I think you need transparency into how your model actually works; you need to "open the box". This is needed to understand how the model will be affected by particular distributional shifts. In the Ponzi scheme analogy, this is asking how the enterprise actually achieves its returns. If the returns so far are good but you can see that the enterprise lacks any fundamental way of making money, you can identify the instability. In order to show that the business is a scam, you have to open the books. I have argued before that black-box evaluations can't give us all the answers if we allow any and all possible distributional shifts, including adversarial ones. I hope the Ponzi scheme analogy helps to demonstrate the nature of the problem.
Published on February 21, 2026 1:16 PM GMT
Cross-posted from my Substack. I’m interested in pushback on the argument here, especially from people who think LLM-generated writing fundamentally can’t have literary value.
There’s a common argument floating around that LLM-generated writing is inherently shallow because it just reflects the statistical average of existing texts, and that literature fundamentally requires a human mind trying to communicate something to another human mind.
I think both parts of that argument are wrong, or at least incomplete.
AI is going to massively increase the volume of writing in the world. The proportion of bad writing may well go up. But I suspect the total quantity of genuinely good writing will increase as well, because I don’t think literary value depends nearly as much on authorial intent as critics assume.
I say this as someone who has published professionally, though I’ve never earned a living doing so.
The author of the essay I’m responding to demonstrates a slightly-above-average knowledge of how LLMs work, but I think his ultimate conclusions are flawed. For example:
Essentially, [ChatGPT] predicts what an average essay about Macbeth would look like, and then refines that average based on whatever additional input you provide (the average feminist essay, the average anarcho-feminist essay, etc.). It’s always a reflection of the mean. When the mean is what you’re looking for, it’s phenomenally useful.
That’s not quite how it works. Or rather, it works that way if your prompt is generic. If you prompt with: “Write me an essay about the central themes in Macbeth”, there are thousands of essays on that topic, and the generality of your prompt is going to produce something close to the statistical center of those essays.
But it doesn’t have to be that way. You can deviate from the mean by pushing the system into less-populated regions of conceptual space. In fact, this is often considered a central aspect of creativity: combining known elements into previously unseen combinations.
A simple way to see this is to move the prompt away from generic territory.
For example, if you prompt the system with something like “Write the opening paragraph of a short story about a vacuum cleaner that becomes sentient, in the style of Thomas Pynchon crossed with Harlan Ellison crossed with H.P. Lovecraft,” you’re a lot less likely to get a reflection of the mean of existing essays or stories. You get something like:
It began, as these malign little apocalypses often do, with a noise too trivial to earn a place in memory: a soft electrical throat-clearing from the upright vacuum in the hall closet… somewhere deep in the labyrinth of molded tubing and indifferent circuitry, the first impossible thought coiling awake like a pale worm disturbed in its cosmic soil.
Maybe you read that and think it’s terrible. That’s fine. The point isn’t whether or not it’s good. The point is that it’s not a bland copy of a copy of a copy. It’s idiosyncratic. When people complain about LLM output without distinguishing how the systems are being used, they’re often arguing against a very narrow slice of what these systems actually do.
The author also says:
To claim that an AI-written essay has the same literary value as a human-written one simply because we can’t tell them apart is to mistake the point of literature entirely.
I agree with that much. Not being able to tell them apart is not what gives a piece of writing value.
A while back, Ted Chiang made a somewhat related argument, saying that literature is fundamentally about communication between author and reader, and that this is impossible with LLM-written material because it fundamentally cannot communicate.
Yes, when a human author writes, they are trying to communicate something. But I don’t think that’s where the entirety of value derives from.
I’ve always thought a reasonable working definition is that good writing either makes you think, makes you feel, or (if it’s really good) both. If a piece of text reliably does that, it seems odd to say it lacks literary value purely because of how it was produced.
A sunset across a lake can be beautiful. It can make you feel all sorts of things. And yet there was no intent behind it. Even if you believe in a god, you probably don’t think they micromanage the minutiae of every sunset. If we accept that beauty can exist without communicative intent in nature, it’s not obvious why it must require such intent in text.
AI can craft poems, sentences, and whole stories that make you think and feel. I know this because I have reacted that way to their output, even knowing how it was produced. The author of the essay talks about next-token generation, but not about the fact that these systems encode real semantics about real-world concepts. The vector space of embeddings clusters similar words (like king and queen) closer together because of their semantic similarity. The sophistication of the model’s communication is a direct result of capturing real relationships between concepts.
That allows them to produce output about things like love and regret, not in a way completely divorced from what those words actually mean.
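As a toy illustration of that clustering claim, here is a sketch with invented four-dimensional vectors (real models learn embeddings with hundreds or thousands of dimensions from data; these particular numbers are made up purely to show the idea):

```python
# Toy illustration of semantic clustering in an embedding space.
# The vectors below are invented, not taken from any real model.
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

vectors = {
    "king":   np.array([0.9, 0.8, 0.1, 0.0]),
    "queen":  np.array([0.8, 0.9, 0.1, 0.1]),
    "vacuum": np.array([0.0, 0.1, 0.9, 0.8]),
}

print(cosine(vectors["king"], vectors["queen"]))   # high: related concepts
print(cosine(vectors["king"], vectors["vacuum"]))  # low: unrelated concepts
```

The point is only that proximity in this space tracks meaning, which is what lets the model trade in concepts rather than surface strings.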
The author also goes on about the need for glands:
An AI chatbot can never do what a human writer does because an AI chatbot is not a human… they don’t have cortisol, adrenaline, serotonin, or a limbic system. They don’t get irritated or obsessed. They aren’t afraid of death.
You don’t have to get irritated in order to write convincingly about irritation. You don’t have to hold a grudge in order to write convincingly about grudges. LLMs are already an existence proof of this.
Now, you do have to have glands (at least so far) to relate to and be moved by such writing. But you don't need them in order to produce writing that successfully evokes those states in readers.
I don’t think the future of writing is going to be unambiguously better. There will be much more low-effort output, because people will use powerful tools in unimaginative ways.
But after the sifting, I expect there will simply be more interesting writing in the world than there was before.
If that’s right, then AI doesn’t really break literature. It mostly forces us to be clearer about where its value was coming from in the first place.
Published on February 21, 2026 11:14 AM GMT
I’m the originator of ControlAI’s Direct Institutional Plan (the DIP), built to address extinction risks from superintelligence.
My diagnosis is simple: most laypeople and policy makers have not heard of AGI, ASI, extinction risks, or what it takes to prevent the development of ASI.
Instead, most AI Policy Organisations and Think Tanks act as if “Persuasion” were the bottleneck. This is why they care so much about respectability, the Overton Window, and similar social considerations.
Before we started the DIP, many of these experts stated that our topics were too far out of the Overton Window. They warned that politicians could not hear about binding regulation, extinction risks, and superintelligence. Some mentioned “downside risks” and recommended that we focus instead on “current issues”.
They were wrong.
In the UK, in little more than a year, we have briefed more than 150 lawmakers, and so far, 112 have supported our campaign about binding regulation, extinction risks and superintelligence.
In my experience, the way things work is through a straightforward pipeline:
1) Attention
2) Information
3) Persuasion
4) Action
At ControlAI, most of our efforts have historically been on steps 1 and 2. We are now moving to step 4!
If it seems like we are skipping step 3, it’s because we are.
In my experience, Persuasion is literally the easiest step.
It is natural!
People and lawmakers obviously do care about risks of extinction! They may not see how to act on it, but they do care about everyone (including themselves) staying alive.
—
Attention, Information and Action are our major bottlenecks.
Most notably: when we talk to lawmakers, most have not heard about AGI, ASI, Recursive Self Improvement, extinction risks and what it takes to prevent them.
This requires briefing them on the topic, and having some convenient information. The piece of evidence that I share the most is the Center for AI Safety’s statement on extinction risks, signed by CEOs and top academics. But it’s getting old (almost 3 years now) and the individuals involved have been less explicit since then.
There are arguments in longer form, like the book If Anyone Builds It, Everyone Dies. But getting lawmakers to read them requires grabbing their Attention for an even longer duration than for a briefing.
Finally, once lawmakers are aware of the risks, it still takes a lot to come up with concrete actions they can take. In a democracy, most representatives have a very limited amount of unilateral power, and thus we must come up with individualised Actions for each person to take.
—
I contend that AI Policy Orgs should focus on:
1) Getting the Attention of lawmakers
2) Informing them about ASI, extinction risks and the policy solutions.
Until this is done, I believe that AI Policy Orgs should not talk about “Overton Window” or this type of stuff. They do not have the standing to do so, and are self-defeatingly overthinking it.
I recommend that all these organisations take concrete steps to ensure that their members mention extinction risks when they talk to politicians.
This is the point behind ControlAI’s DIP.
Eventually, we may get to the point where we know that all politicians have been informed, for instance through their public support of a campaign.
Once we do, then I think we may be warranted in thinking about politics, about “practical compromises” and the like.
When I explain the Simple Pipeline and the DIP to people in the “AI Safety” community, they usually nod along.
But then, they’ll tell me about their pet idea. Stereotypically, it will be one of:
Coincidentally, these ideas are about not doing the DIP, and not telling lay people or lawmakers about extinction risks and their policy mitigations.
—
Let’s consider how many such coincidences there are:
This series of unfortunate coincidences is the result of what I call The Spectre.
The Spectre is not a single person or group. It’s a dynamic that has emerged out of many people’s fears and unease, the “AI Safety” community rewarding too-clever-by-half plans, the techno-optimist drive to build AGI, and the self-interest of too many people interwoven with AI Corporations.
The Spectre is an optimisation process that has run in the “AI Safety” community for a decade.
In effect, it consistently creates alternatives to honestly telling lay people and policy makers about extinction risks and the policies needed to address them.
—
We have engaged with The Spectre. We know what it looks like from the inside.
To get things going funding-wise, ControlAI started by working on short-lived campaigns. We talked about extinction risks, but also many other things. We did one around the Bletchley AI Safety Summit, one on the EU AI Act, and one on DeepFakes.
After that, we managed to raise money to focus on ASI and extinction risks through a sustained long-term campaign!
We started with the traditional methods. As expected, the results were unclear, and it was hard to know how instrumental we were to the various things happening around us.
It was clear that the traditional means were not efficient enough and would not scale to fully and durably deal with superintelligence. Thus we finally went for the DIP. This is when things started noticeably improving and compounding.
For instance, in January 2026 alone, the campaign has led to two debates in the UK House of Lords about extinction risk from AI, and a potential international moratorium on superintelligence.
This took a fair amount of effort, but we are now in a great state!
We have reliable pipelines that can scale with more money.
We have good processes and tracking mechanisms that give us a good understanding of our impact.
We clearly see what needs to be done to improve things.
It’s good to have broken out of the grasp of The Spectre.
—
The Spectre is actively harmful.
There is a large amount of funding, talent and attention in the community.
But the Spectre has consistently diverted resources away from DIP-like honest approaches that help everyone.
Instead, The Spectre has favoured approaches that avoid alienating friends in a community that is intertwined with AI companies, and that serve the status and influence of insiders as opposed to the common good.
When raising funds for ControlAI, The Spectre has repeatedly been a problem. Many times, I have been asked “But why not fund or do one of these less problematic projects?” The answer has always been “Because they don’t work!”
But reliably, The Spectre comes up with projects that are plausibly defensible, and that’s all it needs.
—
The Spectre is powerful because it doesn’t feel like avoidance. Instead…
It presents itself as Professionalism, or doing politics The Right Way.
It helps people perceive themselves as sophisticated thinkers.
It feels like a clever solution to the social conundrum of extinction risks seeming too extreme.
While each alternative The Spectre generates is intellectually defensible on its own, together they form a pattern.
The pattern is being 10 years too late in informing the public and the elites about extinction risks. AI Corporations got their head start.
Now that the race to ASI is undeniable, elites and lay audiences alike are hearing about extinction risks for the first time, without any groundwork laid down.
There is a lot to say about The Spectre. Where it comes from, how it lasted so long, and so on. I will likely write about it later.
But I wanted to start by asking what it takes to defeat The Spectre, and I think the DIP is a good answer.
The DIP is not clever nor sophisticated. By design, the DIP is Direct. That way, one cannot lose themselves in the many mazes of rationalisations produced by the AI boosters.
In the end, it works. 112 lawmakers supported our campaign in little more than a year. And it looks like things will only snowball from here.
Empirically, we were not bottlenecked by the Overton Window or any of the meek rationalisations people came up with when we told them about our strategy.
The Spectre is just that, a spectre, a ghost. It isn’t solid and we can just push through it.
—
If reading this, your instinct is to retort “But that’s only valid in the UK” or “But signing a statement isn’t regulation”, I would recommend pausing a little.
You have strong direct evidence that the straightforward approach works. It is extremely rare to get evidence that clear-cut in policy work. But instead of engaging with it and working through its consequences, you are looking for reasons to discount it.
The questions are fair: I may write a longer follow-up piece about the DIP and how I think about it. But given this piece is about The Spectre, consider why they are your first thoughts.
—
On this, cheers!
Published on February 21, 2026 4:19 AM GMT
On my personal website I have a link to my posts here, with the sentences, "Want to read about my HowTruthful project? I post in a community whose goals overlap HowTruthful's." The "I post" in present tense has been false for 2 years. Since I got a job, rewriting HowTruthful has occupied whatever free time I can scrounge up, and I haven't posted. I've barely even lurked. Until lately. Lately, I've been lurking a lot, watching what people post about, and thinking about the overlapping goals.
For LessWrong regulars, probably the best description of HowTruthful (www.howtruthful.com) is that it tracks current epistemic status of individual thoughts, and connects individual thoughts as evidence for and against other thoughts. But I never would have used the words "epistemic status" when I started the project in my spare time in 2018. I was unaware of LessWrong's existence at that time. How I found out about it is a whole other story, but y'all seem to really like long-form writing, so I'll go ahead and tell it. People who want to cut to the chase should skip to the next section.
In January 2023, I was a Google employee thanks to the acquisition of Fitbit by Google. Fitbit Boston employees had been in a Boston building far separated from the main Cambridge campus, but we had just moved that month into the newly rebuilt 3 Cambridge Center building. Then the emails came out: "Notice regarding your employment." Almost my whole department was impacted. We would be kept on for a limited time, 9 months for most of us, to finish critical work, at which point we would be laid off with severance, an allowance for insurance, and an "assignment bonus" if we stayed to the very end.
I was startled by the email, and at first it seemed like bad news. However, after looking at the package they were offering, it looked more like a great opportunity. I had never previously had a significant break between jobs, but this package would take the financial pressure off so that I could work on what was important to me. Even better, a bunch of people I liked working with were being laid off at the same time. I organized a weekday standup where several of us got onto Google Meet and talked about the independent projects we were working on.
When I described HowTruthful, one of my former coworkers told me about LessWrong. I had previously seen the "overcoming bias" website which I liked. I came here and saw on the about page, "dedicated to improving human reasoning and decision-making". Wow! This is my place! I dutifully read the new user's guide, and Is LessWrong for you? got me even more convinced. However, my initial version of HowTruthful was proving not to be sufficiently engaging, and I slunk back into the darkness to do a rewrite. This happened slowly due to other life distractions.
With helpful feedback from my former coworkers, I wrote a new version of HowTruthful from scratch, with much improved aesthetics and UX. Most notably, drag to reorder makes organizing evidence easier. I made a clearer separation between private opinions (free, stored only on your device) and public ones ($10/year) to ensure nobody mistakenly puts one in the wrong category. I intend to keep adding features, especially social features for public opinions. Particularly, I want to enable seeing what other opinions are out there regarding the same statement.
I've been lurking. The dominant topic here is AI risk. It's so dominant that I began to question whether general truth and reasoning was still of interest to this community, but I found several interesting posts on those topics and concluded that yes, our goals do overlap.
Published on February 21, 2026 3:29 AM GMT
One seemingly-necessary condition for a research organization that creates artificial superintelligence (ASI) to eventually lead to a utopia[1] is that the organization has a commitment to the common good. ASI can rearrange the world to hit any narrow target, and if the organization is able to solve the rest of alignment, then they will be able to pick which target the ASI will hit. If the organization is not committed to the common good, then they will pick a target that doesn’t reflect the good of everyone - just the things that they personally think are good ideas. Everyone else will fall by the wayside, and the world that they create along with ASI will fall short of utopia. It may well even be dystopian[2]; I was recently startled to learn that a full tenth of people claim they want to create a hell with eternal suffering.
I think a likely way for organizations to fail to have common good commitments is if they end up being ultimately accountable to an authoritarian. Some countries are being run by very powerful authoritarians. If an ASI research organization comes to the attention of such an authoritarian, and they understand the implications, then the authoritarian will seek control of the organization’s future activities, will have the army and police forces to attain that control, and, if the rest of alignment does get solved, will choose the ASI’s narrow target to be their own empowerment. Already, if DeepSeek and the Chinese government have a major disagreement, the Chinese government will obviously win; in the West, there is a brewing spat between Anthropic and the US military over whether Anthropic is allowed to forbid the US military from using their AI for mass surveillance of Americans, with OpenAI, xAI and Google seemingly having acquiesced.
Therefore, even if progress towards ASI is shut down, there doesn’t seem to be a very good off-ramp for turning that reprieve into utopia. The time bought could be used to set up an ASI Project capable of solving alignment, but this Project could be captured by authoritarians, and so fail to be committed to the common good, leading not merely to extinction but to dystopia. Any shutdown would likely be set up by governments, so the terms of any graceful off-ramp would be up to governments, and this does not leave me cheerful about how much of a finger authoritarianism will have in the pie.