
LessWrong

An online forum and community dedicated to improving human reasoning and decision-making.

The Plan - 2025 Update

2026-01-01 04:10:45

Published on December 31, 2025 8:10 PM GMT

What’s “The Plan”?

For several years now, around the end of the year, I (John) have written a post on our plan for AI alignment. That plan hasn’t changed too much over the past few years, so both this year’s post and last year’s are written as updates to The Plan - 2023 Version.

I’ll give a very quick outline here of what’s in the 2023 Plan post. If you have questions or want to argue about points, you should probably go to that post to get the full version.

So, how’s progress? What are you up to?

2023 and 2024 were mostly focused on Natural Latents - we’ll talk more shortly about that work and how it fits into the bigger picture. In 2025, we did continue to put out some work on natural latents, but our main focus has shifted.

Natural latents are a major foothold on understanding natural abstraction. One could reasonably argue that they’re the only rigorous foothold on the core problem to date, the first core mathematical piece of the future theory. We’ve used that foothold to pull ourselves up a bit, and can probably pull ourselves up a little further on it, but there’s more still to climb after that.

We need to figure out the next foothold.

That’s our main focus at this point. It’s wide open, very exploratory. We don’t know yet what that next foothold will look like. But we do have some sense of what problems remain, and what bottlenecks the next footholds need to address. That will be the focus of the rest of this post.

What are the next bottlenecks to understanding natural abstraction?

We see two main “prongs” to understanding natural abstraction: the territory-first prong, and the mind-first prong. These two have different bottlenecks, and would likely involve different next footholds. That said, progress on either prong makes the other much easier.

What’s the “territory-first prong”?

One canonical example of natural abstraction comes from the ideal gas (and gases pretty generally, but the ideal gas is the simplest).

We have a bunch of little molecules bouncing around in a box. The motion is chaotic: every time two molecules collide, any uncertainty in their velocity is amplified multiplicatively. So if an observer has any uncertainty in the initial conditions (which even a superintelligence would, for a real physical system), that uncertainty will grow exponentially over time, until all information is wiped out… except for conserved quantities, like e.g. the total energy of the molecules, the number of molecules, or the size of the box. So, after a short time, the best predictions our observer will be able to make about the gas will just be equivalent to using a Maxwell-Boltzmann distribution, conditioning on only the total energy (or equivalently temperature), number of particles, and volume. It doesn’t matter if the observer is a human or a superintelligence or an alien, it doesn’t matter if they have a radically different internal mind-architecture than we do; it is a property of the physical gas that those handful of parameters (energy, particle count, volume) summarize all the information which can actually be used to predict anything at all about the gas’ motion after a relatively-short time passes.
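
For concreteness, the equilibrium prediction referred to above is just textbook statistical mechanics, and notably it makes no reference to the observer; it depends only on temperature (equivalently total energy), particle count, and volume:

```latex
% Equilibrium prediction for an ideal (monatomic) gas: positions uniform over
% the box of volume V, velocities Maxwell-Boltzmann, everything fixed by (E, N, V).
\[
  f(\vec{v})\,d^3v
    = \left(\frac{m}{2\pi k_B T}\right)^{3/2}
      \exp\!\left(-\frac{m\,\lVert\vec{v}\rVert^2}{2 k_B T}\right) d^3v,
  \qquad
  E = \tfrac{3}{2}\, N k_B T .
\]
```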

The key point about the gas example is that it doesn’t talk much about any particular mind. It’s a story about how a particular abstraction is natural (e.g. the energy of a gas), and that story mostly talks about properties of the physical system (e.g. chaotic dynamics wiping out all signal except the energy), and mostly does not talk about properties of a particular mind. Thus, “territory-first”.

More generally: the territory-first prong is about looking for properties of (broad classes of) physical systems, which make particular abstractions uniquely well-suited to those systems. Just like (energy, particle count, volume) is an abstraction well-suited to an ideal gas because all other info is quickly wiped out by chaos.

What’s the “mind-first prong”?

Here’s an entirely different way one might try to learn about natural abstraction.

Take a neural net, and go train it on some data from real-world physical systems (e.g. images or video, ideally). Then, do some interpretability to figure out how the net is representing those physical systems internally, what information is being passed around in what format, etc. Repeat for a few different net architectures and datasets, and look for convergence in what stuff the net represents and how.
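
As a toy illustration of the kind of experiment described above, here is a minimal sketch (assuming PyTorch; the synthetic dataset, the two architectures, and the use of linear CKA as a convergence measure are all my own stand-ins, not anything prescribed by the agenda). A real version would use images or video and more careful comparison methods:

```python
# Toy sketch of the "mind-first" experiment: train two different architectures
# on the same data, then check whether their internal representations converge.
# Synthetic data stands in for images/video; linear CKA is one possible measure.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Synthetic "environment": observations generated from a low-dimensional latent.
N, D, K = 2048, 32, 4
latents = torch.randn(N, K)
X = latents @ torch.randn(K, D) + 0.1 * torch.randn(N, D)
y = latents[:, 0] ** 2 + latents[:, 1]

def make_net(hidden: int, depth: int) -> nn.Sequential:
    layers, d = [], D
    for _ in range(depth):
        layers += [nn.Linear(d, hidden), nn.ReLU()]
        d = hidden
    return nn.Sequential(*layers, nn.Linear(d, 1))

def train(net: nn.Sequential, steps: int = 2000, lr: float = 1e-3) -> nn.Sequential:
    opt = torch.optim.Adam(net.parameters(), lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = ((net(X).squeeze(-1) - y) ** 2).mean()
        loss.backward()
        opt.step()
    return net

def hidden_acts(net: nn.Sequential, x: torch.Tensor) -> torch.Tensor:
    # Activations just before the final linear readout.
    h = x
    for layer in list(net)[:-1]:
        h = layer(h)
    return h.detach()

def linear_cka(a: torch.Tensor, b: torch.Tensor) -> float:
    # Linear Centered Kernel Alignment between two activation matrices.
    a = a - a.mean(0, keepdim=True)
    b = b - b.mean(0, keepdim=True)
    return ((a.T @ b).norm() ** 2 / ((a.T @ a).norm() * (b.T @ b).norm())).item()

net_a = train(make_net(hidden=64, depth=2))
net_b = train(make_net(hidden=256, depth=3))
print("CKA between architectures:", linear_cka(hidden_acts(net_a, X), hidden_acts(net_b, X)))
```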

(Is this just interpretability? Sort of. Interp is a broad label; most things called “interpretability” are not particularly relevant to the mind-first prong of natural abstraction, but progress on the mind-first prong would probably be considered interp research.)

In particular, what we’d really like here is to figure out something about how patterns in the data end up represented inside the net, and then go look in the net to learn about natural abstractions out in the territory. Ideally, we could somehow nail down the “how the natural abstractions get represented in the net” part without knowing everything about what natural abstractions even are (i.e. what even is the thing being represented in the net), so that we could learn about their type signature by looking at nets.

More generally: the mind-first prong is about looking for convergent laws governing how patterns get “burned in” to trained/evolved systems like neural nets, and then using those laws to look inside nets trained on the real world, in order to back out facts about natural abstractions in the real world.

Note that anything one can figure out about real-world natural abstractions via looking inside nets (i.e. the mind-first prong) probably tells us a lot about the abstraction-relevant physical properties of physical systems (i.e. the territory-first prong), and vice versa.

So what has and hasn’t been figured out on the territory prong?

The territory prong has been our main focus for the past few years, and it was the main motivator for natural latents. Some key pieces which have already been nailed down to varying extents:

  • The Telephone Theorem: information which propagates over a nontrivial time/distance (e.g. energy in our ideal gas example) must be approximately conserved. (A hedged information-theoretic sketch of this appears right after this list.)
  • Natural Latents: in the language of natural latents, information which propagates over a nontrivial time/distance (e.g. energy in our ideal gas example) must be redundantly represented in many times/places - e.g. we can back out the same energy by looking at many different time-slices, or roughly the same energy by looking at many different little chunks of the gas. If, in addition to that redundancy, that information also mediates between time/space chunks, then we get some ontological guarantees: we’ve found all the information which propagates.
  • Some tricks which build on natural latents:
    • To some extent, natural latent conditions can nail down particular factorizations of high level summaries, like e.g. representing a physical electronic circuit as a few separate wires, transistors, etc. We do this by looking for components of a high-level summary latent which are natural over different physical chunks of the system.
    • We can also use natural latent conditions to nail down particular clusterings, like in A Solomonoff Inductor Walks Into A Bar.
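
For readers who want a slightly more formal handle on the first bullet above, here is a hedged sketch of the information-theoretic intuition; the precise statement and proof are in the original Telephone Theorem post:

```latex
% Model propagation over time/distance as a Markov chain of nested scales:
%   X_0 -> X_1 -> ... -> X_n.
% The data processing inequality says information can only degrade along it:
\[
  I(X_0 ; X_{k+1}) \;\le\; I(X_0 ; X_k) \quad \text{for every } k,
\]
% so I(X_0; X_k) is non-increasing in k and converges. The theorem's content is
% that whatever information survives in that limit must be carried by quantities
% which are (approximately) perfectly conserved from step to step -- like the
% energy of the ideal gas.
```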

… but that doesn’t, by itself, give us everything we want to know from the territory prong.

Here are some likely next bottlenecks:

  • String diagrams. Pretty much every technical diagram you’ve ever seen, from electronic circuits to dependency graphs to ???, is a string diagram. Why is this such a common format for high-level descriptions? If it’s fully general for high-level natural abstraction, why, and can we prove it? If not, what is?
  • The natural latents machinery says a lot about what information needs to be passed around, but says a lot less about how to represent it. What representations are natural?
  • High-level dynamics or laws, e.g. circuit laws or gas laws (the ideal gas law written out after this list is the canonical example). The natural latents machinery might tell us e.g. which variables should appear in high-level laws/dynamics, but it doesn’t say much about the relationships between those variables, i.e. the laws/dynamics themselves. What general rules exist for those laws/dynamics? How can they be efficiently figured out from the low level? How can they be efficiently represented in full generality?
  • How can we efficiently sample the low-level given the high-level? Sure, natural latents summarize all the information relevant at long distances. But even with long-range signals controlled for, we still don’t know how to sample a small low-level neighborhood. We would first need to sample a boundary, that boundary needs to be in-distribution, and getting an in-distribution boundary sample is itself not something we know how to do.
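
To make the third bullet concrete with a standard example: the ideal gas law is a high-level law relating exactly the variables that survived the chaos in the earlier gas example, and nothing in the current machinery tells us how to derive it:

```latex
% A high-level law relating exactly the variables that survived the chaos
% in the earlier ideal gas example:
\[
  P V = N k_B T ,
  \qquad
  E = \tfrac{3}{2}\, N k_B T \quad \text{(monatomic ideal gas)} .
\]
% Natural latents point at (E, N, V) as the right variables; they do not
% (yet) tell us the relationship between them.
```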

And what has and hasn’t been figured out on the mind prong?

The mind prong is much more wide open at this point; we understand it less than the territory prong.

What we’d ideally like is to figure out how environment structure gets represented in the net, without needing to know what environment structure gets represented in the net (or even what structure is in the environment in the first place). That way, we can look inside trained nets to figure out what structure is in the environment.

We have some foundational pieces:

  • Singular learning theory, or something like it, is probably a necessary foundational tool here. It doesn’t directly answer the core question about how environment structure gets represented in the net, but it does give us the right mental picture for thinking about things being “learned by the net” at all. (Though if you just want to understand the mental picture, this video is probably more helpful than reading a bunch of SLT.)
  • Natural latents and the Telephone Theorem might also be relevant insofar as we view the net itself as a low-level system which embeds some high-level logic. But that also doesn’t get at the core question about how environment structure gets represented in the net.
  • There’s a fair bit to be said about commutative diagrams. They, again, don’t directly address the core representation question. But they’re one of the most obvious foundational tools to try, and when applied to neural nets, they have some surprising approximate solutions - like e.g. sparse activations.

… but none of that directly hits the core of the problem.

If you want to get a rough sense of what a foothold on the core mind prong problem might look like, try Toward Statistical Mechanics of Interfaces Under Selection Pressure. That piece is not a solid, well-developed result; probably it’s not the right way to come at this. But it does touch on most of the relevant pieces; it gives a rough sense of the type of thing which we’re looking for.

Mostly, this is a wide open area which we’re working on pretty actively.




Safety Net When AIs Take Our Jobs

2026-01-01 04:05:15

Published on December 31, 2025 8:05 PM GMT

I'm analyzing what happens to the US economy in the short-term aftermath of the typical job being replaced by AIs and robots. Will there be a financial crisis? Short answer: yes.

This is partly inspired by my dissatisfaction with Tomas Pueyo's analysis in If I Were King, How Would I Prepare for AI?.

Let's say 50% of workers lose their jobs at the same time (around 2030), and they're expected to be permanently unemployed. (I know this isn't fully realistic. I'm starting with simple models and will add more realism later.)

I'll assume that AI starts making the world more productive around the same time that this job loss occurs, but that big innovations such as cheap cancer cures or the ability to conquer the world are still far enough in the future that financial markets aren't ready to price them in.

These assumptions are designed to help me analyze the effects of job loss with minimal complications from other effects of AI. I'm focused here on the short-term financial and political consequences of job losses. There will be some radically different longer-term consequences, but I'm only analyzing those here to the extent that I expect markets to reflect them at the time of the job losses.

This post is merely an outline of what a rigorous analysis would look like. It's good enough for informing my investment strategies, but not for persuading politicians to adopt better policies.

Note that this will be one of my least readable blog posts. Most of you should start by reading the conclusion, and only read the rest if you're tempted to argue with my conclusions.

If you still think my conclusions are wrong, you can find some more detailed explanations of my reasoning in this conversation with Gemini.

Note that I'm targeting this at readers with a significant background in finance. Please question the details of my analysis, and produce competing guesses based on answering similar questions.

Conclusions

I expect turmoil similar to that of the pandemic. My median guess is that it will be somewhat less sudden than the crash of March 2020, and that markets will mostly recover in one to two years (assuming we have years before something more dramatic happens).

The financial turmoil is likely to induce political instability. I find that hard to predict.

The US government will need to be more competent than it was during the pandemic in order to avoid hyperinflation or defaulting on its debt.

The magnitude of the turmoil will likely be heavily influenced by hard-to-predict expectations.

Maybe a bright spot is that a financial crash could slow capability advances at roughly a time of near-maximum risk related to AI alignment. But that might be offset by politicians being too distracted to do anything competent about alignment.

I'm surprised at how much my outlook fluctuated while writing this post, between optimism and despair, before settling on an intermediate mood.

The process of writing this post convinced me to (slowly) start selling my remaining (small) positions in bank stocks. I'll be less willing to sell my stocks in gold mining companies. I'll probably be more willing to sell some of my other stocks when I've guessed that they've reached bubble levels, rather than hoping to sell close to the peak.

See my blog for the full post.




2025 Year in Review

2026-01-01 03:50:53

Published on December 31, 2025 7:50 PM GMT

It’s that time. It’s been a hell of a year.

At the start we barely had reasoning models. Now we have Claude Code and Opus 4.5.

I don’t code. Yet now I cause code to exist whenever something about a website annoys me, or when I get that programmer’s realization that there’s something I am planning on doing at least three times. Because why not?

The progress has simultaneously been mind-bogglingly impressive and fast. But a lot of people don’t see it that way, because progress has been incremental, and because we were reasonably expecting to often get even more than this.

The public conversation and debate, even more than before, was full of false narratives and active attempts to make the situation worse. The same goes for attempts to shape Federal policy towards AI, and OpenAI’s conversion into a for-profit.

It’s been, as they say, one battle after another, with many wins, many setbacks and a lot of things in between.

This includes the key developments in AI, and also other blog posts from the year that I consider memorable looking back.

This is only our corner of the world’s Year in Review, not one in general, thus things like Liberation Day are happening in the background and go undiscussed.

January

The confusions started in January, as we prepared for Trump to take office.

OpenAI had just given us o1-preview, the first reasoning model.

At the tail end of 2024, DeepSeek released v3, or The Six Million Dollar Model. This was a big advancement in open source and Chinese model capabilities, and showed that they were not as far behind as we thought they were, and also that damn good models could be trained on the cheap. Not as cheap as the headline number, since the six million was only direct costs of the final run, but still pretty cheap.

Then a few weeks later, DeepSeek gave us r1, a reasoning model based on v3. They wrapped this up into a nice clean free app experience, which included the first time most people could see a reasoning model’s chain of thought – Gemini Flash Thinking offered this too but almost no one knew about that or cared. This showed that the ‘secret sauce’ of building a reasoning model was not so difficult to copy, and the marginal costs of doing so were low.

DeepSeek shot to the top of the App store, and the world completely lost its mind. The stock market mini-crashed. People talked about how China had ‘caught up’ to America, or this meant inference would be so cheap no one would need Nvidia chips (as consumers rushed out to buy Nvidia chips to run DeepSeek r1), or how it would destroy margins and drive American AI out of business. I had to warn people, many times, with the classic advice: Don’t Panic, and I went on Odd Lots to discuss it all.

Collectively this was called The DeepSeek Moment.

White House rhetoric talked about how this meant we were in a ‘race’ with China, so of course any other considerations than ‘winning’ must be thrown out the window.

With time, those paying attention realized all of that was overblown. DeepSeek was impressive as a lab, and v3 and r1 were excellent models, but still on the order of eight months behind OpenAI, Anthropic and Google. We had been taking r1's best features on their own, and then using them to project into the future, a projection which flat out did not happen. This happened at a crucial inflection point, right when reasoning models had started, which was when a tiny amount of compute could go a maximally long way.

Later on, r1-0528 did not have a moment, nor did DeepSeek 3.1 or DeepSeek 3.2.

February

Google started out the month introducing us to Deep Research, a new product form that would be copied by OpenAI, allowing the AI to take time to prepare a report. At the time, this was super impressive. It definitely has its uses, even if the timing is awkward and you have to push past the tendency to pad reports with a lot of slop.

A new paper on The Risk of Gradual Disempowerment From AI improved the debate by highlighting a central way that humans end up not being in charge. There doesn’t need to be some ‘AI coup’ or battle, the AIs will by default end up with more and more resources and power unless something stops this from happening. One day we wake up and realize we are not in control. Another day after that we don’t wake up.

OpenAI declared that its primary alignment strategy would be Deliberative Alignment, so I analyzed that approach. I think it is helpful, but not a central solution.

The Administration made its AI feelings clear at The Paris AI Anti-Safety Summit. Previous summits had been efforts to lay foundation for international cooperation, with serious discussions of existential risks, in particular with The Bletchley Declaration. That was clearly over, transformed into a disdain for the idea that sufficiently advanced AI could be existentially dangerous, and Vance giving a speech demanding suicidal accelerationism and warning against attempts to not die.

The year would play out in similar fashion. We had some modest success in California and New York, but the White House would, under the influence of David Sacks, become an active force for interference with efforts to not die, and later even to beat China. They would do some pro-America things along the way, but also things that actively interfered with our competitiveness.

I introduced a key new concept handle which I call Levels of Friction. Different actions are variously harder or easier, from both practical and legal perspectives, to do. They range from Level 0 (defaults or requirements), to Level 1 (legal and ubiquitous and easy), Level 2 (safe but annoying), Level 3 (actively tricky or risky), Level 4 (actually seriously illegal) up to Level 5 (we really care about stopping you). Instead of thinking of a boolean of legal-illegal or possible-impossible, it is often more enlightening to consider moving between levels.

AI is going to move a lot of things to lower levels of friction. That is by default bad, but frictions can be load bearing, such as with job applications or limiting antisocial behaviors. It protects the commons. We will have to adjust quite a lot of things once key frictions are removed from the system.

February was the peak of ‘could Grok be a thing?’ It turned out not to be a thing. In other model news we got Claude 3.7.

We also got our first introduction to Emergent Misalignment, the idea that training the AI to do bad things associated with evil could lead it to generalize into thinking of itself as trope-style evil and doing a wide range of trope-style evil things.

March

A non-AI highlight was my piece on elementary education, School Is Hell.

GPT-4.5 was OpenAI’s attempt to give us a large and slow model. It did some cool things, and there are people that really liked it, but mostly it wasn’t worthwhile.

A big part of AI coverage is getting confident in dismissing hype. A great example of this was my coverage of The Manus Marketing Madness. Now that they’ve unceremoniously sold out to Meta, it’s easy to forget that a lot of people were hyping Manus as The Next Big Thing, as well as the next reason we would Lose To China.

I warned against using The Most Forbidden Technique, which is where you use interpretability to train on intermediate outputs, to teach it to think the thoughts you want it to think, thus teaching the AI to, like humans before it, hide its thinking.

Image generation had its first big moment, when the 4o image generator came online and everyone went Studio Ghibli crazy, taking advantage of both the advancement in quality and willingness to mimic styles.

Gemini 2.5 Pro came out, which I called the new state of the art. I think this was correct at the time, but later versions of Gemini 2.5 Pro were actively worse, and soon OpenAI would be back out ahead.

April

AI 2027 provided an illustrative scenario that presented a best guess as to what was likely to happen, with an alternative scenario option where things turn out well because a bold decision is made to slow down at a key moment. Scott Alexander and Daniel Kokotajlo explained the details on the Dwarkesh podcast, and I covered various responses.

Llama 4 was released, and turned out to be a total dud. Meta has been silent since in terms of topline AI products, while spending hundreds of millions on individual pay packages to try and gather the talent to get back in the game. It is a good thing Meta is struggling, given the bizarrely dystopian AI vision it is willing to share in public.

o3 put OpenAI firmly back out in front in reasoning, with excellent tool use, but was rapidly exposed as a Lying Liar that lies a lot.

OpenAI had other problems with GPT-4o. It was always an absurd sycophant that could get some of its users into trouble, but updates made around this time made it even more of an absurd sycophant, forcing a reversion to a previous build. I would later offer a postmortem.

May

OpenAI claimed that their conversion to a for-profit, which as announced then would clearly have been one of the biggest thefts in human history, would leave the non-profit in control.

The White House had from the beginning made a huge deal out of how Just Awful the Biden diffusion rules were, just like it talks about everything Biden did, but it initially acted generally wisely on chip diffusion and export controls, including on the H20.

Alas, over time David Sacks got more control over their narrative and increasingly started spouting Obvious Nonsense About AI Diffusion, literally claiming that ‘beating China’ means maximizing Nvidia’s share of chip sales, and warning that China would step in with non-existent and otherwise greatly inferior AI chips to build its own ‘AI tech stack’ if we didn’t sell massive compute to partners with questionable loyalties. Initially this rhetoric and action was confined to sales to parties like UAE and KSA, where a case can be made if the deals and safeguards are good, and details matter. Later this would extend to trying to sell chips to China directly.

OpenAI released Codex to compete with Claude Code. Claude Code was such a stealth release, initially a side project of one employee, that it took a while to notice something was happening, and even longer for me to finally give it a try. Nowadays Claude Code might be most of my AI token usage.

Claude 4 put Anthropic back in the game.

I offered thoughts on those who use AI to cheat, especially in education.

Veo 3 gave Google the lead in video generation.

I wrote my first ‘Letting Kids Be Kids’; I would later write another in December.

June

Dating Roundup #6 proved popular, and #7 did solidly too. I just put out #8 and #9.

I did an analysis of New York’s proposed RAISE Act, by Alex Bores who is now running for Congress. I concluded it was an excellent bill. It would later pass, although in somewhat weakened form because of Governor Hochul’s changes.

OpenAI and in particular Sam Altman continued to try and sell us on the concept of a Gentle Singularity, that AIs would become superintelligent and your life wouldn’t much change. This is of course Obvious Nonsense. Your life might become great, or it might end, or it might get into High Weirdness, but it won’t stay the same.

o3 Pro came out, and was very strong and not the lying liar that normal o3 was.

I came out with my (hopefully annual from here on in) blog recommendations.

July

The first attempt to pass a federal moratorium on AI regulation, as in tell the states they aren’t allowed to regulate AI because that should be federal while also not regulating AI at the federal level, came dangerously close to passing as part of the BBB. It was ultimately stripped out 99-1 once the tide had turned.

Congress had one of its finer hearings, where they asked good questions about AI.

Grok ran into trouble. No, Grok, No. Do not call yourself MechaHitler. Or worse.

Kimi K2 was an unusually impressive new open Chinese model. We would later get Kimi K2 Thinking in November.

Google and OpenAI got IMO Gold.

AI companions were getting a lot of attention, which has since died down. This will be a big thing at some point, and for some it is a very real thing, but for now it isn’t good enough to hold most people’s interest. I followed up again in August.

August

The big hyped release of the year was of course GPT-5. This would be their big moment to unify all their crazy model variations and names, and create one model to rule them all, with a router to think longer if and only if that was worthwhile. There were approaching death stars and we saw a variety of assertive valueposting. It was the big version number jump, and people expected a lot.

GPT-5 was a good model, I found it to be a clear upgrade, but it very much did not live up to the hype. Many even strongly wanted to keep GPT-4o for its far friendlier and more empathic attitude, or some would say its sycophancy – the very features that make GPT-4o not a great thing for many users are alas the reasons users often like it so much. I covered the basic facts and model card, then outside reactions and finally created a synthesis.

Unfortunately, the model OpenAI chose to call GPT-5 being a disappointing release gave so many people, up to and including David Sacks and Sriram Krishnan at the White House, the wrong idea. There is a constant demand for data points that say AI won’t advance much, that scaling is dead, that it will all be a normal technology and you don’t have to worry about AGI. Washington seems to have come away from the GPT-5 release with this message, and it plausibly did great harm in numerous ways, including to our export controls.

I tried to push directly back against this, pointing out that AI was continuing to make rapid progress, both around GPT-5 and various other misleading data points, especially the no-good, very-bad ‘MIT study.’ I followed up by pointing out that Yes, AI Continues To Make Rapid Progress, Including Towards AGI.

I noticed I was deeply confused about AI consciousness, along with everyone else. I still am, except now I’m more confused at a better, more advanced level. These questions are coming up more and more now, and I expect that to continue.

It’s so funny to have half of people debating AI consciousness, while the other half thinks AI is not making any progress.

I offered my advice around flying.

Are the AIs starting to take our jobs? Not in general, but for entry level jobs? Kinda.

September

I reviewed If Anyone Builds It, Everyone Dies. There were a few weeks where this inspired a lot of discussion, much of it remarkably good.

The month ended with Anthropic reclaiming its role as my daily driver thanks to Claude Sonnet 4.5.

There was more on AI craziness, then later in November we would see additional lawsuits against OpenAI related to suicides.

October

OpenAI meanwhile decided to release Sora and The Big Bright Screen Slop Machine, attempting to turn its good short video generator into a dystopian social network. I said the comparables were Google+ and Clubhouse. Call looks good.

I got to go to The Curve, which was an excellent conference.

One of the consequences of the GPT-5 release was more people talked about AI as potentially being in a bubble. I do not agree, other than in the nominal ‘number might go down’ sense. Number might go down, if not number needs to go up.

OpenAI completed its trio of overhyped releases with the Atlas browser. This jaded people sufficiently that when GPT-5.1 and GPT-5.2 later came out, people gave them remarkably little focus.

Andrej Karpathy went on Dwarkesh Patel and cautioned us not to get overexcited.

The biggest advantage America has over China is its access to vastly more compute. This is thanks in large part to our export controls. Alas, David Sacks, the AI Czar, acts like a de facto Nvidia lobbyist, and is trying to make us give that edge away.

Emboldened by prior success in getting authorization for H20 sales, Nvidia and David Sacks made their move, and came (based on what I know) remarkably close to getting America to commit quite a lot of civilizational suicide and sell B30A chips to China, essentially giving them close to chip parity. This would have been a completely insane move, and we should be thankful a combination of key people stepped up and prevented this from happening.

Unfortunately, although far less unfortunately than if we’d sold B30As, they then regrouped and in December would successfully push, despite it being obviously unwise and unpopular, for us to sell H200s to China. The Chinese are making a show of not wanting them so much, but it’s a show, and our edge has been substantially eroded. The logic behind this seems to have been nominally based in part on a prediction that Huawei can scale chip production far faster than credible predictions say, as in being off by an order of magnitude or more.

OpenAI finished its conversion to a for-profit, completing what I believe is arguably the second largest theft in human history behind the Russian oligarchs of the 1990s. The final terms came as the result of negotiations with the Attorneys General of Delaware and California, who did extract a lot of highly meaningful concessions, both in terms of compensation and also in helping retain meaningful control and oversight over OpenAI. This could have gone so much worse. But as I said, that’s like a mugger demanding your money, getting talked down to taking only half of it, and then claiming they ‘recapitalized’ you. You’re still out half of your money.

November

We got what may be the final key revelations of what I call OpenAI’s Battle of the Board, where the board attempted to fire Sam Altman, as we got Ilya Sutskever’s testimony about what happened. We now know that this was driven by Ilya Sutskever and Mira Murati, and was motivated by ordinary business concerns, centrally Sam Altman’s lying and mistreatment of employees.

I offered my 2025 edition of The Big Nonprofits Post, for those looking to donate, and would later share an update from my nonprofit, Balsa Research.

The year would finish with a flurry of new model releases.

OpenAI started us off with GPT-5.1, a modest upgrade that follows custom instructions well and often glazes the user, and then followed it up with GPT-5.1-Codex-Max, which was a substantial boost in coding power in particular.

Google gave us Gemini 3 Pro, a vast intelligence with no spine and also severe alignment issues and mental problems. It’s a great model, and was clearly now the best for a variety of uses, especially raw intelligence, or as a teacher, or for when you had questions with known answers of the kind you would ask an autist.

Anthropic then gave us the big one, Claude Opus 4.5, which is for now the clear best model available, and remains my daily driver, both for chat and also in Claude Code.

Claude Opus 4.5 felt like a large practical leap, some like Dean Ball going so far as to call it AGI. I don’t agree but I understand where they are coming from.

December

I went to San Francisco for the Solstice, and wrote Little Echo.

I did the annual movie review.

We learned even more reasons to beware reward mismatches in RL.

OpenAI upgraded again to GPT-5.2, which I evaluated as Frontier Only For The Frontier. Its impressive benchmarks do not reflect its capabilities, and people reacted with fatigue after too many disappointing OpenAI model releases. It’s not an especially ‘fun’ model to interact with, nor is it especially fast, and it currently occupies a sweet spot only for tasks where you need a lot of raw thinking capability and are looking for ‘just the facts’ and cold analysis, and potentially for coding where everyone serious should try various models to see what works best for their tasks.

I offered a sequence of posts on why median wages are up, economists keep saying times are solid, yet young people keep saying things suck. Those complaining often say false things and use statistics wrong, but if so many people think things suck, then you know there’s a problem. I looked into cost changes over time, and when were various things the best. Finally, I presented my thesis, which was that this was due to the Revolution of Rising Expectations and the Revolution of Rising Requirements. Our expectations and comparison points are supremely high, as are the things we legally require of those looking to raise families.

Questions For Next Season

AI is going gangbusters. The news about it is accelerating, not slowing down. It’s going to increasingly impact our lives and be the topic of conversation. The model releases will come fast and furious. The agents will make big leaps in 2026, and not only for coding. It will likely be a major topic in the midterm elections. I don’t expect full High Weirdness in 2026, but you can’t fully rule it out.

Blog growth, in terms of views, stagnated this year. That’s disappointing, as previously I had experienced strong growth, and I likely need to explore additional ways to get the word out. But ‘number go up’ was never the ultimate goal and I am confident that I am directly reaching quite a lot of the people I care about reaching. I do intend to send out a user survey some time in the near future.

One big personal goal for 2026 is to do more coding and evergreen posting, going deeper into questions that matter or that I get curious about, being better about organizing my thoughts, focusing less on ephemeral items and news, and finally getting a handle on organizing what I do have so as to better create longer-term resources. I am fully aware that almost all views happen within a few days of posting, but that doesn’t need to dictate anything, and there are some basic things where I could build permanent resources much better than I’ve been doing.

The other big goal is to focus on what matters, including the fights and debates that matter, making sure to do that in a way that adds to permanent resources and not let important things end up buried. I have to do better triage, especially in letting relatively unimportant matters drop. I intend to publish fewer words on the blog in 2026, and with that to become more willing to skip days. I know the amount of content can be overwhelming.

One thing that got lost in the shuffle this year, and illustrates the problem, was my planned review of Open Socrates. It’s a book warning you not to live your life 15 minutes at a time, and I didn’t finish my response because life kept throwing too much stuff at me. Well, that’s kind of the worst possible excuse not to finish that, isn’t it? Even if because of the delay I ultimately have to reread a lot of the book.

I also have a bunch of projects I’d love to try. We’ll see how that goes. But also movies to watch, and games to play, and people to see, and fun to be had. Life beckons.

And you know what? Life is pretty awesome. Other people sing Auld Lang Syne. I go to the Secular Solstice. My personal tradition, at year’s end, is something else entirely.

Happy New Year, everyone.


Uncertain Updates: December 2025

2026-01-01 00:20:47

Published on December 31, 2025 4:20 PM GMT

2025 was a rough year for me. My mom died. My cat died. I suffered a concussion, and I had to deal with a few other health issues.

But it was also a good year. I curated my mom’s art. I built an AI oracle. I wrote 37 blog posts, gained 500 MMR in DoTA2, lost 10 pounds, volunteered at 2 conferences, and revised 5 book chapters to make them much, much better. And none of that is to mention all the quality time I got to spend with friends and family and all the cool places I got to visit.

Year boundaries are a good time for setting goals. Here are mine for 2026:

  • finish revisions on Fundamental Uncertainty and get it into print

  • run a conference at Lighthaven (details still in the works, more to come)

  • continue to do whatever I usefully can to prevent existential catastrophes

  • live my life well and love all the people in it

Although I have plenty of reason to worry for the future, I’m generally hopeful, and I look forward to seeing how things unfold in the year to come!




Halfhaven Forever

2025-12-31 23:59:06

Published on December 31, 2025 3:59 PM GMT

Scott Alexander, a famous blogger with about 100 000 subscribers, once said:

"Whenever I see a new person who blogs every day, it's very rare that that never goes anywhere or they don't get good. That's like my best leading indicator for who's going to be a good blogger."

We know that "correlation is not causation" and that "when a measure becomes a target, it ceases to be a good measure". But we also know that "practice makes perfect", so...

In November 2025, Lightcone Infrastructure organized The Inkhaven Residency, where 30 people posted one blog post per day for 30 days, with feedback and mentorship from other successful writers.

For people who couldn't take a month off to participate in the event, a less intense online alternative, Halfhaven, ran in October and November 2025: a Discord group where you post links to your blog posts, with a soft target of one post every two days.

After the holidays, I think it is time to try again. If you haven't joined Halfhaven before, but you thought about it, this may be the right moment to join. If you were at Inkhaven, and you need an extra nudge to keep writing now that it is over, you are also welcome.

Click here to join: https://discord.gg/FqPppfdHQE

What's different this time

Three months

The writing tempo is reduced -- now you have three months to collect your 30 points. That means: January, February, and March 2026.

The idea is to keep doing this forever; that is, as long as people are interested in doing it. Because becoming a successful blogger is an ongoing project, not something you just do once and then stop. But people seem to benefit from having artificial deadlines. So the deadline is every three months. You try to reach 30 points by the deadline. Then your score resets and you try again. (You are encouraged to collect your points a few weeks before the deadline, and then take a break. Then return in April.)

Points

Previous Inkhaven/Halfhaven rules created a pressure to write short articles, or to split potentially longer articles into multiple parts, because only the number of articles counted, not how long they were (as long as they were at least 500 words long).

To reduce this pressure, so that people can choose more freely between writing short or long articles, as they wish, or as appropriate for the topic, now the articles get a certain number of points -- more for the longer ones, but less than linearly. Specifically:

  • 1 point 🪙  = 500 words or more
  • 2 points 🪙🪙 = 1500 words or more
  • 3 points 🪙🪙🪙 = 3000 words or more
  • 4 points 🪙🪙🪙🪙 = 5000 words or more

Some people wanted to make video posts instead; the scoring for videos is:

  • 1 point 🪙 = 10 minutes or more
  • 2 points 🪙🪙 = 30 minutes or more
  • 3 points 🪙🪙🪙 = 60 minutes or more

So at the end of the third month, you are supposed to have collected 30 points; not necessarily written 30 articles or recorded 30 videos.
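
For anyone who wants a mechanical way to tally their score, here is a minimal sketch using exactly the thresholds above (my own illustration; the project itself stays honor-based):

```python
# Minimal helper for tallying Halfhaven points, using the thresholds above.
# Purely illustrative; nobody will run this for you, it's all honor-based.

def post_points(word_count: int) -> int:
    """Written post: 500+ words = 1, 1500+ = 2, 3000+ = 3, 5000+ = 4 points."""
    for words, points in [(5000, 4), (3000, 3), (1500, 2), (500, 1)]:
        if word_count >= words:
            return points
    return 0  # under 500 words: no points

def video_points(minutes: int) -> int:
    """Video post: 10+ minutes = 1, 30+ = 2, 60+ = 3 points."""
    for mins, points in [(60, 3), (30, 2), (10, 1)]:
        if minutes >= mins:
            return points
    return 0

# Example: a 1700-word essay (2 points) plus a 12-minute video (1 point) = 3 points.
print(post_points(1700) + video_points(12))
```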

I hope these two changes will make the project more sustainable.

Don't worry about the exact date

Previous Inkhaven/Halfhaven rules required users to post exactly/at most one blogpost per day. I think that it is good to avoid self-deception like "I don't feel like writing today, but that's okay, because I can still make multiple posts on the last day", because realistically, that is either not going to happen, or you will feel deeply ashamed of the quality of those last-minute posts. But I am not going to micromanage you.

The only time limit is to post articles between January 1st and March 31st. (Your local time, if that makes a difference. It shouldn't.)

What remains the same

You participate by joining the Discord server and posting links to your blog posts in the appropriate forum channel (currently "blogposts-2026-q1"). The recommended format is:

  • how many points for this post (by default, one point)
  • how many total points you have now (to see your progress easily)
  • a hyperlink to the post
  • the title of the post (optional if it is obvious from the link)

For example:

2 points (total 7) http://example.com/on-blogging.html -- On blogging

Please only post links in "blogposts-2026-q1". There is another channel, "general-discussion", for general discussion and feedback on other participants' blog posts.

It is acceptable to award yourself more points for a shorter article or a video, if the extra effort justifies it. The 500 words / 10 minutes standard is meant for a typical essay or recording; if you e.g. composed a poem, or drew a complicated diagram, or created an animation, feel free to assign yourself the points regardless of the words / minutes.

Languages other than English are also accepted, but you only get points for the original content, not for translations, not even translations of your own work. (So if you post a short article in five different languages, you only get one point, for the first version.)

Do not use AI to generate the main content. You can use AI to e.g. generate the images to accompany your text.

Do not post pornography or hate speech or... other problematic content. There is no exact definition, please use your judgment. (This is not a ban on political content in general; just, uhm, please don't be stupid or hateful, m'kay?)

The content has to be publicly available, i.e. not behind a paywall. (It is okay to put it behind a paywall let's say one month after the deadline.)

It is okay if you publish in different places, for example some posts on your private blog, some posts on Less Wrong, and some videos on YouTube. (Actually, please do not post everything indiscriminately on Less Wrong, unless it is the kind of content that you would have posted on Less Wrong anyway.)

Ultimately, this entire project is honor-based. I will not try to detect AI usage, or copyright violation, or whether you counted the words correctly. It's all up to you. (Blatant violations may be addressed by social pressure and/or a ban.)

Importantly, this is not a competition; it does not matter who achieves the 30 points first. Also, after achieving the 30 points, you are supposed to take a break and celebrate, not try to maximize the number of points. In this sense, it is possible for everyone to "win".

References

Related Less Wrong posts:




Grading my 2022 predictions for 2025

2025-12-31 23:45:27

Published on December 31, 2025 3:45 PM GMT

Three years ago, back in 2022, I wrote "A Tentative Timeline of The Near Future (2022-2025) for Self-Accountability." Well, 2025 is almost over now, so let's see how well I did! I'll go over each individual prediction, and assign myself a subjective grade based on how close I got to the truth. 

Predictions for 2022

  • Post written by AI with minimal prompting reaches 30+ upvotes on LessWrong
    • Score: probably D. I didn't see any high-karma posts from 2022 which were obviously AI-generated, but frankly, I didn't look very hard. I remember reading a few experimental AI-generated posts, but they were all downvoted pretty badly at the time. There were a lot of posts which included smaller excerpts from AI text, but that's not really what I was aiming for, so I'll say I failed this prediction.
  • AI can regularly fool a randomly-selected (from American population), non-expert judge in a 10-minute Turing test.
    • Score: D-. What in the world was I thinking with this one?? I suspect I severely over-updated on stories like Blake Lemoine claiming Google's AI was sentient, not realizing that a chatbot seeming "intelligent" is very different from an AI seeming "human" to people. I think we've passed this point by now in 2025 (so I won't give myself an F), but I was a few years too early.

Predictions for 2023

  • AI reaches human expert level at MATH benchmark.
  • Famous, well-respected public intellectual announces that they believe AI has reached sentience, deserves rights.
    • Score: C-. By this point, a few famous (or newly famous) people (most notably Blake Lemoine in late 2022) were claiming AI sentience, but as far as I can tell, none of them were particularly "well-respected" or considered serious "public intellectuals" by normative standards. I'd say it's an edge-case if I passed this one or not.
  • AI can now write a book with a mostly consistent plot, given roughly a page of prompting or less.
    • Score: A+. I actually thought that I'd failed this one, but I looked it up, and surprisingly (to me), it seems AI was in fact capable of this by 2023! See, for instance, Death of an Author, a novella supposedly written 95%+ by ChatGPT, and described by New Scientist as "not awful." High praise indeed...
  • "Weak" AGI is announced that can play a randomly-selected game on Steam and get at least one achievement (in games which have Steam achievements enabled) most of the time. This assumes someone bothers to try this in particular, if not it should still be obvious it can be done.
    • Score: F. This still doesn't seem to be fully possible in 2025 (although we might be getting pretty close). It certainly wasn't happening (or obvious it could happen) by the end of 2023.
  • AI proves an "interesting" result in mathematics (as judged by professional mathematicians) with minimal prompting.
    • Score: D+. While I don't believe there were any particularly interesting and original AI proofs produced with minimal prompting in 2023, there were some fascinating results produced with the help of AI. An interesting example of this would be FunSearch. I'd say I didn't do too badly on this prediction, although I still technically failed.
  • Major lawsuit involving AI trained on "stolen artwork" gets in the news
  • It is unclear if artists are actually losing significant amounts of work to AI, but plenty of op-eds get written which assume that premise.
  • I move out of my parent's house, possibly to LA for networking/work reasons, possibly remaining in Virginia, for community-building/health reasons. In a possibly related move, I finally come out to my parents, which probably goes okay, albeit with a small chance of being disowned by my grandparents.
    • Score: C. It happened, but I came out to my parents in early 2024, not 2023. The first half of the prediction can't be scored, as I mentioned both possibilities.
  • S.B.F. somehow remains a free, not-in-jail citizen, and continues to post questionable statements on Twitter.
    • Score: F. S.B.F. was in jail by the end of 2023, and although he was under house arrest for the first seven months of the year, that hardly counts as being a "free" citizen, so I'm failing myself on this one.
  • Anti-EA sentiment mostly dies down, but anti "AI safety" sentiment goes way up. The term has become associated with (perceived) censorship, and right-wing politicians may begin to shun people who use "AI safety" in their public branding. AI governance orgs try to adjust by going for a "national security" public angle. [Note that that last bit is incredibly speculative, and depends on too many factors to predict with any real confidence.]
    • Score: B. It didn't take too long after the fall of S.B.F. for anti-EA sentiment to fade from the public spotlight (although it still exists to some extent, especially after the whole Zizian cult disaster), but anti-AI-safety sentiment certainly seems much higher than it was in late 2022. I'm not quite sure how accurate my latter prediction was, but I don't think I was entirely wrong, so that counts for something, I'd say.
  • Multiple people land well-paying coding jobs and publicly post about how they "don't actually know how to code" (beyond some really basic level), but have been outsourcing everything to AI.
    • Score: C-. As far as I can tell, while people were just beginning to "vibe-code" in earnest, there wasn't much public discussion by the end of 2023 of people with no coding knowledge taking coding jobs. By now it's not that unheard of, but it took a few more years than I thought it would.

Predictions for 2024

  • Assuming Donald Trump is not barred from running, he will become president. If not him, it’s an easy DeSantis win. (Biden is the Democratic nominee of course, assuming he's still alive. As usual, the media pays no attention to third party candidates.)
    • Score: A. I didn't do too badly here. Although Biden stepped down at the end while only a presumptive Democratic nominee, "assuming he's still alive" was kind of marginal, so I'll take partial credit for that anyway. 
  • AI writes a NYT best-selling book.
    • Score: D+. As far as I can tell, this did not happen in 2024. However, it seems actively implausible that AI assistance wasn't used to help write a NYT bestseller this year (though to be fair, I don't have direct proof of that), so I'd consider this a close miss.
  • Twitter is still functional, and most users haven't left the site. The workplace environment is kind of miserable though, and content moderation is still severely lacking (according to both sides of the culture war). Elon Musk is largely washed-up, and won't be doing anything too groundbreaking with the remainder of his life (outside of politics perhaps, which I won't rule out).
    • Score: A? I don't think I did too badly on this one. Twitter (now "X") is still fully functional, and it still has a large userbase. There have been multiple waves of layoffs and plenty of reported internal drama there, which sounds pretty miserable to me. Musk's main focus was his DOGE efforts, so he did go into politics, but outside of that, most people seem to consider him well past his intellectual prime. Obviously this sort of thing is largely subjective, but I think most people would agree my prediction(s) have held up.
  • A minor celebrity or big-name journalist finally discovers Erik Sheader Smith's video game The Endless Empty for the masterpiece it is, kickstarting its growth as widely-hailed classic of the genre. My own game, Nepenthe, is largely forgotten by history, at least until someone discovers a certain easter egg, which is occasionally mentioned in 40+ minute long Youtube videos (you know the type).
    • Score: C+. My friend's masterpiece has not yet been discovered by big-name celebrities or journalists, but it has experienced an explosion in players and fan-artists from China, who do genuinely seem to regard it as a cult classic. The growth is entirely grassroots for now, however. Meanwhile, my videogame, while not entirely forgotten, isn't exactly growing a large fanbase or anything. It doesn't help I've stepped away from making videogames over the past few years (though I'm considering getting back into it).
  • The social media battle going on between those who firmly believe that AI is "just copy-pasting others work" and those who firmly believe that AI is sentient (and want to free it), has reached enough intensity that it gets brought up a few times in the political news cycle. At least one (possibly fringe) candidate pledges to "protect the rights of artists" through AI legislation.
    • Score: B-. I got things directionally right here I think -- except instead of the opposing view being "AI is sentient/deserves rights," it's "AI is helpful; forget about sentience," for the most part. Politicians did seriously talk about protecting artists' rights with AI legislation in 2024, as evidenced by things like the Generative AI Copyright Disclosure Act.
  • Some new video game nobody has heard about before goes viral among schoolchildren, sparking a wave of incredibly forced puns across news headlines worldwide.
    • Score: F. I'm grading myself harshly on this one. Despite there being a few viral indie game hits (like Balatro) in 2024, none of them really went massively viral among schoolchildren in the way something like Five Nights At Freddy's or Undertale did. I did not notice any wave of forced puns relating to said games, either.
  • China's economy has pretty much recovered from Covid. Other than that, hard to predict, but growth won't look terribly different from the rest of the world.
  • Companies start actually replacing a significant number of customer support jobs with AI. Consumers generally report being more satisfied as a result, to many people's annoyance.
  • Both teachers and students have the ability to easily automate online assignment work, leading to a growing number of absurdist scenarios where algorithms play meaningless educational games while teachers and students do their own thing, unwatching. This is objectively hilarious, but people get mad about it, leading to a poorly-managed escalation of the school surveillance arms race we already see today.
    • Score: A. Another win for my predictive abilities...not so much for the rest of the world. This pretty much came to pass, but I'm not giving myself an A+ because it's not clear to me just how much school surveillance has actually increased as a direct result of AI cheating concerns (though AI-powered school surveillance has certainly increased since 2022).
  • Another billionaire has emerged as an EA mega-donor.
    • Score: D. We still have Dustin Moskovitz (and his wife Cari Tuna) as billionaire mega-donors, but they aren't exactly new on the scene. Sadly, I was wrong about this one.

Predictions for 2025

  • Self-driving cars (and drone delivery) never quite reach market saturation due to some consumer/cultural pushback, but mostly due to legislation over "safety concerns," even if self-driving is significantly safer than human-driven vehicles by this point. However, more and more self-driving-adjacent features are added into "normal" cars for "safety reasons," so it's become increasingly hard to delineate any sort of clear line between AI and human-operated vehicles.
    • Score: A. This seems to be pretty much on the nose! The only potential issue is it's arguably debatable if self-driving is truly "significantly safer" than human driving, mostly due to issues like mass outages during crisis situations. I think it's safer, but I can see how a reasonable person might disagree, so I'm not giving myself an A+.
  • I am in love.
    • Score: A. It's a long and dramatic story, but this isn't the time or place to share it...
  • A mass fatality event occurs due to what could plausibly be interpreted as "misaligned AI." This sparks some countries to pass a whole bunch of AI-related laws which are totally ignored by other countries. The AI safety community is split on if the blame for what happened should be placed on misaligned AI, human error, or some complex mix of both. For whatever reason, a popular language model (developed for entertainment perhaps) publicly takes responsibility, despite seemingly having nothing to do with the incident. For the most part though, this is treated as just another tragedy in the news cycle, and is ignored by most people.
    • Score: D. There was no single "mass fatality event" caused by AI this year. That being said, there have been a significant number of murders and suicides plausibly linked to AI psychosis, which, if considered together, likely resulted in a large number of unnecessary deaths. It's debatable to me if this should count, but I'm leaning against it, as it's not the sort of thing I was envisioning at the time, I think. There have indeed been a number of irregularly enforced AI safety laws passed, but not as many as I would have expected. I was correct that people are split over how much AI is to blame for the deaths that have occurred, but incorrect that an AI would erroneously take the blame on itself for said deaths. And indeed, most people simply ignore the whole thing, and it's not the primary driver of the news cycle this year.
  • Someone who has at some point called themself "rationalist" or "EA" commits a serious crime with the intention of halting capabilities gain at some company or another. This is totally ineffective, everyone agrees that that was like, the least rational or altruistic action they could have possibly taken, but the media runs with exactly the sort of story you'd expect it to run with. This makes AI governance work a bit harder, and further dampens communications between safety and capabilities researchers. Overall though, things pretty much move on.
  • Despite having more funding than ever before, the quality and quantity of AI safety research seems...slightly lesser. It's unclear what the exact cause is, though some point out that they've been having a harder time staying focused lately, what with [insert groundbreaking new technology here].
    • Score: C. AI safety funding is indeed going strong. It is unclear to me if research is better or worse than it was in late 2022, but AI safety research in general seems to have taken a backseat within the largest AI companies, which is worrying. Some research does suggest that using tools like Cursor actually slowed developers down, despite a perception that it was speeding up work, which arguably counts as a partial win for my prediction.
  • Youtube dies a horrible death in a totally unpredictable manner. The whole disaster is retroactively considered clearly inevitable by experts. There is much mourning and gnashing of teeth, but the memes, too, are bountiful.
    • Score: F. This did not happen.
  • The sun rises and the sun falls.
    • Score: A+. This actually happened multiple times!
  • Me and my friends are still alive.
    • Score: B. I am still alive, and so are most of my friends, but there are a few who seem to have disappeared from the internet, and I am worried about them. I hope they are okay, but I have no guarantee, so I don't feel comfortable giving this an A+.

Conclusion

Um...I'm not sure what conclusion to take away from all of this. Predicting the future is hard, and I certainly failed a lot, but also, I was pleasantly surprised to see how much I got right, or at least got directionally correct. It seems like I generally over-updated on the rate of advancement in 2022, and assumed things would move faster than they did. That being said, I really don't think I did too badly compared to those around me at the time, and I'm proud of what I did get right.

Happy New Years!


