Published on December 23, 2025 12:08 AM GMT
We might be in a generative AI bubble. There are many potential signs of this around:
If LLMs were truly on the path to AGI[1], I would expect the opposite of many of these - opportunities for LLM usage opening up all over the place, huge disruption in the job markets at the same time as completely novel products enter the economy and change its rate of growth. And I would expect the needed compute investments to be declining due to large efficiency gains, with LLM errors being subtle and beyond the ability of humans to understand.
Thus the world does not look like one where LLM-to-AGI is imminent, and looks a lot more like one where generative AI keeps hitting bottleneck after bottleneck - when, precisely, will the LLMs stop hallucinating? When will image composition work reliably[2]?
Remember when GPT 3.5 came out? It did feel like we were on the cusp of something explosive, with countless opportunities being enthusiastically seized and companies promising transformations in all kinds of domains.
But that didn’t happen. Generative AI has a lot of uses and many good possibilities. But in terms of R&D progress, it now feels like an era of repeated bottlenecks slowly and painfully overcome. LLMs are maturing as a technology, but their cutting-edge performance is improving only slowly - outside of coding, which is showing some definite upswing.
A bubble wouldn't mean that generative AI is useless. It might even be transformative and a huge boost to the economy. It just means that the generative AI companies cannot monetise it to the level required to justify the huge investments being made.
And the investments being made are huge. See arguments like "Big Tech Needs $2 Trillion In AI Revenue By 2030 or They Wasted Their Capex" (if you want a well-researched skeptical take on the economics of LLMs, the whole of Ed Zitron's blog is a good source - stick to the information, not his opinions, and be warned that he is extremely uncharitable towards AI safety).
There are many reasons why generative AI companies might fail at monetising. Since the end of (very weak) training scaling laws, we've been in an "inference" scaling situation, buying and building huge data centers. But that isn't enough for a moat - they need economies of scale, not just a large collection of expensive GPUs.
That's because open source models are a few months, maybe a year, behind the top models. If the top LLM companies really become profitable, it will be worth it for others to buy up a modest number of GPUs, design a nice front end, and run DeepSeek or a similar model cheaply. Unless they can clearly differentiate themselves, this puts a cap on what the top companies can charge.
So it's perfectly possible that generative AI is completely transformational and that we are still in an AI bubble, because LLM companies can't figure out how to capture that value.
If LLMs were a quick path to AGIs, then we'd certainly not be in a bubble. So, if we are in a bubble, they're not AGIs, nor the path to AGIs, nor probably the road to the avenue to the lane to the path to AGIs.
And the big companies like OpenAI and Anthropic, that have been pushing the LLM-to-AGI narrative, will take a huge reputational hit. OpenAI especially has been using the "risk" of AGI as a way to generate excitement and pump up their valuation. A technology so dangerous it could end the world - think of what it could do to your stock values!
And if the bubble bursts, talk of AGI and AGI risk will be seen as puffery, as tools of bullshit artists or naive dupes. It will be difficult to get people to take those ideas seriously.
There will be some positives. The biggest positive is that LLMs would not be proto-AGIs: hence there will be more time to prepare for AGI. Another positive is that LLMs may be available for alignment purposes (I'll present one possible approach in a subsequent paper).
Some of these things are things we should probably be doing anyway; others are conditional on generative AI being a bubble. The list is non-exhaustive and intended to start discussion:
In a subsequent post, I'll discuss how we might improve our AGI predictions - almost any advance in computer science could lead to AGI via recursive self-improvement, but can we identify those that are genuinely likely to do so?
I've had very painful experiences trying to use these tools to generate any image that is a bit unusual. I've used the phrase "Gen AIs still can't count" many a time.
Ed will be the kind of person who will be seen as having "been right all along" if there is an AI bubble.
It's paywalled, but he talks about the AI 2027 paper, concluding:
[...] Everything is entirely theoretical, taped together with charts that have lines that go up and serious, scary language that, when boiled down, mostly means "then the AI became really good at stuff."
I fucking hate the people that wrote this. I think they are craven grifters writing to cause intentional harm, and should have been mocked and shunned rather than given news articles or humoured in any way.
And in many ways they tell the true story of the AI boom — an era that stopped being about what science and technology could actually do, focusing instead on marketing bullshit and endless growth.
This isn't a "scenario for the future." It's propaganda built to scare you and make you believe that OpenAI and Large Language Models are capable of doing impossible things.
It's also a powerful representation of the nebulous title of "AI researcher," which can mean everything from "gifted statistician" to "failed philosophy PHD that hung around with people who can actually write software."
Note that, in general, the quality of his arguments and research is much higher than this vitriol would suggest.
Published on December 22, 2025 10:01 PM GMT
finding agents in raw dynamics
Related: Formalizing «Boundaries» with Markov blankets, «Boundaries» Sequence
Suppose you are given time series data from a simulated ecology (such as Smallville). You can usually tell what the agents in there are doing, at least if the variables are properly labeled. If they are not labeled but are just numeric values, your first task would be to reverse-engineer the mapping to the conceptual model behind the simulation (or whatever generated the time series data). That might not be feasible, but let's ignore it for now, because there is a bigger problem, one that results from any labelling, whether existing or reverse-engineered: you introduce an ontology. Once you call something an agent (label="player1_position"), you assume that those variables belong to that agent. That works well because our intuitions about agents are pretty good.
Until it doesn't. Our intuitions attribute agency where there is none. Our ancestors anthropomorphized nature. We may attribute intent where there's only coupling. It is more dangerous to overlook an agent than to see one too many.
But we may miss intent when it is distributed or doesn't neatly fit into our ontologies of physically bounded agents.
If we need to find an agent, especially a potentially powerful agent that may work very differently from what our intuitions expect, we need a method that can discover agents in raw unlabeled data without using a prior ontology (or, in other words, without a unit that is already known to be an agent).
That's two problems: 1. doing it without a prior ontology, and 2. doing it in raw data.
For the first, as part of agent foundations work, johnswentworth and others have proposed modeling an agent in terms of its boundary, which shields inner states from outer change. A natural formalization seemed to be a Markov blanket.

Quick recap: The blanket is a partitioning[1] of all variables into four sets:
There are some problems with Markov blankets, but these seem solvable:
Thus, it may make more sense to talk about invariant ε-boundaries when referring to such agents, where the Markov blanket is determined not from raw observables but over parameters invariant[2] under transformation, with the understanding that ε is chosen to minimize predictive regret when predicting internal from external (invariant) variables.
Given such a formalization of an agent as a Markov blanket, how can we find it in the raw data? While there are efficient algorithms for finding the blanket of a single variable, and there is some recent work on the relation between Markov blanket density and free energy minimization, I'm not aware of any implementation that uses blankets to find agents. One problem is that we do not have a nice causal graph[3] that we could inspect structurally for the blanket property; we have to check a lot of variables statistically.
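To make "check statistically" concrete, here is a minimal sketch (my own illustration, not code from the agency-detect repo) of scoring how well a candidate partition into internal, blanket, and external variables satisfies the conditional-independence condition, using linear partial correlation as a cheap stand-in for a full conditional-independence test. The function names and the choice of test are assumptions for the example.

```python
# Minimal sketch (hypothetical, not the agency-detect implementation):
# score how well a candidate partition (internal I, blanket B, external E)
# satisfies the blanket condition "I independent of E given B", using
# linear-Gaussian partial correlation as a cheap proxy.
import numpy as np

def residualize(X, B):
    """Remove the part of X that is linearly predictable from B."""
    B1 = np.column_stack([B, np.ones(len(B))])        # add an intercept column
    coef, *_ = np.linalg.lstsq(B1, X, rcond=None)
    return X - B1 @ coef

def blanket_violation(data, internal, blanket, external):
    """Mean |partial correlation| between internal and external given blanket.

    data: (T, n_vars) array of time series samples.
    internal / blanket / external: lists of column indices (candidate partition).
    Returns a score near 0 if the partition looks like a Markov blanket.
    """
    I_res = residualize(data[:, internal], data[:, blanket])
    E_res = residualize(data[:, external], data[:, blanket])
    # Cross-correlate every residualized internal variable with every external one.
    corr = np.corrcoef(I_res.T, E_res.T)[: len(internal), len(internal):]
    return float(np.abs(corr).mean())

# Usage idea: enumerate candidate partitions (e.g. from clustering variables
# by mutual information) and keep those whose violation score is below epsilon.
```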
Thus as a first contribution, I offer an implementation of Unsupervised Agent Discovery (UAD), i.e., of how it might be possible to find such agents in unlabeled raw time series data.
The implementation takes a raw dataset and executes these steps:
The implementation also includes a simulation of a configurable number of agents (simple state machines acting as simple controllers) in a simplified shared environment, from which a dataset of all their observables is generated.
From implementing and running these simulations, I learned
Discovering agents is nice. Now that we can do that, what else can we do? We can try to understand the internal states better. We can try to understand how the agent represents itself and its environment. This is the second contribution of the post (note: this is not yet part of the implementation).
We can treat memory as a compression[4] of past inputs that informs future actions.
For each internal variable m∈I and lag k, compute Δm(k) = I(m_{t−k}; I_{t+1}), the mutual information between m's value k steps ago and the agent's next internal state. If Δm(k) is large, then the past of m predicts the agent's next internal state. We have to be careful because a variable may look like memory just because it is changing slowly.
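As an illustration of how such a score might be estimated (hypothetical only; as noted above, this part is not yet implemented), here is a sketch using a crude histogram-based mutual information estimator. A serious version would need a better estimator (e.g. kNN-based) plus a control for slowly changing variables.

```python
# Hypothetical sketch: estimate Δ_m(k) as the mutual information between
# m's value k steps ago and a (1-D) summary of the agent's next internal state.
import numpy as np

def mutual_information(x, y, bins=8):
    """Plug-in MI estimate (in nats) between two 1-D series."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    p_xy = joint / joint.sum()
    p_x = p_xy.sum(axis=1, keepdims=True)
    p_y = p_xy.sum(axis=0, keepdims=True)
    nz = p_xy > 0
    return float(np.sum(p_xy[nz] * np.log(p_xy[nz] / (p_x @ p_y)[nz])))

def memory_score(m, internal_next, k):
    """Δ_m(k): how much m's value k steps ago predicts the next internal state."""
    past_m = m[:-(k + 1)]             # m_{t-k}
    future = internal_next[k + 1:]    # I_{t+1}, aligned with past_m
    return mutual_information(past_m, future)

# Flag m as memory if the score stays high at lags where the current external
# inputs alone no longer predict the internal state.
```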
If we can track the inputs and outputs of an agent, we can try to infer the implied policy behind this input-output relation. We can infer a reward function R(I,A) with inverse reinforcement learning (IRL). Given a prior P(R) that weighs reward functions[5], we can use the standard Bayesian formulation P(R | observed behavior) ∝ P(observed behavior | R)·P(R). We could call the inferred R the agent's goals. Though with simple modeling, we can't yet represent issues like Mesa Optimization.
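A minimal sketch of what this inference could look like over a small, discrete hypothesis space of reward functions, assuming a myopic Boltzmann-rational action model. The setup and names are illustrative assumptions, not the post's actual method.

```python
# Hypothetical sketch: Bayesian IRL over a small discrete set of candidate
# reward functions R(state, action), assuming the agent picks actions
# Boltzmann-rationally on immediate reward: P(a | s, R) ∝ exp(beta * R(s, a)).
import numpy as np

def action_log_likelihood(R, states, actions, n_actions, beta=2.0):
    """Log-likelihood of the observed (state, action) pairs under reward R."""
    ll = 0.0
    for s, a in zip(states, actions):
        logits = beta * np.array([R(s, b) for b in range(n_actions)])
        ll += logits[a] - np.log(np.exp(logits).sum())
    return ll

def reward_posterior(candidates, prior, states, actions, n_actions):
    """P(R | trajectory) ∝ P(trajectory | R) * P(R) over the candidate rewards."""
    log_post = np.log(np.asarray(prior)) + np.array(
        [action_log_likelihood(R, states, actions, n_actions) for R in candidates]
    )
    log_post -= log_post.max()          # stabilize before exponentiating
    post = np.exp(log_post)
    return post / post.sum()

# Usage idea: candidates could encode hypotheses like "seeks resource A"
# vs. "seeks resource B", expressed over the discovered internal/action slots.
```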
If we can find two agents X and Y and their memory, we can ask if one of them represents the other - technically, whether some memory variable m of X predicts Y beyond what the rest of X's state already does: Δm = I(m_t; Y_{t+1} | X_t∖m). A large Δm means that some part of X is predicting Y.
I'm not clear if this refutes How an alien theory of mind might be unlearnable or not.
Once you can track what agents know about each other and what objectives they follow, you can, in principle, derive how well these agents do or do not cooperate.
You can calculate:
Cooperation is favored when the fraction κ = rB/C exceeds 1. We can call κ a cooperativity index, a generalization of Hamilton's rule[7]. If we look at the full induced cooperation graph between agents, where edges are weighted by κ, we can use percolation theory to determine at which level cooperation becomes universal (a giant component), as in The Evolution of Trust.
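As a sketch of how the induced cooperation graph and the giant-component check could be computed (illustrative only; the edge rule and threshold are assumptions), using networkx:

```python
# Hypothetical sketch: build the induced cooperation graph, keep only edges
# whose cooperativity index kappa exceeds 1 (cooperation favored), and check
# whether a giant connected component has emerged.
import networkx as nx

def cooperation_graph(kappa, threshold=1.0):
    """kappa: dict mapping (agent_i, agent_j) -> cooperativity index."""
    G = nx.Graph()
    G.add_nodes_from({a for pair in kappa for a in pair})
    for (i, j), k in kappa.items():
        if k > threshold:
            G.add_edge(i, j, weight=k)
    return G

def giant_component_fraction(G):
    """Fraction of agents in the largest connected component."""
    if G.number_of_nodes() == 0:
        return 0.0
    largest = max(nx.connected_components(G), key=len)
    return len(largest) / G.number_of_nodes()

# If the fraction approaches 1, cooperation has "percolated" and is
# effectively universal among the discovered agents.
```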
The promise of this approach is that we could look at a complex system with many potential agents large and small, human, legal, or artificial, and determine at least an approximation of the cooperation structure and whether there are adversarial tendencies.
With all these promises, there are potentially serious problems.
The best blanket algorithm fails if we cannot observe sufficient detail, especially the internals of the agents of interest - and we certainly can't look inside humans. The theoretical arguments above even model why agents wouldn't want transparency in some cases. And if we use proxies, we lose reliability. As a consolation, since we are most interested in powerful AIs, we might at least have access to their internals.
If any variable we overlook is a common cause, we can get the agent boundaries wrong, fail to separate agents, and thereby fail to identify the crucial ones. I'm mostly thinking about complex LLM-like agents here. An LLM is distributed across multiple computers and may depend on human operators for its function, but can still be powerful. Agents running on LLMs are also hard to find with slot-based approaches.
Things may change. The method, as currently conceived, is sample-hungry, and if anything changes during the sampling period, that change will end up in the data and may interfere with the discovery. Also, some types of learning that change the policy of the agent (possibly even in response to the discovery process) may manifest only after the sampling period.
There are concerns that the calculations might not be statistically stable and, especially, might not be computationally feasible. Currently, the algorithm is combinatorial in the number of variables. My current argument here is that we have an existence proof: humans have learned to identify agents in complex data fairly effectively, and it should be possible to reproduce that in a specialized algorithm.
At this point, Unsupervised Agent Discovery doesn't yet give you an algorithm that discovers agents in real-world data. But it provides a precise way to talk about agents, their goals and cooperation, and many other things we care about that usually require an a priori notion of an agent but can now be grounded in physics.
Github repo: https://github.com/GunnarZarncke/agency-detect/tree/master
My initial dense LaTeX/PDF writeup of UAD can be found here.
Many thanks to the reviewers Jonas Hallgren, Chris Pang, and Peter Kuhn. Additional thanks go to the team at AE studio that supported the development of the experiments and write-up with time and compute.
Feedback welcome, especially from the agent foundations people.
by using the condition that internal and external states are conditionally independent given the blanket states: p(internal, external | blanket) = p(internal | blanket) · p(external | blanket)
from M. D. Kirchhoff, T. Parr, E. Palacios, K. Friston, and J. Kiverstein, “The Markov blankets of life: autonomy, active inference and the free energy principle,” J. R. Soc. Interface, vol. 15, no. 138, 2018.
The invariants may themselves depend on the agents' dynamics, making a simple layer-by-layer inference infeasible.
If we could intervene on the simulation/experiment, we could determine the causal structure, as done in the DeepMind Discovering Agents paper. That is also how humans check whether something is an agent or not: we prod it and see if it evades. It is a promising direction but was beyond the scope of this work.
In evolutionary environments, agents with memory of past inputs that are relevant to survival-affecting future outputs will outcompete agents without. And agents with more compact memory will outcompete agents with larger memory but the same predictive effect.
This weighing P(R) is often seen as arbitrary or in need of justification, but here we are closer to the underlying substrate. In most environments of interest, we can argue that there will be entropic forces that select for simpler policies and lower prediction errors.
This residual mutual information between agents' actions, internal models, or rewards does not indicate a failure of separation. It captures alignment without leakage, e.g. from shared task structure, common external drivers, following the same conventions, or algorithmic similarity.
Hamilton's rule says that genes for a particular behavior should increase in frequency when rB > C, where r is the genetic relatedness, B the reproductive benefit, and C the reproductive cost.
W. D. Hamilton, “The genetical evolution of social behaviour,” J. Theor. Biol., vol. 7, no. 1, pp.1–16, 1964.
Published on December 22, 2025 9:56 PM GMT
Here are some example latents taken from the residual stream SAEs for Gemma V3 27B IT.
This release contains SAEs trained on 3 different sites (residual stream, MLP output and attention output) as well as MLP transcoders (both with and without affine skip connections), for every layer of each of the 10 models in the Gemma 3 family (i.e. sizes 270m, 1b, 4b, 12b and 27b, both the PT and IT versions of each). For every layer, we provide 4 models (widths 16k and 262k, and two different target L0 values). Rather than giving the exact L0s, we label them "small" (10-20), "medium" (30-60) and "big" (90-150).
Additionally, for 4 layers in each model (at depths 25%, 50%, 65%, 85%) we provide each of these single-layer models for a larger hyperparameter sweep over widths and L0 values, including residual stream SAEs with widths up to 1m for every model.
Lastly, we've also included several multi-layer models: CLTs on 270m & 1b, and weakly causal crosscoders trained on the concatenation of 4 layers (the same 4 depths mentioned above) for every base model size & type.
All models are JumpReLU, trained using a quadratic L0 penalty along with an additional frequency penalty which prevented the formation of high-frequency features. We also used a version of Matryoshka loss during training, which has been documented to help reduce the incidence of feature absorption.
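For readers unfamiliar with JumpReLU SAEs, here is a minimal illustrative sketch of the forward pass in PyTorch. It is not this release's loading code, just the general shape of the architecture: activations below a learned per-latent threshold are zeroed, which is what gives direct control over L0.

```python
# Illustrative JumpReLU SAE forward pass (sketch only; real weights would be
# loaded from the released checkpoints rather than initialized here).
import torch
import torch.nn as nn

class JumpReLUSAE(nn.Module):
    def __init__(self, d_model: int, d_sae: int):
        super().__init__()
        self.W_enc = nn.Parameter(torch.zeros(d_model, d_sae))
        self.W_dec = nn.Parameter(torch.zeros(d_sae, d_model))
        self.b_enc = nn.Parameter(torch.zeros(d_sae))
        self.b_dec = nn.Parameter(torch.zeros(d_model))
        self.threshold = nn.Parameter(torch.zeros(d_sae))  # learned per-latent θ

    def encode(self, x: torch.Tensor) -> torch.Tensor:
        pre = x @ self.W_enc + self.b_enc
        # JumpReLU: keep the pre-activation only where it exceeds the threshold.
        return torch.where(pre > self.threshold, pre, torch.zeros_like(pre))

    def decode(self, f: torch.Tensor) -> torch.Tensor:
        return f @ self.W_dec + self.b_dec

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.decode(self.encode(x))
```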
If you're interested in finding features connected to certain behavioural traits (to perform steering, or to better attribute certain model behaviours, or analyze directions you've found inside the model using supervised methods etc), we recommend using the residual stream models trained on a subset of the model layers (e.g. here). The 262k-width models with medium L0 values (in the 30-60 range) should prove suitable for most people, although the 16k and 65k widths may also prove useful. All the examples in the screenshots above were from 262k-width medium-L0 SAEs finetuned on Gemma V3 270m IT.
If you're interested in doing circuit-style analysis e.g. with attribution graphs, we recommend using the suite of transcoders we've trained on all layers of the model, e.g. here. Affine skip connections were strictly beneficial so we recommend using these. Models with larger width lead to richer analysis, but the computational cost of circuit-style work can grow very large especially for bigger base models, so you may wish to use 16k width rather than 262k. Neuronpedia will shortly be hosting an interactive page which allows you to generate and explore your own attribution graphs using these transcoders.
Here are all the relevant links to go along with this release:
The ARENA material will also be updated to use this new suite of models, in place of the models from the 2024 Gemma Scope release.
Published on December 22, 2025 9:32 PM GMT
Balsa Research is a tiny nonprofit research organization currently focused on quantifying the impact of the Jones Act on the American economy, and working towards viable reform proposals.
While changing century-old policy is not going to be easy, we continue to see many places where there is neglected groundwork that we’re well positioned to do, and we are improving at doing it with another year of practice under our belts. We’re looking to raise $200,000 to support our work this giving season, though $50,000 would be sufficient to keep the lights on, and we think we are also well positioned to do more with more funding.
Donations will support Balsa’s policy advocacy, either in Jones Act and maritime policy reform or potentially in other planned cause areas (housing reform and NEPA reform) if there is capacity to significantly expand.
Donate here to fund our mainline policy work.
One additional possibility for Balsa, which would be funded entirely separately if it did happen, is for Zvi Mowshowitz to use Balsa as a piece of philanthropic infrastructure to help guide new philanthropic money coming online in 2026 if there is demand. Contact us ([email protected]) if you would like to be involved in such an effort in any capacity, or want to authorize this as a potential use of your funds.
Donate here if you are interested in contributing fully flexible funding.
Quite early in the year, Balsa's plans for Jones Act investigative work were derailed by a certain Section 301 Investigation, which I wrote about here. In short, the USTR was proposing two significant changes to maritime transport: a $3-5 million fee for Chinese-built ships to deliver imports to American ports, and new, Jones Act-tier restrictions on up to 20% of American maritime exports. All of American industry focused on lobbying against the legibly bad first proposal; sadly, no one else was on the ball about how bad the second proposal was, because it required a slightly more sophisticated argument. So Balsa stepped in, wrote up a public comment, and presented it to the USTR during their public hearing on the proposal. At least in part due to our research and our outreach to maritime industry players, this proposal was basically entirely axed.
After our mid-year write-up on the whole adventure, Balsa did also end up submitting a second comment in response to what we felt was a deeply counterproductive tariff scheme in the updated proposal. This was the first arc played out again in miniature: after functionally scrapping both major proposals from the first round, the USTR was proposing that an increasing percentage of American LNG must be shipped out on U.S.-built LNG tankers (there are currently zero in the fleet and no capacity for the shipyards to build any new ones) and that all port crane parts made in China be subject to 100% tariffs. Everyone focused on lobbying against the first policy change, which was obviously bad; the second was bad in a more subtle way. So it was once again up to Balsa to point out that the port crane tariffs were structured in a way counterproductive to the stated U.S. policy, would incentivize American ports to buy their cranes from Chinese manufacturers instead of manufacturers in allied countries (there is no domestic port crane manufacturing capacity), and would negatively impact port revitalization investments that need to happen.
One piece of good news is that President Trump signed a trade deal with China in November, which resulted in a one-year suspension of all of the punitive measures proposed in the Section 301 investigation. We think there’s a decent chance that the suspension might become indefinite, but it still seemed like a good use of our time to write up our objections should the measures resume in 2026.
We also worked on the Jones Act. We launched a new RFA to investigate the labor impacts of the Jones Act. This is meant to complement our first RFA, which invites academics to look at the economic impacts of the Jones Act. Both are open for applications!
You may also recall that we had already given out grants for two different studies under the first RFA, on economic impacts. These papers are still in the process of being written. We remain confident in both teams and look forward to seeing their results in 2026.
We shored up a few places where we felt the groundwork done by others on the Jones Act was either neglected or outdated. We published two pieces: The Jones Act Index, which works as a very short overview of all the myriad dysfunctions of the current domestic maritime industry, and an operational analysis of what exactly the 93 extant Jones Act-eligible vessels get up to.
Besides all that, there is of course the frustratingly intangible work of networking and building a deeper understanding of the shape of the problem. We conducted over forty conversations with stakeholders across the maritime policy landscape, including domestic shipping operators, port executives, and congressional staff. These conversations directly informed our operational analysis of Jones Act vessels and helped us identify which reform framings resonate (and which don't) with different constituencies. We've compiled this primary research into internal documentation mapping stakeholder positions, constraints, and potential pressure points—groundwork that will directly inform our policy binder and draft reform proposals.
Additionally, in the last few months of the year, we brought on a very part-time contractor to help with shipping out more of our policy work.
A breakdown of our 2025 spend to the nearest thousand, for a total of ~$143k:
Considering Balsa’s size, unless fundraising goes exceedingly well, we plan to stay focused on the Jones Act and maritime policy until we crack this nut (i.e. deliver the policy binder) instead of diverting attention across different policy streams.
Currently, the people working on Balsa work are Jenn (me, full time-ish), a contractor who works ten hours a week, plus Zvi Mowshowitz in an advisory capacity. In 2026, we’d like to bring our contractor or another policy analyst on full time, because my own time is somewhat constrained by the overhead of maintaining a 501(c)(3) nonprofit. The amount of funding we have in reserve gives us a decent amount of runway, but is insufficient for our grantmaking and hiring ambitions.
We’re looking to raise $200,000, which would be enough to bring on our contractor full-time and give us a reasonable amount of buffer for additional research funding that we would like to disburse. However, we think $50,000 is the minimum for Balsa to be viably funded to the end of 2026.
Here’s what we plan on doing in 2026, should we hit our fundraising goal:
This is the core deliverable that everything else feeds into, that was waylaid by our Section 301 work. The binder will include a short executive summary of the case for reform; one-pagers on specific impacts; a longer technical document synthesizing our funded research and the existing literature; and a FAQ addressing common objections. Much of the work is filling gaps identified through stakeholder conversations, and interpreting the information for specific audiences.
Both teams are expected to submit their papers in 2026. Once results are in, we'll write accessible summaries for non-academic audiences, brief interested Hill offices, and incorporate findings into the policy binder.
The labor angle is underexplored in existing Jones Act research and useful for engaging unions constructively. We're looking for proposals examining questions like: How many jobs does the Jones Act actually protect, and in which states? What's the counterfactual employment picture under reform? What are the job creation effects in industries currently harmed by high shipping costs? A rigorous study here could shift the conversation toward a more nuanced understanding of net labor market effects.
The one-year suspension of Section 301 measures expires in late 2026, and if negotiations with China stall, the proposed port fees and export restrictions could return; we'll track developments and be prepared to submit updated comments or testimony. The SHIPS for America Act proposes expanded cargo preference requirements facing similar vessel availability problems to those we identified in Section 301, and we're developing analysis of cargo preference laws we can deploy if this legislation gains momentum. The goal is readiness to contribute when high-leverage, without letting monitoring consume time that should go toward the policy binder.
We can do even more with additional resources:
Donating is not the only way to give. If you have experience with maritime shipping, naval procurement, connections to labor unions, or anything else you think might be relevant to Jones Act reform, we'd be interested in talking to you and hearing your perspective. Get in touch at [email protected] and let us know how you might be able to help, whether that's sharing your insights, making introductions, or contributing in other meaningful ways.
If you're an economist positioned to publish in peer-reviewed journals, please consider applying to our economy or labor RFAs, and doing direct research on the issue. If you have friends who fit that profile and might be interested in this kind of work, please consider forwarding the RFAs their way.
Balsa Research is still a very small organization (me, another policy analyst at ten hours per week, and Zvi in an unpaid, very part-time advisory role) and our progress this year has been possible only through the generous support of our donors and the many people who have shared their time and expertise with us. We're grateful for this community of supporters and collaborators who continue to believe in the importance of this work.
Published on December 22, 2025 9:20 PM GMT
This post provides an overview of the sequence and covers background concepts that the later posts build on. If you're already familiar with AI alignment, you can likely skim or skip the foundations section.
This sequence explains the difficulties of the alignment problem and our current approaches for attacking it. We mainly look at alignment approaches that we could actually implement if we develop AGI within the next 10 years, but most of the discussed problems and approaches are likely still relevant even if we get to AGI through a different ML paradigm.
Towards the end of the sequence, I also touch on how competently AI labs are addressing safety concerns and what political interventions would be useful.
Because in my opinion, no adequate technical introduction exists, and having more people who understand the technical side of the current situation seems useful.
There are other introductions[1] that present problems and solution approaches, but I don't think they give people the understanding needed to evaluate whether the solution approaches are adequate for solving the problems. Furthermore, the problems are often presented as disconnected pieces, rather than as components of the underlying alignment problem.
Worse, even aside from introductions, there is rarely research that actually looks at how the full problem may be solved, rather than just addressing a subproblem or making progress on a particular approach.[2]
In this sequence, we are going to take a straight look at the alignment problem and learn about approaches that seem useful for solving it - including with help from AIs.
Any human or AI who wants to technically understand the AI alignment problem. E.g.:
I am an AI alignment researcher who has worked on alignment for 3.5 years; more in this footnote[3].
Here are the summaries of the posts written so far [although as of now they are not yet published]. This section will be updated as I publish more posts:
[Those two posts should get posted within the next 2 weeks, possibly tomorrow. After that it may take a while, but hopefully around 1 post per month on average.]
The orthogonality thesis asserts that there can exist arbitrarily intelligent agents pursuing any kind of goal.
In particular, being smart does not automatically cause an agent to have “better” values. An AI that optimizes for some alien goal won’t just realize when it becomes smarter that it should fill the universe with happy healthy sentient people who live interesting lives.
If this point isn’t already obvious to you, I recommend reading this page.
If you know almost nothing about how current AIs work, watch this brief video. More knowledge isn’t required for following along, although feel free to watch some more videos in that series.
Intelligence is the power that allows humanity to build skyscrapers, cure diseases, and walk on the moon.
I expect the power of intelligence is already obvious to you if you’re reading this, but here is a nice video about it: The Power of Intelligence (text version here).
Once AI systems become as good as the best humans at AI research, we likely get an intelligence explosion: smarter AIs can make faster AI progress, which leads to even smarter AIs even faster, and so on.
Since smarter minds can often make much faster progress on intellectual problems, this feedback loop seems likely to be superexponential - perhaps hyperbolic[4], where in theory an infinite amount of progress would happen in finite time, although in practice of course only until you run into limits.
The upper limits on machine intelligence appear to be extremely high. The human brain's learning algorithm is likely far more efficient than current deep learning methods—yet the brain itself is almost certainly nowhere near optimal. And AI hardware is remarkably powerful: a single H100 GPU can perform around 2×10¹⁵ operations per second, which may be comparable to or exceed the brain's computational throughput, depending on how you estimate it. xAI's Colossus datacenter has a compute capacity equivalent to 300,000-350,000 H100 GPUs. Those chips have very high communication bandwidth, so in principle a datacenter could operate more like one gigabrain than lots of individual brains needing to communicate slowly like humans do.
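As a rough back-of-the-envelope illustration of these numbers (the brain figure in particular is highly uncertain and contested; all values below are assumptions for the example):

```python
# Back-of-the-envelope only; published brain-compute estimates span
# several orders of magnitude, so treat the ratio as illustrative.
h100_ops = 2e15                  # rough peak ops/s for one H100 at low precision
brain_ops = 1e15                 # one (contested) estimate of brain throughput
colossus_h100_equiv = 300_000    # lower end of the figure quoted above

datacenter_ops = colossus_h100_equiv * h100_ops
print(f"{datacenter_ops / brain_ops:.0e} brain-equivalents of raw compute")
# -> on these assumptions, roughly 6e+05
```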
We cannot easily imagine what a mind much smarter than humans would be like. One useful substitute, suggested by Eliezer Yudkowsky, is to imagine a large civilization of supergeniuses, all running at 10,000× human speed, with perfect memory and the ability to share knowledge instantly. (For a vivid exploration of this, see Yudkowsky's short story That Alien Message (video here).)
Technologies that currently seem far off, like advanced nanotechnology, might arrive much sooner than we'd expect from extrapolating human research progress, because a superintelligence can make much, much faster progress than all of humanity combined.
For more on these dynamics, see Optimization and the Intelligence Explosion and AI 2027 (video here).
Although timelines don’t play a huge role for this sequence, I want to briefly mention that superhumanly intelligent AI might come soon.
Measurements by METR show that the task-completion time horizon of AIs has been consistently doubling roughly every 6-7 months.
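To illustrate what a constant doubling time implies (the starting horizon and the exact doubling time below are assumptions for the example, not METR's published figures):

```python
# Illustration only: extrapolating a task horizon under a constant doubling time.
horizon_hours = 2.0        # assumed current task horizon, in hours
doubling_months = 6.5      # assumed doubling time, within the 6-7 month range

for years in (1, 2, 4):
    doublings = 12 * years / doubling_months
    print(f"{years} years: ~{horizon_hours * 2 ** doublings:.0f} hours")
# -> roughly 7 hours after 1 year, 26 after 2, ~330 after 4 (on these assumptions)
```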
Prediction markets reflect substantial probability of near-term AGI. The Manifold Markets AGI series currently shows ~9% by 2027, ~26% by 2029, ~36% by 2030, and ~50% by 2033.
At the time of publication, the team behind AI 2027—expert AI researchers and forecasters—considered 2027 the single most likely year for AGI to be developed (although not their median guess); they now predict timelines to be a chunk longer.
Even if current approaches plateau, history suggests another paradigm shift (like transformers were for deep learning) is likely within the next 15 years.
Like the DeepMind AGI Safety Video Course or AIsafety.dance (although I only very roughly skimmed the latter). ↩︎
The most notable exception for deep learning (DL) alignment is Joe Carlsmith’s sequence on “how do we solve the alignment problem”. I have some disagreements and don’t think it does a great job at clarifying the key difficulties of alignment, but hey, lots of credit to Joe for writing that sequence! There are some other attempts towards an overall discussion of alignment by Holden Karnofsky, and if you’re charitable you could count the Deepmind Safety Plan, but not much. ↩︎
I basically started to take an ambitious shot at the alignment problem—tried to think concretely about how we might be able to create very smart AI with which we could make the future end up well, which gave me a decent theoretical understanding of key difficulties. It looked to me that we might need a much more understandable and pointable AI paradigm, so I went to work on that. I started out from agent foundations and ontology identification research, and then developed my own agenda for better understanding minds, which involves more concrete analysis of observations. To be clear, that was a longshot and I hoped we had more than 10 years of time left. Even though it was not the main focus of my research, I still know quite a lot about hopes for DL alignment, and this fall I’ve been reading up in more detail on some hopes in order to better evaluate how feasible DL alignment is. Also feel free to check my LW page. ↩︎
Hyperbolic growth isn't unprecedented—the human economy grew hyperbolically until around 1960. Since then it has "only" been growing exponentially, presumably because the relative population growth rate went down a lot. If anything, we should expect the returns of higher intelligence to be even larger than the returns of having more humans. Especially above genius level, small increases in intelligence can have a very outsized impact. E.g. Einstein was able to solve some problems faster than the rest of humanity combined could've—he postulated multiple correct theories where most physicists of his time thought for years that they were wrong. ↩︎
Published on December 22, 2025 8:09 PM GMT
It's been a while since I hosted a competition. This time there is prize money! $500.
Do you think you're a better writer than me? Now is your chance to prove it.
I want to know whether human beings can pass for me. Your mission, should you choose to participate, is to write a Less Wrong post that is indistinguishable from one that I would write.
All timezones are Berkeley Time (UTC−12).
This schedule is tentative because I often encounter technical difficulties when running competitions like this.
Lorem Ipsum
[This is an entry for lsusr's write-like-lsusr competition.]
Dolor sit amet.
If you violate these constraints, then you will be disqualified, because it should be obvious to everyone that the entry was not written by lsusr. (Except the AI constraint, which is largely on the honor system.)
The judging of this competition will be done by prediction markets.
I will create one YES/NO question on Manifold for each entry. Each entry will be seeded with 100 mana. I have a total budget of 10,000 mana. If >100 posts are submitted, then I may have to ditch the original plan and use a more complicated system instead. Maybe I'll increase the mana budget. Hopefully Manifold will not ban me for creating too many markets? IDK. I have never done this before.
You may notice that 10,000 mana is worth significantly less than the $1,000 prize pool. You may wonder whether arbitrage via deliberate manipulation of the prediction markets is permitted. Yes, it is permitted. Does this mean that you are allowed to buy a bunch of YES or NO right before the prediction markets expire? Yes it does.
Does this mean you are allowed to run a propaganda campaign in the comments? Yes, but please keep it contained to this competition. Feel free to keep your misinformation campaign confined to the comments sections of posts (including this one) that have voluntarily opted into the competition.
Q: I have a post idea that I don't think would be mistaken for an lsusr post, but it's high-effort and human-written. May I enter it in the competition?
A: Yes.
Q: I want to win by manipulating the prediction markets without writing a post myself. What do I do?
A: Maybe team up with someone writing a real entry? There are no rules against collaboration. Plots are fun!
Q: Who funded this?
A: Me.
Q: Can I volunteer to help with this competition?
A: Yes! Creating, managing and resolving the prediction markets is a hassle I'd rather someone else deal with. If you'd like to volunteer for this duty, please message me. (If you are the one managing the prediction markets, then you are disqualified from participating in the competition and manipulating the markets.)
The winner will be whoever has the highest % likely "written by lsusr". The 2nd place will be whoever has the second-highest % likely "written by lsusr". (Assuming there are <100 entries.)
Prizes will be distributed by Venmo. If you don't want to receive money via Venmo, then I can instead donate 110% of your prize to the charity of your choice or 120% to Our Partners in Health. You may request prizes in kind in lieu of a cash payment.
I don't anticipate changing the rules, but just in case something goes horribly wrong or I made a mistake somewhere, I reserve the right to revise the rules over the next 48 hours.
<coming soon>