
Life at the Frontlines of Demographic Collapse


Published on February 14, 2026 6:30 AM GMT

Nagoro, a depopulated village in Japan where residents are replaced by dolls.

In 1960, Yubari, a former coal-mining city on Japan’s northern island of Hokkaido, had roughly 110,000 residents. Today, fewer than 7,000 remain. The share of those over 65 is 54%. The local train stopped running in 2019. Seven elementary schools and four junior high schools have been consolidated into just two buildings. Public swimming pools have closed. Parks are not maintained. Even the public toilets at the train station were shut down to save money.

Much has been written about the economic consequences of aging and shrinking populations. Fewer workers supporting more retirees will make pension systems buckle. Living standards will decline. Healthcare will get harder to provide. But that’s dry theory. A numbers game. It doesn’t tell you what life actually looks like at ground zero.

And it’s not all straightforward. Consider water pipes. Abandoned houses are photogenic. It’s the first image that comes to mind when you picture a shrinking city. But as the population declines, ever fewer people live in the same housing stock and water consumption declines. The water sits in oversized pipes. It stagnates and chlorine dissipates. Bacteria move in, creating health risks. You can tear down an abandoned house in a week. But you cannot easily downsize a city’s pipe network. The infrastructure is buried under streets and buildings. The cost of ripping it out and replacing it with smaller pipes would bankrupt a city that is already bleeding residents and tax revenue. As the population shrinks, problems like this become ubiquitous.

The common instinct is to fight decline with growth. Launch a tourism campaign. Build a theme park or a tech incubator. Offer subsidies and tax breaks to young families willing to move in. Subsidize childcare. Sell houses for €1, as some Italian towns do.

Well, Yubari tried this. After the coal mines closed, the city pivoted to tourism, opening a coal-themed amusement park, a fossil museum, and a ski resort. They organized a film festival. Celebrities came and left. None of it worked. By 2007, the city had gone bankrupt. The festival was canceled and the winners from years past never got their prize money.

Or, to get a different perspective, consider someone who moved to a shrinking Italian town, lured by a €1 house offer: They are about to retire. They want to live in the country. So they buy the house and go through all the paperwork. Then they renovate it. More paperwork. They don't speak Italian. That sucks. But finally everything works out. They move in. The house is nice. There's a grapevine climbing the front wall. Out of the window they see the rolling hills of Sicily. In the evenings, they hear dogs barking in the distance. It looks exactly like the paradise they'd imagined. But then they start noticing their elderly neighbors getting sick and being taken away to hospital, never to return. They see them dying alone in their half-abandoned houses. And as the night closes in, they can't escape the thought: "When's my turn?" Maybe they shouldn't have come at all.

***

The instinctive approach, that vain attempt to grow and repopulate, is often counterproductive. It leads to building infrastructure, literal bridges to nowhere, waiting for people that will never come. Subsidies quietly fizzle out, leaving behind nothing but dilapidated billboards advertising the amazing attractions of the town, attractions that closed their gates a decade ago.

The alternative is not to fight the decline, but to manage it. To accept that the population is not coming back and ask a different question: how do you make a smaller city livable for those who remain? In Yubari, the current mayor has stopped talking about attracting new residents. The new goal is consolidation. Relocating the remaining population closer to the city center, where services can still be delivered, where the pipes are still the right size, where neighbors are close enough to check on each other.

Germany took a similar approach with its Stadtumbau Ost, a federal program launched after reunification to address the exodus from East to West, as young people moved west for work, leaving behind more than a million vacant apartments. It paid to demolish nearly 300,000 housing units. The idea was not to lure people back but to stabilize what was left: reduce the housing surplus, concentrate investment in viable neighborhoods, and stop the downward spiral of vacancy breeding more vacancy. It was not a happy solution, but it was a workable one.

Yet this approach is politically toxic. Try campaigning not on an optimistic message of turning the tide and making the future as bright as it once used to be, but rather by telling voters that their neighborhood is going to be abandoned, that the bus won’t run anymore and that all the investment is going to go to a different district. Try telling the few remaining inhabitants of a valley that you can’t justify spending money on their flood defenses.

Consider the España Vaciada movement representing the depopulating interior of Spain, which has achieved some electoral successes lately. It is propelled by real concerns: hospital patients traveling hours to reach a proper facility, highways that were never expanded, banks and post offices that closed and never reopened. But it does not champion managed decline. It champions the opposite: more investment, more infrastructure, more services. Its flagship proposal, the 100/30/30 plan, demands 100-megabit internet everywhere, no more than 30 minutes to basic services, no more than 30 kilometers to a major highway. They want to reopen what was closed. They want to see more investment in healthcare and education. They want young people back in the regions.

And it’s hard to blame them. But what that means on the ground, whether in Spain or elsewhere, is that the unrewarding task of managing the shrinkage falls to local bureaucrats, not to the elected politicians. There’s no glory in it, no mandate, just the dumpster fire and whatever makeshift tools happen to be at hand.

***

You can think of it as, in effect, a form of degrowth. GDP per capita almost always falls in depopulating areas, which seems counterintuitive if you subscribe to zero-sum thinking. Shouldn’t fewer people dividing the same economic pie mean more for each?

Well, no. It’s a negative-sum game. As the town shrinks, the productive workforce, disheartened by the lack of prospects, moves elsewhere, leaving the elderly and the unemployable behind. Agglomeration effects are replaced by de-agglomeration effects. Supply chains fragment. Local markets shrink. Successful firms move to greener pastures.

And then there are the small firms that simply shut down. In Japan, over half of small and medium-sized businesses report having no successor. 38% of owners above 60 don’t even try. They report planning to close the firm during their generation. But even if they do not, the owner turns seventy, then seventy-five. Worried clients want a guarantee of continued service and pressure him to devise a succession plan. He designates a successor — maybe a nephew or a son-in-law — but the young man keeps working an office job in Tokyo or Osaka. No transfer of knowledge happens. Finally, the owner gets seriously ill or dies. The successor is bewildered. He doesn’t know what to do. He doesn’t even know whether it’s worth it. In fact, he doesn’t really want to take over. Often, the firm just falls apart.

***

So what is being done about these problems?

Take the case of infrastructure and services degradation. The solution is obvious: manage the decline by concentrating the population.

In 2014, the Japanese government initiated Location Normalization Plans to designate areas for concentrating hospitals, government offices, and commerce in walkable downtown cores. Tax incentives and housing subsidies were offered to attract residents. By 2020, dozens of Tokyo-area municipalities had adopted these plans.

Cities like Toyama built light rail transit and tried to concentrate development along the line, offering housing subsidies within 500 meters of stations. The results are modest: between 2005 and 2013, the percentage of Toyama residents living in the city center increased from 28% to 32%. Meanwhile, the city’s overall population continued to decline, and suburban sprawl persisted beyond the plan’s reach.

What about the water pipes? In theory, they can be decommissioned and consolidated when people move out of some neighborhoods. In places, they can possibly be replaced with smaller-diameter pipes. Engineers can even open hydrants periodically to keep water flowing. But the most efficient of these measures were probably easier to implement in the recently post-totalitarian East Germany, with its still-docile population accustomed to state directives, than in democratic Japan.

***

And then there’s the problem of abandoned houses.

The arithmetic is brutal: you inherit a rural house valued at ¥5 million on the cadastral registry and pay inheritance tax of 55%, only to discover that the actual market value is ¥0. Nobody wants property in a village hemorrhaging population. But wait! If the municipality formally designates it a “vacant house,” your property tax increases sixfold. Now you face half a million yen in fines for non-compliance, and administrative demolition costs that average ¥2 million. You are now over ¥5 million in debt for a property you never wanted and cannot sell.

It gets more bizarre: When you renounce the inheritance, it passes to the next tier of relatives. If children renounce, it goes to parents. If parents renounce, it goes to siblings. If siblings renounce, it goes to nieces and nephews. By renouncing a property, you create an unpleasant surprise for your relatives.

Finally, when every possible relative renounces, the family court appoints an administrator to manage the estate. Their task is to search for other potential heirs, such as "persons with special connection," i.e. those who cared for the deceased, worked closely with them and so on. Lucky them, the friends and colleagues!

Obviously, this gets tricky, and that’s exactly the reason why a new system was introduced that allows a property to be passed to the state. But there are many limitations placed on the property — essentially, the state will only accept land that has some value.

In the end, it's a hot potato problem. The legal system was designed in the era when all property had value and implicitly assumed that people wanted it. Now that many properties have negative value, the framework misfires, creates misaligned incentives and recent fixes all too often make the problem worse. Tax penalties meant to force owners to renovate only add to the costs of the properties that are already financial liabilities, creating a downward price spiral.

Maybe the problem needs fundamental rethinking. Should there be a guaranteed right to abandon unwanted property? Maybe. But if so, who bears the liabilities such as demolishing the house before it collapses during an earthquake and blocks the evacuation routes?

***

Well, if everything is doom and gloom, at least nature benefits when people are removed from the equation, right?

Let’s take a look.

Japan has around 10 million hectares of plantation forests, many of them planted after WWII. These forests are now reaching the stage at which thinning is necessary. Yet because profitability has declined — expensive domestic timber was largely displaced by cheap imports long ago — and the forestry workforce was greatly reduced, thinning often does not occur. As a result, the forests grow too dense for light to penetrate. Little or nothing survives in the understory. And where something does manage to grow, overpopulated deer consume new saplings and other vegetation such as dwarf bamboo, which would otherwise help stabilize the soil. The result is soil erosion and the gradual deterioration of the forest.

The deer population, incidentally, is high because there are no wolves, the erstwhile apex predators, in Japan. But few people want them reintroduced. Instead, authorities have extended hunting seasons and increased culling quotas. In an aging and depopulating countryside, however, there are too few hunters to make use of these measures. And so, this being Japan, robot wolves are being deployed in their stead.

***
Finally, care for the elderly is clearly the elephant in the room. Ideas abound: Intergenerational sharehouses where students pay reduced rent in exchange for “being good neighbors.” Projects combining kindergartens with elderly housing. Denmark has more than 150 cohousing communities where residents share meals and social life. But the obvious challenge is scale. These work for dozens, maybe hundreds. Aging countries need solutions for millions.

And then again, there are robot nurses.

***

It’s all different kinds of problems, but all of them, in their essence, boil down to negative-sum games.

Speaking of those, one tends to think of them as the pie shrinking. And there’s an obvious conclusion: if you want your children to be as well off as you are, you have to fight for someone else’s slice. In a shrinking world, one would expect ruthless predators running wild and civic order collapsing.

But what you really see is quite different. The effect is gradual and subtle. It does not feel like a violent collapse. It feels more like the world silently coming apart at the seams. There’s no single big problem that you would point to. It feels as if everything now just works a bit worse than it used to.

The bus route that ran hourly now runs only three times a day. The elementary school merged with the one in the next town, so children now commute 40 minutes each way. Processing paperwork at the municipal office takes longer now, because both clerks are past the retirement age. The post office closes on Wednesdays and Fridays and the library opens only on Tuesdays. The doctor at the neighborhood clinic stopped accepting new patients because he’s 68 and can’t find a replacement. Even the funeral home can’t guarantee same-day service anymore. Bodies now have to wait.

You look out of the window at the neighboring house, the windows empty and the yard overgrown with weeds, and think about the book club you used to attend. It stopped meeting when the woman who used to organize it moved away. You are told that the local volunteer fire brigade can’t find enough members and will likely cease to operate. You are also warned that there may be bacteria in the tap water. You are told to boil your water before drinking it.

Sometimes you notice how friends and neighbors are getting less friendly each year. When you need a hand, you call them, but somehow today, they just really, really can’t. It’s tough. They’ll definitely help you next time. But often, they are too busy to even answer the phone. Everyone now has more people to care for. Everyone is stretched thin and running low on resources.

When you were fifty and the children started to leave home, you and your friends used to joke that you would form an anarcho-syndicalist commune.

Ten years later you actually discuss a co-living arrangement, and all you can think about is the arithmetic of care: would you be the last one standing, taking care of everybody else?

Finally someone bites the bullet and proposes moving together but signing a non-nursing-care contract first. And you find yourself quietly nodding in approval.




Ads, Incentives, and Destiny


Published on February 14, 2026 5:41 AM GMT

There’s been some recent unpleasantness regarding Anthropic's Super Bowl ads. To recap:

  • OpenAI started showing ads in some tiers of ChatGPT.
  • Anthropic made some Super Bowl ads making fun of ads in AI.
  • Sam Altman got mad about Anthropic’s ads.

If you haven't already, you should watch one of the ads—they’re very good. Even Sam laughed, right before he got mad about it.

Anthropic’s ads are a lot of fun, but they aren’t completely fair: they implicitly target OpenAI, but show ads that are far worse than what OpenAI is actually doing. But fair or not, they raise a valid concern.

Death, taxes, and enshittification

Let me be clear: OpenAI’s ad policy is thoughtful and ethical and I have no problem with it. If OpenAI rigorously adheres to this policy in the long run I’ll be surprised, delighted, and contrite.

Did I mention that I’d be surprised if OpenAI holds the line? Because I would be quite surprised. The tech industry is littered with companies that began with clear, ethical boundaries about ads, but slowly evolved into user-hostile rent-taking machines. The problem is not that ads are intrinsically bad, but that in certain tech products, the nature of the advertising business creates almost irresistible perverse incentives.

Google was once the canonical example of an ethical tech company. Their motto in those days was “don’t be evil”, and they weren’t. They had a great product that was a delight to use, and their ads were clearly marked as ads, in accordance with a thoughtful and ethical policy much like OpenAI’s new policy. Google was one of the best things about the internet, and they were committed to doing the right thing. But the Ring of Power has a will of its own…

Slowly but inexorably, Google began to change. It turned out that it was possible to make more money per search by showing more ads, and so there were more ads. And people clicked on ads more often if the ads looked more like organic search results, so it became harder and harder to tell them apart. And ads were more valuable if you knew more about the person you were showing them to, so the internet was carpet bombed with increasingly aggressive user-tracking technology.

Google’s downfall wasn’t a lack of good intentions—it was their business model. An ad-supported search engine will inevitably face a million opportunities to become a tiny bit worse and more profitable. And as the years go by? Incentives eat values for breakfast.

Cory Doctorow calls this process “enshittification” and once you know what to look for, it’s everywhere. Google, Facebook, Instagram, Amazon, Instacart… If the business model encourages enshittification, it’s just a matter of time before once-laudable ethical standards begin to bend, and an ad-supported product mutates into a product-supported ad-delivery machine.

Incentives as destiny

The New York Times has maintained ethical boundaries around ads for 175 years, while Google gave in to the dark side within 15 years. Google was once as idealistic as they come, so what went wrong? Why did NYT succeed where Google failed?

It’s complicated, and I don’t pretend to have a single master theory that explains everything. But three factors seem critical for whether a business enshittifies:

  • Are there strong incentives to blur the line between content and advertising?
  • Are there strong incentives to support ads via unethical behavior?
  • Do strong lock-in effects make it hard for customers to leave?

Blurring the line between content and ads

Anthropic’s ads beautifully pointed out the toxicity of presenting advertising as content. Some business models simply offer more opportunity to cross that line than others.

For a newspaper, there’s relatively little money to be made by crossing that line: it’s cheap for the Times to maintain a strict separation between the newsroom and the advertising department. Google, on the other hand, can profit very handsomely by blurring the line between actual search results and “sponsored” results.

Incentives for unethical behavior

Enshittification often spreads beyond how ads are presented. Google, for example, can charge more for ads that are well targeted. It’s no surprise, then, that they have a long history of using very questionable techniques to track user activity across the internet. The Times, on the other hand, simply doesn’t have as many opportunities to profit from questionable behavior.

Lock in

It’s a lot easier to exploit your customers if they’re locked in to your platform. NYT is arguably the best newspaper, but it’s hardly the only one: if the experience of reading the Times becomes too unpleasant, readers will simply leave. Google, on the other hand, has immense lock in: individuals have to use Google because it’s by far the best and easiest way to find things, and businesses have to advertise on it because Google is where people find things. Google has enormous headroom for extractive behavior, because the cost of leaving is so high for both users and advertisers.

Where does that leave OpenAI?

Viewed from an incentives perspective, OpenAI looks more like Google than the New York Times:

  • There is considerable incentive to blur the line between advertising and AI responses. It would be so easy to reduce the visual separation between response and advertisement, or to prompt topics that support more lucrative ads (in Pulse, for example).
  • OpenAI has strong incentives to pursue the same kind of toxic engagement maxing that Facebook does: more time in the product means more ad impressions.
  • Chatbots currently have limited lock in, but that is changing quickly. Features like memory, personalization, and continual learning are very valuable, but make it much harder to switch platforms.

So that’s hardly ideal: OpenAI has strong incentives to enshittify. I believe they don’t intend to do that, but history suggests that good intentions rarely overcome perverse incentives. Sam says:

we would obviously never run ads in the way Anthropic depicts them. We are not stupid and we know our users would reject that.

I trust that he’s sincere, but he’s clearly wrong. Google’s success is proof that when the conditions are right, enshittification is a profitable strategy, and users will tolerate it.

Ads and accessibility

Sam makes a really good point: AI is quickly becoming a vital tool. Just as it’s important that the internet be accessible to everyone, it’s important that everyone be able to access AI. Frontier models are expensive to run, and ads are potentially one of our few tools for ensuring that everyone has access to capable AI. But accessibility considerations just underscore the dangers of enshittification.

Enshittified products are worse than paid products because the advertising model drives user-hostile product design. Google search isn’t just bad because of all the ads, it’s bad because Google relentlessly tracks you across the internet in order to target those ads. Facebook is toxic because it serves ragebait to keep you “engaged” and watching ads.

If OpenAI ensures that everyone has access to AI by serving an ethical ad-supported product, that’s great. But if that devolves into “if you can’t afford to pay for good AI, you get toxic, manipulative AI for free”—I’m not sure that actually helps.

Now we wait

Again: what OpenAI is doing today is absolutely fine. The question is whether they will continue to uphold their current standards, or whether they will follow so many others down the path of enshittification.

If their ads become increasingly difficult to distinguish from their content, and if they start finding reasons why it’s OK to include sponsored content in AI responses, then we’ll have our answer. And we’ll have new information about OpenAI’s ability to ethically manage superintelligence.

And conversely: if they hold the line, and succeed where so many others have failed, I will be delighted to admit that my concerns were unfounded. And I will update positively about how much I trust them with other, bigger decisions.




Why I'm Worried About Job Loss + Thoughts on Comparative Advantage


Published on February 13, 2026 11:36 PM GMT

David Oks published a well-written essay yesterday arguing that the current panic about AI job displacement is overblown. I agree with a few of his premises (and it’s nice to see that we’re both fans of Lars Tunbjörk), but disagree with most of them and arrive at very different conclusions. I see other economists with similar views to David, so I thought it would be best to illustrate my perspective on econ/labor and why I choose to research gradual disempowerment risks.

My main claim is simple: it is possible for Oks to be right about comparative advantage and bottlenecks while still being wrong that "ordinary people don't have to worry." A labor market can remain "employed" and still become structurally worse for workers through wage pressure, pipeline collapse, and surplus capture by capital.

I'm writing this because I keep seeing the same argumentative move in AI-econ discourse: a theoretically correct statement about production gets used to carry an empirical prediction about broad welfare. I care less about the binary question of "will jobs exist?" and more about the questions that determine whether this transition is benign: how many jobs, at what pay, with what bargaining power, and who owns the systems generating the surplus.

Oks' points are as follows:

1: Comparative Advantage Preserves Human Labor

“...Labor substitution is about comparative advantage, not absolute advantage. The question isn’t whether AI can do specific tasks that humans do. It’s whether the aggregate output of humans working with AI is inferior to what AI can produce alone: in other words, whether there is any way that the addition of a human to the production process can increase or improve the output of that process."

Oks brings the Ricardian argument here, and I think it's directionally correct as a description of many workflows today. We are in a cyborg era in which humans plus AI often outperform AI alone, especially on problems with unclear objectives or heavy context. But I don't think the comparative advantage framing does the work Oks wants it to do, because it leaves out the variables that determine whether workers are "fine." 

First, comparative advantage tells you that some human labor will remain valuable in some configuration, but nothing about the wages, number of jobs, or the distribution of gains. You can have comparative advantage and still have massive displacement, wage collapse, and concentration of returns to capital. A world where humans retain “comparative advantage” in a handful of residual tasks at a fraction of the current wages is technically consistent with Oks’ framework, but obviously is worth worrying about and is certainly not fine.
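To make that concrete, here is a toy Ricardian sketch in Python (the numbers and the pricing assumption are mine, not Oks’): task prices are assumed to be pinned at the AI’s marginal cost, the human keeps a comparative advantage in “judgment” work throughout, and yet the market wage falls in lockstep with AI costs.

    # Toy model, invented numbers: comparative advantage survives, the wage doesn't.
    def human_wage(ai_cost, human_output):
        """Best hourly wage when each task's price equals the AI's marginal cost per unit."""
        prices = ai_cost  # assume elastic AI supply pushes prices down to AI cost
        return max(human_output[task] * prices[task] for task in prices)

    human_output = {"routine": 10.0, "judgment": 2.0}  # units per human hour

    today = {"routine": 0.50, "judgment": 20.00}       # dollars per unit for the AI
    later = {task: cost / 20 for task, cost in today.items()}  # AI gets 20x cheaper

    print(human_wage(today, human_output))  # 40.0 $/hr, earned doing judgment work
    print(human_wage(later, human_output))  # 2.0 $/hr, still doing judgment work

The human’s comparative advantage in judgment (their relative productivity exceeds the AI’s relative cost advantage) is identical in both scenarios; only the price level moves. That is the gap between “some human labor stays valuable” and “workers are fine.”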

Another issue with the comparative advantage framing (and I think this is a problem with most AI/econ forecasting) – it implicitly assumes that most laborers have the kind of tacit, high-context strategic knowledge that complements AI. The continuation of the “cyborg era” presupposes that laborers have something irreplaceable to contribute (judgement, institutional context, creative direction). I agree with this for some jobs, but it’s not enough for me to avoid being worried about job loss.

Under capitalism, firms are rational cost-minimizers. They will route production through whatever combination of inputs delivers the most output per dollar (barring policy). Oks and David Graeber’s “Bullshit Jobs” thesis agree on the empirical point that organizations are riddled with inefficiency, and many roles exist not because they’re maximally productive but because of social signaling and coordination failures. Oks treats this inefficiency as a buffer that protects workers. But if a significant share of existing roles involve codifiable, routine cognitive tasks, then they’re not protected by comparative advantage at all. They’re protected only by the organizational friction that has yet to be overcome, which I believe will erode (and we’ll discuss later).

Oks links some evidence that demonstrates protection from displacement  – I’ll admit that the counter-evidence so far is inconclusive and contaminated by many other economic factors. I will bring up Erik Brynjolfsson’s “Canaries in the Coal Mine” study, because I think it exemplifies the trend we’ll continue seeing in the next 2-3+ years before AGI.

Brynjolfsson analyzed millions of ADP payroll records and found a 13% relative decline in employment for early-career workers (ages 22-25) in AI-exposed occupations since late 2022. For young software developers specifically, employment fell almost 20% from its 2022 peak. Meanwhile, employment for experienced workers in the same occupations held steady or grew.

So what’s the mechanism at play? AI replaces codified knowledge – the kind of learning you get from classrooms or textbooks – but struggles with tacit knowledge, the experiential judgement that accumulates over years on the job. This is why seniors are spared and juniors are not. But Oks’ thesis treats this as reassurance: see, humans with deep knowledge still have comparative advantage! I believe this is more of a senior worker’s luxury, and the protection for “seniors” will move up and up the hierarchy over time.

Some other stats (again, not totally conclusive but worthy of bringing up):

  1. Youth unemployment hit 10.8% in July 2025, the highest rate since the pandemic, even as overall unemployment remained low.
  2. Entry-level job postings across the U.S. declined roughly 35% between January 2023 and late 2024.
  3. New graduate hiring at major tech companies dropped over 50% between 2019 and 2024, with only 7% of new hires in 2024 being recent graduates.
  4. A Harvard study corroborated these findings: headcount for early-career roles at AI-adopting firms fell 7.7% over just six quarters beginning in early 2023, while senior employment continued its steady climb.

This is a disappearance of the bottom rung of the career ladder, which has historically served a dual function: producing output and training the next generation of senior workers. Oks may point to other sources of employment (yoga instructors, streamers, physical trainers), or indicate that entry-level hiring is slowing down due to other economic forces, but I’ll ask: will the entire generation of incoming college graduates, who are rich with codified knowledge but lacking in tacit knowledge, all find AI-complementary roles? Or will firms slow hiring and enjoy the productive pace of their augmented senior employees? How high are labor costs for entry-level grads compared to the ever-reducing cost of inference?

2: Organizational Bottlenecks Slow Displacement

“People frequently underrate how inefficient things are in practically any domain, and how frequently these inefficiencies are reducible to bottlenecks caused simply by humans being human… Production processes are governed by their least efficient inputs: the more productive the most efficient inputs, the more the least efficient inputs matter.”

This is the strongest part of the essay and overlaps substantially with my own modeling work. The distance between technical capability and actual labor displacement is large, variable across domains, and governed by several constraints independent of model intelligence. The point about GPT-3 being out for six years without automating low-level work is good empirical evidence, though I don’t agree that GPT-3 or GPT-4 era models could automate customer service (they would need tool usage, better memory, and lower voice latency to do that).

Where the analysis is lacking is in treating bottlenecks as if they’re static features of the landscape rather than obstacles in the path of an accelerating force. Oks acknowledges that they erode over time but doesn’t discuss the rate of erosion or that AI itself may accelerate their removal.

The example below is likely overstated, but this is the worst Claude will ever be – are any of these agentic decisions something that we would previously classify as organizational friction?

[Image not reproduced: an example of Claude’s agentic decisions.]

In my own modeling, I estimate organizational friction coefficients for different sectors and job types. The bottleneck argument is strong for 2026-2029, but I think it’s considerably weaker for 2030-2034. Oks brings up the example of electricity taking decades to diffuse but admits that the timeline isn’t similar. I would agree, it's not similar, and the data is increasingly pointing towards a compressed S-curve where adoption is slow until it isn’t.

Oks’ bottleneck argument is entirely about incumbents – large, existing firms with accumulated infrastructure debt. What happens when AI-native organizational structures compete with legacy enterprises with decades worth of technical debt? The infrastructure bottleneck is a moat that only protects incumbents until someone flies over it.

3: Intelligence Isn’t the Limiting Factor, and Elastic Demand Will Absorb Productivity Gains

“The experience of the last few years should tell us clearly: intelligence is not the limiting factor here… even for the simplest of real-world jobs, we are in the world of bottlenecks.”

“Demand for most of the things humans create is much more elastic than we recognize today. As a society, we consume all sorts of things–not just energy but also written and audiovisual content, legal services, ‘business services’ writ large–in quantities that would astound people living a few decades ago, to say nothing of a few centuries ago.”

Here I’ll lean a little bit on Dean W. Ball's latest pieces on recursive self-improvement as well as some empirical evidence of job loss. Oks writes as though we haven’t seen meaningful displacement yet – I would say we have, within the limited capabilities of models today.

Beyond the entry-level crisis discussed earlier, displacement is already hitting mid-career professionals across creative and knowledge work. See reports linked on illustrators and graphic designers, translators, copywriters, and explicitly AI-related corporate layoffs.

The models doing this aren’t even particularly good yet. These losses are happening with GPT-4-class and early GPT-5-class models; models that still hallucinate, produce mediocre long-form writing, can't design well, and can’t reliably handle complex multi-step reasoning. If this level of capability is already destroying illustration, translation, copywriting, and content creation, what happens when we reach recursive self-improvement? There needs to be some more investigative work to see how displaced designers/translators/copywriters etc. are reskilling and finding new work, but I would estimate it’s extraordinarily difficult in this job market.

Notice the distributional pattern: it’s not the creative directors, the senior art directors, or the lead translators with niche expertise getting hit. It’s everyone below them; the juniors, the mid-career freelancers, the people who do the volume work. Oks’ comparative advantage argument might hold for the person at the top of the hierarchy whose taste and judgment complement AI, but it offers no comfort for the twenty people who work below that person.

Then, we’ll consider the capabilities overhang. We haven’t even seen models trained on Blackwell-generation chips yet, and models are reaching the ability to build their next upgrades.

Massive new data centers are coming online this year. Oks’ point about “GPT-3 being out for 6 years and nothing catastrophic has happened” looks at capabilities from 2020–2025 and extrapolates forward, right before a massive step-change in both compute and algorithmic progress hits simultaneously. The river has not flooded, but the dam has cracked.

Ball offers another good point in his essays – there is a difference between AI that’s faster at the same things versus AI that’s qualitatively different – a Bugatti going 300 instead of 200 mph vs a Bugatti that learns to fly. Oks’ entire analysis assumes incremental improvements that organizational friction can absorb. But, again, automated AI research raises the possibility of capabilities that route around existing organizational structures rather than trying to penetrate them. An AI system that autonomously manages end-to-end business processes doesn’t need to navigate office politics and legacy systems.

As for the Jevons paradox argument (often cited), that elastic demand will absorb productivity gains: I believe it’s real for some categories of output but cherry-picked as a general principle. Software is Oks’ central example, and it’s well-chosen: software is elastic in demand because it’s a general-purpose tool. But does anyone believe demand for legal document review is infinitely elastic? For tax preparation? For freelance video editors? These are bounded markets where productivity gains translate fairly directly to headcount reductions, and I’m still struggling to understand how we are telling early-wave displaced roles to upskill or find new careers.

Someone commented under Oks’ post another example that I’ll jump on. As global manufacturing shifted toward China and other low-cost production regions, total manufacturing output continued to expand rather than contract, a Jevons-like scale effect where cheaper production increased overall consumption. American manufacturing workers, however, bore concentrated losses. The gains flowed disproportionately to consumers, firms, and capital owners, while many displaced workers (especially in Midwestern industrial regions) faced long-term economic decline that helped fuel a broader political backlash against globalization.

We can also address a concrete case – AI video generation. Models like Veo 3.1 and Seedance 2.0 are producing near-lifelike footage with native audio, lip-synched dialogue, and most importantly, automated editorial judgement. Users upload reference images, videos, and audio, and the model assembles coherent multi-shot sequences matching the vibe and aesthetic they’re after. Seedance 2.0 shipped this week.

The U.S. motion picture and video production industry employs roughly 430,000 people – producers, directors, editors, camera operators, sound technicians, VFX artists – plus hundreds of thousands more in adjacent commercial production: corporate video, social content, advertising spots, educational materials. The pipeline between “someone has an idea for a video” and “a viewer watches it” employs an enormous intermediary labor force.

Oks’ elastic demand argument would predict that cheaper video production simply means more video, with roughly equivalent total employment. And it’s true that demand for video content is enormous – McKinsey notes the average American now spends nearly seven hours a day watching video across platforms. But I would challenge his thesis: is the number of people currently employed between producer and consumer equivalent to the number who will be needed when AI collapses that entire intermediary layer? When a single person with a creative vision can prompt Seedance/Veo/Sora into producing a polished commercial that once required a director, cinematographer, editor, colorist, and sound designer, does elastic demand for the output translate into elastic demand for the labor?

People now can produce polished AI anime for about $5-$100. This content exists but the workforce does not. So, yes, there will be vastly more video content in the world. But the production function has changed; the ratio of human labor to output has shifted by orders of magnitude. The demand elasticity is in the content, not in the labor.
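A back-of-the-envelope way to put this, with invented numbers rather than anything from Oks’ essay: if headcount scales with total output divided by output per worker, then demand has to grow at least as fast as productivity just to keep employment flat.

    # Rough sketch, illustrative numbers only: headcount ~ output / productivity.
    def headcount_ratio(demand_growth: float, productivity_growth: float) -> float:
        """Workers needed after the shift, as a fraction of workers needed before."""
        return demand_growth / productivity_growth

    # e.g. output per human in video production rises 100x while total demand rises 10x:
    print(headcount_ratio(demand_growth=10, productivity_growth=100))  # 0.1, i.e. ~90% fewer roles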

To summarize: Jevons paradox in aggregate output is perfectly compatible with catastrophic distributional effects. You can have more total economic activity and still have millions of people whose specific skills and local labor markets are destroyed. The people being displaced right now are not edge cases, they’re illustrators, translators, copywriters, graphic designers, video producers, and 3D artists who were told their skills would always be valuable because they were “creative.” The aggregate framing erases these people, and it will erase more.

4: We’ll Always Invent New Jobs From Surplus

“We’ll invent jobs because we can, and those jobs will sit somewhere between leisure and work. Indeed this is the entire story of human life since the first agrarian surplus. For the entire period where we have been finding more productive ways to produce things, we have been using the surplus we generate to do things that are further and further removed from the necessities of survival.”

This is an argument by induction: previous technological transitions always generated new employment categories, so this one will too. The premise is correct, the pattern is real and well-documented. I don’t dispute it.

The problem is the reference class issue. Every previous transition involved humans moving up the cognitive ladder, like from physical labor to increasingly abstract cognitive work. Oks mentions this – agricultural automation pushing people into manufacturing, then manufacturing automation pushing people into services, then service automation pushing people into knowledge work. The new jobs that emerged were always cognitive jobs. This time, the cognitive domain itself is being automated.

I don’t think this means zero new job categories will emerge. But Oks’ assertion that “people will find strange and interesting things to do with their lives” doesn’t address three critical questions: the transition path (how do people actually get from displaced jobs to new ones?), the income levels (will new activities pay comparably to what they replace?), and ownership (will the surplus that enables those activities be broadly shared or narrowly held?). There’s also the entry-level → senior pipeline problem I mentioned earlier.

The gesture toward “leisure” as an eventual end state is telling. If human labor really does become superfluous, that’s not a world where “ordinary people” are okay by default, but rather a world where the entire economic operating system needs to be redesigned. Oks treats this as a distant concern. I’d argue it’s the thing most worth worrying about, because policy needs to be built before we arrive there, not after.

5: What’s Missing

The deepest issue with Oks’ essay is the framing, rather than his individual claims. His entire analysis is labor-centric: will humans still have jobs? I think this is assuredly worth asking, but also incomplete.

I’ll be charitable and say that the following section covers something he didn’t write about (instead of not considering), but he says “ordinary people don’t have to worry”, which I think is a bad framing.

The right question is: who captures the surplus? Is that worth worrying about?

If AI makes production 10x more efficient and all those gains flow to the owners of AI systems and the capital infrastructure underlying them, then “ordinary people” keeping their jobs at stagnant or declining real wages in a world of AI-owner abundance is not “fine.” It’s a massive, historically unprecedented increase in inequality. The comparative advantage argument is perfectly compatible with a world where human labor is technically employed but capturing a shrinking share of value.

This is what I’ve been working on in an upcoming policy document – the question of how ownership structures for AI systems will determine whether productivity gains flow broadly or concentrate narrowly. Infrastructure equity models, worker ownership structures, structural demand creation – these are the mechanisms that determine whether the AI transition is benign or catastrophic. Oks’ thesis has no apparent answer to the question.

Oks is right that thoughtless panic could produce bad regulatory outcomes. But complacent optimism that discourages the hard work of building new ownership structures, redistribution mechanisms, and transition support is equally dangerous, and arguably more likely given how power is currently distributed. Benign outcomes from technological transitions have never been the default. They’ve been the product of deliberate institutional design: labor law, antitrust enforcement, public education, social insurance.

I don’t think we should be telling people “don’t worry”. We should worry about the right things. Think seriously about who will own the systems that are about to become the most productive capital assets in human history, and pay attention to whether the institutional frameworks being built now will ensure you share in the gains. The difference between a good outcome and a bad one is about political economy and ownership, and history suggests that when we leave that question to the default trajectory, ordinary people are the ones who pay.




METR Time Horizons: Now 10x/Year


Published on February 13, 2026 11:01 PM GMT

Summary: AI models are now improving on METR’s Time Horizon benchmark at about 10x per year, compared to ~3x per year before 2024. They could return to a slower pace in 2026 or 2027 if RL scaling is inefficient, but the current rate is suggestive of short (median 3-year) timelines to AGI.

METR has released an update to their time horizon estimates for 2026, Time Horizon 1.1. They added new tasks with longer human completion times to the benchmark, and removed other tasks that were flawed. But for AI-watchers, the most relevant news from the update is that their estimates of AI progress have gotten a lot faster.

METR’s old estimate was that models’ time horizons doubled every 7 months. This meant that they increased by about 3.3x per year. METR included a note in their original report that progress might’ve sped up in 2024, to more like one doubling every 3-5 months, but with only 6 data points driving the estimate, it was pretty speculative. Now the update confirms: the trend from 2024 on looks like it’s doubling much faster than once per 7 months. The new estimate includes trends for models released in 2024 and 2025, and one for 2025-only models. Looking at these more recent models, time horizons double every ~3.5 months[1]. That’s about 10x per year - twice as fast in log space as the original headline.

[Figure: three trends for METR time horizons: the original 7-month doubling, the new 3.5-month doubling, and a slowdown and return to 7-month doublings by EOY 2026. Interactive version here.]
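For readers who want the conversion spelled out, the headline rates are just compounded doubling times (a quick sanity check in Python, not METR’s code):

    # Convert a fixed doubling time (in months) into an annual growth multiplier.
    def annual_multiplier(doubling_months: float) -> float:
        return 2 ** (12 / doubling_months)

    for months in (7, 4, 3.5, 3):
        print(f"doubling every {months} months ≈ {annual_multiplier(months):.1f}x per year")
    # 7 months   ≈ 3.3x per year  (pre-2024 trend)
    # 3.5 months ≈ 10.8x per year (2024-2025 trend)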

This could be an artifact of the tasks METR chooses to measure. I think their time horizons work is currently the single best benchmark we have available, but it only measures coding tasks, which labs are disproportionately focused on. For a sanity check, we can look at the composite Epoch Capabilities Index and see that…progress also became roughly twice as fast in early 2024!

Why does this matter? I think at least one of the following has to be true:

  1. METR’s and Epoch’s benchmarks could be saturated without leading to AGI.
  2. The 3.5-month doubling pace is temporary, and slows down significantly in 2026-7.
  3. We’re on track for AGI in the late 2020s.

In the past, some lab insiders have said their timelines for AGI were bimodal. This was a bit overstated, but the basic insight was that AI progress is largely due to companies scaling up the number of chips they’re buying. They could reach AGI sometime during this rapid scale-up, in which case it would come soon. But the scale-up can’t last much longer than a few more years, because after 2030, continuing to scale would cost trillions of dollars. So if industry didn’t hit AGI by then, compute scaling would slow down a bunch, and with it, AI progress more broadly. Then they’d have many years of marginal efficiency improvements ahead (to both models and chips) before finally reaching AGI.

But if AI capabilities are advancing this quickly, then AI progress could be less dependent on compute scaling than thought (at least in the current regime)[2]. If capabilities are growing by scaling reinforcement learning, then the pace of progress depends in large part on how quickly RL is scaling right now, and how long it can continue to do so (probably not much more than another year). Or if Toby Ord is right that inference scaling has been the main driver of recent capabilities growth, then progress will also soon return to pre-2024 rates (though see Ryan Greenblatt’s response).

Let’s consider the world where 2024-2025 growth rates were an aberration. In 2026 things start to slow down, and 2027 fully returns to the 7-month trend. 2 years of 3.5-month doubling times would still leave AI models with ~10x longer time horizons than they would’ve had in the world with no RL speedup. That means they fast-forwarded through about 2 years of business-as-usual AI capabilities growth. And more than that, it means they got this AI progress essentially “for free”, without increasing compute costs faster than before. Whitfill et al found that historically, compute increased by 3-4x per year, and so did time horizons. The fast progress in 2024-2025 means that not only are models better - they’re also more compute-efficient than they would’ve been without this speedup. That suggests AI capabilities are less dependent on compute than previously believed, because AI companies can discover paradigms like RL scaling that grow capabilities while using a fraction of total training compute[3].

My best guess is that we do see a slowdown in 2026-7. But progress could also speed up instead, going superexponential like in AI 2027. Progress in scaffolding and tools for AI models, or on open problems like continual learning, or just standard AI capabilities growth, could make the models useful enough to meaningfully speed up research and development inside of AI companies.

We should get substantial evidence on how sustainable the current trend is through 2026; I’ll be watching closely.

  1. One doubling every 3 months per the 2025-only trend, or once every 4 months per the 2024-5 trend. ↩︎

  2. It’s a bit suspicious that time horizons and effective FLOP for AI models, and revenue for AI companies, are all growing by ~10x per year. This could simply be coincidence (other factors like inference scaling might be temporarily speeding up growth in time horizons, which historically have grown slower than eFLOP), or time horizons could be more tightly tied to eFLOP right now than they were in the pre-RL scaling regime. I lean more toward the former, but it’s unclear. ↩︎

  3. If progress continued at its current pace through 2027, models in early 2028 would have something like 6-month time horizons. I think that’s probably enough for AI companies to accelerate their internal R&D by around 2x, and at that point R&D acceleration would start to bend the capabilities curve further upward. Models would soon hit broadly human-level capabilities, then go beyond them in 2028-2029. ↩︎




Use more text than one token to avoid neuralese


Published on February 13, 2026 9:09 PM GMT

You want to relay the contents of a transformer's output vector to the next input: next_input = encode(decode(output)).

You're currently using next_input = embed(sample_token(output)) to do this.

This compresses the output to one row of a lookup table. That's a pretty big squash. Too big, surely -- there must be a better way.

Enter neuralese. If you make next_input=output (or something like that) you lose no bandwidth at all. The bad news is that these vectors no longer correspond to natural language.

But you can add more bandwidth by funneling each vector through more text. That way, encode(decode(output)) doesn't lose too much information.

You could have each vector decode to multiple tokens. Or even a cleverly chosen patch of bytes. Perhaps whole sentences, paragraphs, essays one day.
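As a minimal sketch of the three options (the shapes, module names, and greedy top-k decoding below are invented for illustration, not taken from any particular architecture):

    import torch
    import torch.nn as nn

    d_model, vocab = 512, 32_000
    unembed = nn.Linear(d_model, vocab, bias=False)  # decode: output vector -> token logits
    embed = nn.Embedding(vocab, d_model)             # encode: token id -> input vector

    output = torch.randn(d_model)                    # the transformer's output vector

    # (1) Status quo: squash the vector into a single row of the embedding table.
    next_input_single = embed(unembed(output).argmax())  # shape (d_model,)

    # (2) Neuralese: pass the raw vector through. No bandwidth lost, but no
    #     correspondence to natural language either.
    next_input_neuralese = output

    # (3) More text per vector: decode several tokens from the same vector (greedy
    #     top-k here, standing in for a learned multi-token decoder), then re-embed
    #     them all. Bandwidth grows with k while staying in token space.
    k = 8
    token_ids = unembed(output).topk(k).indices  # k token ids
    next_input_multi = embed(token_ids)          # shape (k, d_model)

The point isn't that top-k is the right decoder; it's that k tokens carry roughly k times the information of one, so the text bottleneck can be widened without going all the way to raw vectors.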

I don't know which is best, but you do have options here, so it's not obvious to me that "you need more bandwidth" implies "you must abandon natural language" -- unless you're forcing it through a text intermediate as impoverished as, well, a lookup table.

 

Edit: Nothing new under the sun, it seems. Please look here for prior discussion of this topic, and consider this post a bump for it.




Hazards of Selection Effects on Approved Information


Published on February 13, 2026 6:51 PM GMT

In a busy, busy world, there's so much to read that no one could possibly keep up with it all. You can't not prioritize what you pay attention to and (even more so) what you respond to. Everyone and her dog tells herself a story that she wants to pay attention to "good" (true, useful) information and ignore "bad" (false, useless) information.

Keeping the story true turns out to be a harder problem than it sounds. Everyone and her dog knows that the map is not the territory, but the reason we need a whole slogan about it is because we never actually have unmediated access to the territory. Everything we think we know about the territory is actually just part of our map (the world-simulation our brains construct from sensory data), which makes it easy to lose track of whether your actions are improving the real territory, or just your view of it on your map.

For example, I like it when I have good ideas. It makes sense for me to like that. I endorse taking actions that will result in world-states in which I have good ideas.

The problem is that I might not be able to tell the difference between world-states in which I have good ideas, and world-states in which I think my ideas are good, but they're actually bad. Those two different states of the territory would look the same on my map.

If my brain's learning algorithms reinforce behaviors that lead to me having ideas that I think are good, then in addition to learning behaviors that make me have better ideas (like reading a book), I might also inadvertently pick up behaviors that prevent me from hearing about it if my ideas are bad (like silencing critics).

This might seem like an easy problem to solve, because the most basic manifestations of the problem are in fact pretty easy to solve. If I were to throw a crying fit and yell, "Critics bad! No one is allowed to criticize my ideas!" every time someone criticized my ideas, the problem with that would be pretty obvious to everyone and her dog, and I would stop getting invited to the salon.

But what if there were subtler manifestations of the problem, that weren't obvious to everyone and her dog? Then I might keep getting invited to the salon, and possibly even spread the covertly dysfunctional behavior to other salon members. (If they saw the behavior seeming to work for me, they might imitate it, and their brain's learning algorithms would reinforce it if it seemed to work for them.) What might those look like? Let's try to imagine.

Filtering Interlocutors

Goofusia: I don't see why you tolerate that distrustful witch Goody Osborne at your salon. Of course I understand the importance of criticism, which is an essential nutrient for any truthseeker. But you can acquire the nutrient without the downside of putting up with unpleasant people like her. At least, I can. I've already got plenty of perceptive critics in my life among my friends who want the truth, and know that I want the truth—who will assume my good faith, because they know my heart is in the right place.

Gallantina: But aren't your friends who know you want the truth selected for agreeing with you, over and above their being selected for being correct? If there were some crushing counterargument to your beliefs that would only be found by someone who didn't know that you want the truth and wouldn't assume good faith, how would you ever hear about it?

This one is subtle. Goofusia isn't throwing a crying fit every time a member of the salon criticizes her ideas. And indeed, you can't invite the whole world to your salon. You can't not do some sort of filtering. The question is whether salon invitations are being extended or withheld for "good" reasons (that promote the salon processing true and useful information) or "bad" reasons (that promote false or useless information).

The problem is that being friends with Goofusia and "know[ing] that [she and other salon members] want the truth" is a bad membership criterion, not a good one, because people who aren't friends with Goofusia and don't know that she wants the truth are likely to have different things to say. Even if Goofusia can answer all the critiques her friends can think of, that shouldn't give her confidence that her ideas are solid, if there are likely to exist serious critiques that wouldn't be independently reïnvented by the kinds of people who become Goofusia's friends.

The "nutrient" metaphor is a tell. Goofusia seems to be thinking of criticism as if it were a homogeneous ingredient necessary for a healthy epistemic environment, but that it doesn't particularly matter where it comes from. In analogy, it doesn't matter whether you get your allowance of potassium from bananas or potatoes or artificial supplements. If you find bananas and potatoes unpleasant, you can still take supplements and get your potassium that way; if you find Goody Osborne unpleasant, you can just talk to your friends who know you want the truth and get your criticism that way.

But unlike chemically uniform nutrients, criticism isn't homogeneous: different critics are differently equipped by virtue of their different intellectual backgrounds to notice different flaws in a piece of work. The purpose of criticism is not to virtuously endure being criticized; the purpose is to surface and fix every individual flaw. (If you independently got everything exactly right the first time, then there would be nothing for critics to do; it's just that that seems pretty unlikely if you're talking about anything remotely complicated. It would be hard to believe that such an unlikely-seeming thing had really happened without the toughest critics getting the chance to do their worst.)

"Knowing that (someone) wants the truth" is a particularly poor filter, because people who think that they have strong criticisms of your ideas are particularly likely to think that you don't want the truth. (Because, the reasoning would go, if you did want the truth, why would you propose such flawed ideas, instead of independently inventing the obvious-to-them criticism yourself and dropping the idea without telling anyone?) Refusing to talk to people who think that they have strong criticisms of your ideas is a bad thing to do if you care about your ideas being correct.

The selection effect is especially bad in situations where the fact that someone doesn't want the truth is relevant to the correct answer. Suppose Goofusia proposes that the salon buy cookies from a certain bakery—which happens to be owned by Goofusia's niece. If Goofusia's proposal was motivated by nepotism, that's probabilistically relevant to evaluating the quality of the proposal. (If the salon members aren't omniscient at evaluating bakery quality on the merits, then they can be deceived by recommendations made for reasons other than the merits.) The salon can debate back and forth about the costs and benefits of spending the salon's snack budget at the niece's bakery, but if no one present is capable of thinking "Maybe Goofusia is being nepotistic" (because anyone who could think that would never be invited to Goofusia's salon), that bodes poorly for the salon's prospects of understanding the true cost–benefit landscape of catering options.

Filtering Information Sources

Goofusia: One shouldn't have to be the sort of person who follows discourse in crappy filter-bubbles in order to understand what's happening. The Rev. Samuel Parris's news summary roundups are the sort of thing that lets me do that. Our salon should work like that if it's going to talk about the atheist threat and the witchcraft crisis. I don't want to have to read the awful corners of the internet where this is discussed all day. They do truthseeking far worse there.

Gallantina: But then you're turning your salon into a Rev. Parris filter bubble. Don't you want your salon members to be well-read? Are you trying to save time, or are you worried about being contaminated by ideas that haven't been processed and vetted by Rev. Parris?

This one is subtle, too. If Goofusia is busy and just doesn't have time to keep up with what the world is saying about atheism and witchcraft, it might very well make sense to delegate her information gathering to Rev. Parris. That way, she can get the benefits of being mostly up to speed on these issues without having to burn too many precious hours that could be spent studying more important things.

The problem is that the suggestion doesn't seem to be about personal time-saving. Rev. Parris is only one person; even if he tries to make his roundups reasonably comprehensive, he can't help but omit information in ways that reflect his own biases. (For he is presumably not perfectly free of bias, and if he didn't omit anything, there would be no time-saving value to his subscribers in being able to just read the roundup rather than having to read everything that Rev. Parris reads.) If some salon members are less busy than Goofusia and can afford to do their own varied primary source reading rather than delegating it all to Rev. Parris, Goofusia should welcome that—but instead, she seems to be suspicious of those who would "be the sort of person" who does that. Why?

The admonition that "They do truthseeking far worse there" is a tell. The implication seems to be that good truthseekers should prefer to only read material by other good truthseekers. Rev. Parris isn't just saving his subscribers time; he's protecting them from contamination, heroically taking up the burden of extracting information out of the dangerous ravings of non-truthseekers.

But it's not clear why such a risk of contamination should exist. Part of the timeless ideal of being well-read is that you're not supposed to believe everything you read. If I'm such a good truthseeker, then I should want to read everything I can about the topics I'm seeking the truth about. If the authors who publish such information aren't such good truthseekers as I am, I should take that into account when performing updates on the evidence they publish, rather than denying myself the evidence.

Information is transmitted across the physical universe through links of cause and effect. If Mr. Proctor is clear-sighted and reliable, then when he reports seeing a witch, I infer that there probably was a witch. If the correlation across possible worlds is strong enough—if I think Mr. Proctor reports witches when there are witches, and not when there aren't—then Mr. Proctor's word is almost as good as if I'd seen the witch myself. If Mr. Corey has poor eyesight and is of a less reliable character, I am less credulous about reported witch sightings from him, but if I don't face any particular time constraints, I'd still rather hear Mr. Corey's testimony, because the value of information to a Bayesian reasoner is always nonnegative. For example, Mr. Corey's report could corroborate information from other sources, even if it wouldn't be definitive on its own. (Even the fact that people sometimes lie doesn't fundamentally change the calculus, because the possibility of deception can be probabilistically "priced in".)
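To make the arithmetic concrete, here is a minimal sketch of that update in Python. The reliability numbers are invented for illustration; the point is only that the less reliable witness moves the posterior less, not that his report is worth nothing.

```python
# Toy Bayesian update on witness testimony. All probabilities are made up
# for illustration; only the structure of the calculation matters.

def posterior_given_report(prior, p_report_if_witch, p_report_if_no_witch):
    """P(witch | report) via Bayes' rule."""
    numerator = p_report_if_witch * prior
    evidence = p_report_if_witch * prior + p_report_if_no_witch * (1 - prior)
    return numerator / evidence

prior = 0.01  # prior probability that there really is a witch

# Mr. Proctor reports witches when there are witches, and rarely otherwise.
proctor = posterior_given_report(prior, p_report_if_witch=0.9, p_report_if_no_witch=0.01)

# Mr. Corey's report is noisier, but still correlated with the truth.
corey = posterior_given_report(prior, p_report_if_witch=0.6, p_report_if_no_witch=0.3)

print(f"P(witch | Proctor reports one) = {proctor:.3f}")  # ≈ 0.476
print(f"P(witch | Corey reports one)   = {corey:.3f}")    # ≈ 0.020
```

Even Mr. Corey's noisy report roughly doubles the posterior relative to the prior; throwing it away can only lose information, which is the sense in which the value of information is nonnegative.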

That's the theory, anyway. A potential reason to fear contamination from less-truthseeking sources is that perhaps the Bayesian ideal is too hard to practice and salon members are too prone to believe what they read. After all, many news sources have been adversarially optimized to corrupt and control their readers, making them less sane by inducing them to see the world through ungrounded lenses.

But the means by which such sources manage to control their readers is precisely by capturing their trust and convincing them that they shouldn't want to read the awful corners of the internet where they do truthseeking far worse than here. Readers who have mastered multiple ungrounded lenses and can check them against each other can't be owned like that. If you can spare the time, being well-read is a more robust defense against the risk of getting caught in a bad filter bubble, than trying to find a good filter bubble and blocking all (presumptively malign) outside sources of influence. All the bad bubbles have to look good from the inside, too, or they wouldn't exist.

To some, the risk of being in a bad bubble that looks good may seem too theoretical or paranoid to take seriously. It's not like there are no objective indicators of filter quality. In analogy, the observation that dreaming people don't know that they're asleep, probably doesn't make you worry that you might be asleep and dreaming right now.

But it being obvious that you're not in one of the worst bubbles shouldn't give you much comfort. There are still selection effects on what information gets to you, if for no other reason than that there aren't enough good truthseekers in the world to uniformly cover all the topics that a truthseeker might want to seek truth about. The sad fact is that people who write about atheism and witchcraft are disproportionately likely to be atheists or witches themselves, and therefore non-truthseeking. If your faith in truthseeking is so weak that you can't even risk hearing what non-truthseekers have to say, that necessarily limits your ability to predict and intervene on a world in which atheists and witches are real things in the physical universe that can do real harm (where you need to be able to model the things in order to figure out which interventions will reduce the harm).

Suppressing Information Sources

Goofusia: I caught Goody Osborne distributing pamphlets quoting the honest and candid and vulnerable reflections of Rev. Parris on guiding his flock, and just trying to somehow twist that into maximum anger and hatred. It seems quite clear to me what's going on in that pamphlet, and I think signal-boosting it is a pretty clear norm violation in my culture.

Gallantina: I read that pamphlet. It seemed like intellectually substantive satire of a public figure. If you missed the joke, it was making fun of an alleged tendency in Rev. Parris's sermons to contain sophisticated analyses of the causes of various social ills, and then at the last moment, veer away from the uncomfortable implications and blame it all on witches. If it's a norm violation to signal-boost satire of public figures, that's artificially making it harder for people to know about flaws in the work of those public figures.

This one is worse. Above, when Goofusia filtered who she talks to and what she reads for bad reasons, she was in an important sense only hurting herself. Other salon members who aren't sheltering themselves from information are unaffected by Goofusia's preference for selective ignorance, and can expect to defeat Goofusia in public debate if the need arises. The system as a whole is self-correcting.

The invocation of "norm violations" changes everything. Norms depend on collective enforcement. Declaring something a norm violation is much more serious than saying that you disagree with it or don't like it; it's expressing an intent to wield social punishment in order to maintain the norm. Merely bad ideas can be criticized, but ideas that are norm-violating to signal-boost are presumably not even to be seriously discussed. (Seriously discussing a work is signal-boosting it.) Norm-abiding group members are required to be ignorant of their details (or act as if they're ignorant).

Mandatory ignorance of anything seems bad for truthseeking. What is Goofusia thinking here? Why would this seem like a good idea to someone?

At a guess, the "maximum anger and hatred" description is load-bearing. Presumably the idea is that it's okay to calmly and politely criticize Rev. Parris's sermons; it's only sneering or expressing anger or hatred that is forbidden. If the salon's speech code only targets form and not content, the reasoning goes, then there's no risk of the salon missing out on important content.

The problem is that the line between form and content is blurrier than many would prefer to believe, because words mean things. You can't just swap in non-angry words for angry words without changing the meaning of a sentence. Maybe the distortion of meaning introduced by substituting nicer words is small, but then again, maybe it's large: the only person in a position to say is the author. People don't express anger and hatred for no reason. When they do, it's because they have reasons to think something is so bad that it deserves their anger and hatred. Are those good reasons or bad reasons? If it's norm-violating to talk about it, we'll never know.

Unless applied with the utmost stringent standards of evenhandedness and integrity, censorship of form quickly morphs into censorship of content, as heated criticism of the ingroup is construed as norm-violating, while equally heated criticism of the outgroup is unremarkable and passes without notice. It's one of those irregular verbs: I criticize; you sneer; she somehow twists into maximum anger and hatred.

The conjunction of "somehow" and "it seems quite clear to me what's going on" is a tell. If it were actually clear to Goofusia what was going on with the pamphlet author expressing anger and hatred towards Rev. Parris, she would not use the word "somehow" in describing the author's behavior: she would be able to pass the author's ideological Turing test and therefore know exactly how.

If that were just Goofusia's mistake, the loss would be hers alone, but if Goofusia is in a position of social power over others, she might succeed at spreading her anti-speech, anti-reading cultural practices to others. I can only imagine that the result would be a subculture that was obsessively self-congratulatory about its own superiority in "truthseeking", while simultaneously blind to everything outside itself. People spending their lives immersed in that culture wouldn't necessarily notice anything was wrong from the inside. What could you say to help them?

An Analogy to Reinforcement Learning From Human Feedback

Pointing out problems is easy. Finding solutions is harder.

The training pipeline for frontier AI systems typically includes a final step called reinforcement learning from human feedback (RLHF). After training a "base" language model that predicts continuations of internet text, supervised fine-tuning is used to make the model respond in the form of an assistant answering user questions, but making the assistant responses good is more work. It would be expensive to hire a team of writers to manually compose the thousands of user-question–assistant-response examples needed to teach the model to be a good assistant. The solution is RLHF: a reward model (often just the same language model with a different final layer) is trained to predict the judgments of human raters about which of a pair of model-generated assistant responses is better, and the model is optimized against the reward model.
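For concreteness, here is a minimal sketch (in PyTorch, with made-up numbers, and not any particular lab's implementation) of the pairwise preference loss commonly used to fit the reward model: the response the human rater preferred should receive a higher scalar reward than the rejected one.

```python
import torch
import torch.nn.functional as F

def reward_model_pairwise_loss(reward_chosen, reward_rejected):
    """Bradley–Terry-style loss: maximize the log-probability that the
    human-preferred ("chosen") response outscores the rejected one."""
    return -F.logsigmoid(reward_chosen - reward_rejected).mean()

# Hypothetical scalar rewards assigned by the reward model to a batch of
# (chosen, rejected) response pairs.
reward_chosen = torch.tensor([1.2, 0.3, 2.0])
reward_rejected = torch.tensor([0.5, 0.4, -1.0])

loss = reward_model_pairwise_loss(reward_chosen, reward_rejected)
print(loss.item())  # lower when chosen responses consistently outscore rejected ones
```

The policy model is then optimized (e.g., with PPO) against the scalar output of this reward model: anything that reliably raises that scalar gets reinforced, whether or not a human would endorse it on reflection.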

The problem with the solution is that human feedback (and the reward model's prediction of it) is imperfect. The reward model can't tell the difference between "The AI is being good" and "The AI looks good to the reward model". This already shows up as the failure mode of sycophancy, in which today's language model assistants tell users what they want to hear, but theory and preliminary experiments suggest that much larger harms (up to and including human extinction) could materialize from future AI systems deliberately deceiving their overseers—not because they suddenly "woke up" and defied their training, but because what we think we trained them to do (be helpful, honest, and harmless) isn't what we actually trained them to do (perform whatever computations were the antecedents of reward on the training distribution).
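A toy illustration of that gap, with entirely made-up variables rather than a claim about any real system: suppose each candidate response has a latent true quality and a superficial appeal, and the proxy score (standing in for an imperfect reward model) can't tell them apart. On an average sample the proxy tracks true quality reasonably well, but the response that maximizes the proxy owes most of its score to the superficial feature.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up latent variables for 100,000 candidate responses.
n = 100_000
true_quality = rng.normal(size=n)
superficial_appeal = rng.normal(size=n)

# The proxy (stand-in for an imperfect reward model) also rewards a
# superficial feature it can't distinguish from genuine quality.
proxy_score = true_quality + 2.0 * superficial_appeal

# On typical samples, proxy and true quality are positively correlated...
print(np.corrcoef(proxy_score, true_quality)[0, 1])  # ≈ 0.45

# ...but the proxy-maximizing response is selected mostly for the
# superficial feature, not for being genuinely good.
best = np.argmax(proxy_score)
print(true_quality[best], superficial_appeal[best])
```

Selecting hard on the proxy disproportionately selects for whatever the proxy overweights; that is the toy version of "looks good to the reward model" coming apart from "is good".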

The problem doesn't have any simple, obvious solution. In the absence of some sort of international treaty to halt all AI development worldwide, "Just don't do RLHF" isn't feasible and doesn't even make any sense; you need some sort of feedback in order to make an AI that does anything useful at all.

The problem may or may not ultimately be solvable with some sort of complicated, nonobvious solution that tries to improve on naïve RLHF. Researchers are hard at work studying alternatives involving red-teaming, debate, interpretability, mechanistic anomaly detection, and more.

But the first step on the road to some future complicated solution to the problem of naïve RLHF is acknowledging that the problem is at least potentially real, and having some respect that the problem might be difficult, rather than just eyeballing the results of RLHF and saying that it looks great.

If a safety auditor comes to the CEO of an AI company expressing concerns about the company's RLHF pipeline being unsafe due to imperfect rater feedback, it's more reassuring if the CEO says, "Yes, we thought of that, too; we've implemented these-and-such mitigations and are monitoring such-and-these signals which we hope will clue us in if the mitigations start to fail."

If the CEO instead says, "Well, I think our raters are great. Are you insulting our raters?", that does not inspire confidence. The natural inference is that the CEO is mostly interested in this quarter's profits and doesn't really care about safety.

Similarly, the problem with selection effects on approved information, in which your salon can't tell the difference between "Our ideas are good" and "Our ideas look good to us," doesn't have any simple, obvious solution. "Just don't filter information" isn't feasible and doesn't even make any sense; you need some sort of filter because it's not physically possible to read everything and respond to everything.

The problem may or may not ultimately be solvable with some complicated solution involving prediction markets, adversarial collaborations, anonymous criticism channels, or any number of other mitigations I haven't thought of, but the first step on the road to some future complicated solution is acknowledging that the problem is at least potentially real, and having some respect that the problem might be difficult. If alarmed members come to the organizers of the salon with concerns about collective belief distortions due to suppression of information and the organizers meet them with silence, "bowing out", or defensive blustering, rather than "Yes, we thought of that, too," that does not inspire confidence. The natural inference is that the organizers are mostly interested in maintaining the salon's prestige and don't really care about the truth.


