By Timothy B. Lee, a tech reporter with a master’s in computer science who covers AI progress and policy.

Six reasons to think there’s an AI bubble — and six reasons not to

2025-11-25 21:03:31

I’m excited to publish this post co-authored with one of my favorite writers, Derek Thompson. Derek recently left the Atlantic to launch his own Substack covering business, technology, science, and politics. It’s one of the few newsletters I read as soon as it hits my inbox, and I bet a lot of Understanding AI readers would enjoy it.


In the last few weeks, something’s troubled and fascinated us about the national debate over whether artificial intelligence is a bubble. Everywhere we look and listen, experts are citing the same small number of statistics, factoids, and studies. The debate is like a board game with a tiny number of usable pieces. For example:

  • Talk to AI bears, and they’ll tell you how much Big Tech is spending.

  • Talk to AI bulls, and they’ll tell you how much Big Tech is making.

  • Talk to AGI believers, and they’ll quote a study on “task length” by an organization called METR.

  • Talk to AGI skeptics, and they’ll quote another study on productivity, also by METR.

Last week, we were discussing how one could capture the entire AI-bubble debate in about 12 statistics that people just keep citing and reciting — on CNBC, on tech podcasts, in Goldman Sachs Research documents, and at San Francisco AI parties. Since everybody seems to be reading and quoting from the same skinny playbook, we thought: What the hell, let’s just publish the whole playbook!

If you read this article, we think you’ll be prepared for just about every conversation about AI, whether you find yourself at a Bay Area gathering with accelerationists or a Thanksgiving debate with Luddite cousins. We think some of these arguments are compelling. We think others are less persuasive. So, throughout the article, we’ll explain both why each argument belongs in the discussion and why some arguments don’t prove as much as they claim. Read to the end, and you’ll see where each of us comes down on the debate.

Let’s start with the six strongest arguments that there is an AI bubble.

All about the Benjamins

When they say: Prove to me that AI is a bubble

You say: For starters, this level of spending is insane

When America builds big infrastructure projects, we often over-build. Nineteenth-century railroads? Overbuilt, bubble. Twentieth-century Internet? Overbuilt, bubble. It’s really nothing against AI specifically to suggest that every time US companies get this excited about a big new thing, they get too excited, and their exuberance creates a bubble.

Five of the largest technology giants — Amazon, Meta, Microsoft, Alphabet, and Oracle — had $106 billion in capital expenditures in the most recent quarter. That works out, on an annualized basis, to almost 1.4% of US gross domestic product, putting it on par with some of the largest infrastructure investments in American history.
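As a rough sanity check on that figure, you can annualize the quarterly number and divide by GDP. This is a minimal back-of-envelope sketch; the $106 billion comes from the article, but the GDP figure is an approximation we're assuming, not a number the article provides:

```python
# Back-of-envelope: combined Big Tech quarterly capex as a share of US GDP.
quarterly_capex_bn = 106      # from the article: most recent quarter, in billions
annual_us_gdp_bn = 30_500     # assumed approximate US nominal GDP, in billions

annualized_capex_bn = quarterly_capex_bn * 4
share_of_gdp = annualized_capex_bn / annual_us_gdp_bn
print(f"{share_of_gdp:.1%}")  # roughly 1.4%
```

The same ratio falls out whether you compare quarterly capex to quarterly GDP or annualized capex to annual GDP.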

This chart was originally created by Understanding AI’s Kai Williams, who noted, “not all tech capex is spent on data centers, and not all data centers are dedicated to AI. The spending shown in this chart includes all the equipment and infrastructure a company buys. For instance, Amazon also needs to pay for new warehouses to ship packages.”

Still, AI accounts for a very large share of this spending. Amazon’s CEO, for example, said last year that AI accounted for “the vast majority” of Amazon’s recent capex. And notice that the last big boom on the chart — the broadband investment boom of the late 1990s — ended with a crash. AI investments are now large enough that a sudden slowdown would have serious macroeconomic consequences.

Money for nothing

When they say: But this isn’t like the dot-com bubble, because these companies are for real

You say: I’m not so sure about that…

“It feels like there’s obviously a bubble in the private markets,” said Demis Hassabis, the CEO of Google DeepMind. “You look at seed rounds with just nothing being [worth] tens of billions of dollars. That seems a little unsustainable. It’s not quite logical to me.”

The canonical example of zillions of dollars for zilch in product has been Thinking Machines, the AI startup led by former OpenAI executive Mira Murati. This summer, Thinking Machines raised $2 billion, the largest seed round in corporate history, before releasing a product. According to a September report in The Information, the firm declined to tell investors or the public what it was even working on.

“It was the most absurd pitch meeting,” one investor who met with Murati said. “She was like, ‘So we’re doing an AI company with the best AI people, but we can’t answer any questions.’”

In October, the company launched a programming interface called Tinker. I guess that’s something. Or, at least, it had better be something quite spectacular, because just days later, the firm announced that Murati was in talks with investors to raise another $5 billion. This would raise the company’s valuation to $50 billion, more than the market caps of Target or Ford.

When enterprises that barely have products are raising money at valuations rivaling 100-year-old multinational firms, it makes us wonder if something weird is going on.

Reality check

When they say: Well, AI is making me more productive

You say: You might be deluding yourself

One of the hottest applications of AI right now is programming. Over the last 18 months, millions of programmers have started using agentic AI coding tools such as Cursor, Anthropic’s Claude Code, and OpenAI’s Codex, which are capable of performing routine programming tasks. Many programmers have found that these tools make them dramatically more productive at their jobs.

But a July study from the research organization METR called that into question. They asked 16 programmers to tackle 246 distinct tasks. Programmers estimated how long it would take to complete each task. Then they were randomly assigned to use AI, or not, on a task-by-task basis.

On average, the developers predicted that AI would allow them to complete their tasks 24% faster. Even after the fact, developers who used AI thought it had sped them up by 20%. But programmers who used AI actually took 19% longer, on average, than programmers who didn’t.

We were both surprised by this result when it first came out, and we consider it one of the strongest data points in favor of AI skepticism. While many people believe that AI has made them more productive at their jobs — including both of us — it’s possible that we’re all deluding ourselves. Maybe that will become more obvious over the next year or two and the hype around AI will dissipate.

But it’s also possible that programmers are just in the early stages of the learning process for AI coding tools. AI tools probably speed up programmers on some tasks and slow them down on others. Over time, programmers may get better at predicting which tasks fall into which category. Or perhaps the tools themselves will get better over time — AI coding tools have improved dramatically over the last year.

It’s also possible that the METR results simply aren’t representative of the software industry as a whole. For example, a November study examined 32 organizations that started to use Cursor’s coding agent in the fall of 2024. It found that programmer productivity increased by 26% to 39% as a result.

Infinite money glitch

When they say: But AI is clearly growing the overall economy

You say: Maybe the whole thing is a trillion-dollar ouroboros

Imagine Tim makes some lemonade. He loans Derek $10 to buy a drink. Derek buys Tim’s lemonade for $10. Can we really say that Tim has “earned $10” in this scenario? Maybe no: If Derek goes away, all Tim has done is move money from his left pocket to his right pocket. But maybe yes: If Derek loves the lemonade and keeps buying more every day, then Tim’s bet has paid off handsomely.

Artificial intelligence is more complicated than lemonade. But some analysts are worried that the circular financing scheme we described above is also happening in AI. In September, Nvidia announced it would invest “up to” $100 billion in OpenAI to support the construction of up to 10 gigawatts of data center capacity. In exchange, OpenAI agreed to use Nvidia’s chips for the buildout. The next day, OpenAI announced five new locations to be built by Oracle in a new partnership whose value reportedly exceeds $300 billion. The industry analyst Dylan Patel called this financial circuitry an “infinite money glitch.”

Bloomberg made this chart depicting the complex web of transactions among leading AI companies. This kind of thing sets off alarm bells for people who remember how financial shenanigans contributed to the 2008 financial crisis.

The fear is two-fold: first, that tech companies are shifting money around in a way that creates the appearance of new revenue that hasn’t actually materialized; and second, that if any part of this financial ouroboros breaks, everybody is going down.

In the last few months, OpenAI has announced four deals: with Nvidia, Oracle, and the chipmakers AMD and Broadcom. All four companies saw their market values jump by tens of billions of dollars the day their deals were announced. But, by that same logic, any wobble for OpenAI or Nvidia could reverberate throughout the AI ecosystem.

Something similar happened during the original dot-com bubble. The investor Paul Graham sold a company to Yahoo in 1998, so he had a front-row seat to the mania:

By 1998, Yahoo was the beneficiary of a de facto Ponzi scheme. Investors were excited about the Internet. One reason they were excited was Yahoo’s revenue growth. So they invested in new Internet startups. The startups then used the money to buy ads on Yahoo to get traffic. Which caused yet more revenue growth for Yahoo, and further convinced investors the Internet was worth investing in. When I realized this one day, sitting in my cubicle, I jumped up like Archimedes in his bathtub, except instead of “Eureka!” I was shouting “Sell!”

Are we seeing a similar dynamic with the data center boom? It doesn’t seem like a crazy theory.

Pay no attention to the man behind the curtain

When they say: The hyperscalers are smart companies and don’t need bubbles to grow

You say: So why are they resorting to financial trickery?

Some skeptics argue that big tech companies are concealing the actual cost of the AI buildout.

First, they’re shifting AI spending off their corporate balance sheets. Instead of paying for data centers themselves, they’re teaming up with private capital firms to create joint ventures known as special purpose vehicles (or SPVs). These entities build the facilities and buy the chips, while the spending sits somewhere other than the tech company’s books. This summer, Meta reportedly sought to raise about $29 billion from private credit firms for new AI data centers structured through such SPVs.

Meta isn’t alone. CoreWeave, the fast-growing AI cloud company, has also turned to private credit to fund its expansion through SPVs. These entities transfer risk off the balance sheets of Silicon Valley companies and onto the balance sheets of private-capital limited partners, including pension funds and insurance companies. If the AI bubble bursts, it won’t be just tech shareholders who feel the pain. It will be retirees and insurance policyholders.

To be fair, it’s not clear that anything shady is happening here. Tech companies have plenty of AI infrastructure on their own balance sheets, and they’ve been bragging about that spending in earnings calls, not downplaying it. So it’s not obvious that they are using SPVs in an effort to mislead people.

Second, skeptics argue that tech companies are underplaying the depreciation risk of the hardware that powers AI. Earlier investment waves left behind infrastructure that held its value for decades: power lines from the 1940s, freeways from the 1960s, fiber optic cables from the 1990s. By contrast, the best GPUs are overtaken by superior models every few years. The hyperscalers spread their cost over five or six years through an accounting process called depreciation. But if they have to buy a new set of top-end chips every two years, they’ll eventually blow a hole in their profitability.
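To see why the assumed useful life matters so much, here is a minimal sketch of straight-line depreciation with purely illustrative numbers (the $10 billion fleet cost is hypothetical, not a figure from the article):

```python
# Straight-line depreciation: the same hardware outlay produces very
# different annual expenses depending on the assumed useful life.
fleet_cost_m = 10_000  # hypothetical GPU fleet cost, in millions of dollars

for useful_life_years in (6, 5, 2):
    annual_expense_m = fleet_cost_m / useful_life_years
    print(f"{useful_life_years}-year life: ${annual_expense_m:,.0f}M per year")
```

With a two-year life, the annual expense is two and a half times what a five-year schedule would report, which is exactly the gap skeptics worry is hiding in today’s profit figures.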

We don’t dismiss this fear. But the danger is easily exaggerated. Consider the A100 chip, which helped train GPT-4 in 2022. The first A100s were sold in 2020, which makes the oldest units about five years old. Yet they’re still widely used. “In a compute-constrained world, there is still ample demand for running A100s,” Bernstein analyst Stacy Rasgon recently wrote. Major cloud vendors continue to offer A100 capacity, and customers continue to buy it.

Of course, there’s no guarantee that today’s chips will be as durable. If AI demand cools, we could see a glut of hardware and early retirement of older chips. But based on what we know today, it’s reasonable to assume that a GPU purchased now will still be useful five years from now.

A changing debt picture

When they say: The hyperscalers are well-run companies that won’t use irresponsible leverage

You say: That might be changing

A common way for a bubble to end is with too much debt and too little revenue. Most of the Big Tech companies building AI infrastructure — including Google, Microsoft, and Meta — haven’t needed to take on much debt because they can fund the investments with profit. Oracle has been a notable exception to this trend, and some people consider it the canary in the coal mine.

Oracle recently borrowed $18 billion for data center construction, pushing the company’s total debt above $100 billion. The Wall Street Journal reports that “the company’s adjusted debt, a measure that includes what it owes on leases in addition to what it owes creditors, is forecast to more than double to roughly $300 billion by 2028, according to credit analysts at Morgan Stanley.”

At the same time, it’s not obvious that Oracle is going to make a lot of money from this aggressive expansion. There’s plenty of demand: in its most recent earnings call, Oracle said that it had $455 billion in contracted future revenue — a more than four-fold increase over the previous year. But The Information reports that in the most recent quarter, Oracle earned $125 million on $900 million worth of revenue from renting out data centers powered by Nvidia GPUs. That works out to a 14% profit margin, which is modest in a normal business and especially modest in a highly volatile industry like this one. It’s much smaller than the roughly 70% gross margin Oracle gets on more established services.

The worry for AI skeptics is that customer demand for GPUs could cool off as quickly as it heated up. In theory, that $455 billion figure represents firm customer commitments to purchase future computing services. But if there’s an industry-wide downturn, some customers might try to renegotiate the terms of these contracts. Others might simply go out of business. And that could leave Oracle with a lot of debt, a lot of idle GPUs, and not enough revenue to pay for it all.

And now, the very best arguments against an AI bubble


An AI “tsunami” is coming for Hollywood — here’s how artists are responding

2025-11-20 04:00:21

For a forthcoming piece I’m looking to talk to people using and building with open-weight models, whether that’s in startups, enterprises or other organizations. I’m happy to talk to people off the record. I’ve opened up some slots on my calendar tomorrow and Friday. If you’re willing to talk to me, please click here to grab a time.


In 2016, the legendary Japanese filmmaker Hayao Miyazaki was shown a bizarre AI-generated video of a misshapen human body crawling across a floor.

Miyazaki declared himself “utterly disgusted” by the technology demo, which he considered an “insult to life itself.”

“If you really want to make creepy stuff, you can go ahead and do it,” Miyazaki said. “I would never wish to incorporate this technology into my work at all.”

Many fans interpreted Miyazaki’s remarks as rejecting AI-generated video in general. So they didn’t like it when, in October 2024, filmmaker PJ Accetturo used AI tools to create a fake trailer for a live-action version of Miyazaki’s animated classic “Princess Mononoke.” The trailer earned him 22 million views on X. It also earned him hundreds of insults and death threats.

“Go generate a bridge and jump off of it,” said one of the funnier retorts. Another urged Accetturo to “throw your computer in a river and beg God’s forgiveness.”

Someone tweeted that Miyazaki “should be allowed to legally hunt and kill this man for sport.”

PJ Accetturo is a director and founder of Genre AI, an AI ad agency. (Photo courtesy of PJ Accetturo)

The development of AI image and video generation models has been controversial, to say the least. Artists have accused AI companies of stealing their work to build tools that put humans out of a job. Using AI tools openly is stigmatized in many circles, as Accetturo learned the hard way.

But as these models have improved, they have sped up workflows and afforded new opportunities for artistic expression. Artists without AI expertise might soon find themselves losing work.

Over the last few weeks, I’ve spoken to nine actors, directors, and creators about how they are navigating these tricky waters. Here’s what they told me.


The backlash to AI video generation, explained

Actors have emerged as a powerful force against AI. In 2023, SAG-AFTRA, the Hollywood actors’ union, had its longest-ever strike, partly to establish more protections for actors against AI replicas.

Actors have lobbied to regulate AI in their industry and beyond. One actor I talked with, Erik Passoja, has testified before the California legislature in favor of several bills, including measures for greater protection against pornographic deepfakes. SAG-AFTRA endorsed SB 1047, an AI safety bill regulating frontier models. The union also organized against the proposed moratorium on state AI bills.

A recent flashpoint came in September, when Deadline Hollywood reported that talent agencies were interested in signing “AI actress” Tilly Norwood.

Actors weren’t happy. Emily Blunt told Variety, “This is really, really scary. Come on agencies, don’t do that.”

Natasha Lyonne, star of “Russian Doll,” posted on an Instagram Story: “Any talent agency that engages in this should be boycotted by all guilds. Deeply misguided & totally disturbed.”

The backlash was partly specific to Tilly Norwood — Lyonne is no AI skeptic, having cofounded an AI studio — but it also reflects a set of concerns around AI common to many in Hollywood and beyond.

Here’s how SAG-AFTRA explained its position:

Tilly Norwood is not an actor, it’s a character generated by a computer program that was trained on the work of countless professional performers — without permission or compensation. It has no life experience to draw from, no emotion and, from what we’ve seen, audiences aren’t interested in watching computer-generated content untethered from the human experience. It doesn’t solve any “problem” — it creates the problem of using stolen performances to put actors out of work, jeopardizing performer livelihoods and devaluing human artistry.

This statement reflects three broad criticisms that come up over and over in discussions of AI art:

  • Content theft: Most of the leading AI video models have been trained on broad swathes of the Internet, including images and films made by artists. In many cases, companies have not asked artists for permission to use this content, nor compensated them. Courts are still working out whether this is fair use under copyright law. But many people I talked to consider AI companies’ training efforts to be theft of artists’ work.

  • Job loss: If AI tools can make passable video quickly or drastically speed up editing tasks, that potentially takes jobs away from actors or film editors. While past technological advancements have also eliminated jobs — the adoption of digital cameras drastically reduced the number of people cutting physical film — AI could have an even broader impact.

  • Artistic quality: A lot of people told me they just didn’t think AI-generated content could ever be good art. Tess Dinerstein stars in vertical dramas — episodic programs optimized for viewing on smartphones. She told me that AI is “missing that sort of human connection that you have when you go to a movie theater and you’re sobbing your eyes out because your favorite actor is talking about their dead mom.”

The concern about theft is potentially solvable by changing how models are trained. Around the time Accetturo released the “Princess Mononoke” trailer, he called for generative AI tools to be “ethically trained on licensed datasets.”

Some companies have moved in this direction. For instance, independent filmmaker Gille Klabin told me he “feels pretty good” using Adobe products because the company trains its AI models on stock images that it pays royalties for.

But the other two issues — job losses and artistic integrity — will be harder to finesse. Many creators — and fans — believe that AI-generated content misses the fundamental point of art, which is about creating an emotional connection between creators and viewers.

But while that point is compelling in theory, the details can be tricky.

Dinerstein, the vertical drama actress, told me that she’s “not fundamentally against AI” — she admits “it provides a lot of resources to filmmakers” in specialized editing tasks — but she takes a hard stance against it on social media.

“It’s hard to ever explain gray areas on social media,” she said, and she doesn’t want to “come off as hypocritical.”

Even though she doesn’t think that AI poses a risk to her job — “people want to see what I’m up to” — she does fear people (both fans and vertical drama studios) making an AI representation of her without her permission. And she has found it easiest to just say “You know what? Don’t involve me in AI.”

Others see it as a much broader issue. Actress Susan Spano told me it was “an issue for humans, not just actors.”

“This is a world of humans and animals,” she said. “Interaction with humans is what makes it fun. I mean, do we want a world of robots?”

How one director leaned into AI

It’s relatively easy for actors to take a firm stance against AI because they inherently do their work in the physical world. But things are more complicated for other Hollywood creatives, such as directors, writers, and film editors. AI tools can genuinely make them more productive, and they’re at risk of losing work if they don’t stay on the cutting edge.

So the non-actors I talked to took a range of approaches to AI. Some still reject it. Others have used the tools reluctantly and tried to keep their heads down. Still others have openly embraced the technology.

Kavan Cardoza is a director and AI filmmaker. (Photo courtesy of Phantom X)

Take Kavan Cardoza, for example. He worked as a music video director and photographer for close to a decade before getting his break into filmmaking with AI.

After the image model Midjourney was first released in 2022, Cardoza started playing around with image generation, and later video generation. Eventually, he “started making a bunch of fake movie trailers” for existing movies and franchises. In December 2024, he made a fan film in the Batman universe that “exploded on the Internet,” before Warner Brothers took it down for copyright infringement.

Cardoza acknowledges that he recreated actors in former Batman movies “without their permission.” But he insists he wasn’t “trying to be malicious or whatever. It was truly just a fan film.”

Whereas Accetturo received death threats, the response to Cardoza’s fan film was quite positive.

“Every other major studio started contacting me,” Cardoza said. He set up an AI studio, Phantom X, with several of his close friends. Phantom X started by making ads (where AI video is catching on quickest), but Cardoza wanted to focus back on films.

In June, Cardoza made a short film called “Echo Hunter,” a blend of “Blade Runner” and “The Matrix.” Some shots look clearly AI-generated, but Cardoza used motion capture technology from Runway to put the faces of real actors into his AI-generated world. Overall, the piece pretty much hangs together.

Cardoza wanted to work with real actors because their artistic choices can help elevate the script he’s written: “there’s a lot more levels of creativity to it.” But he needed SAG-AFTRA’s approval to make a film that blends AI techniques with the likenesses of SAG-AFTRA actors. To get it, he had to promise not to re-use the actors’ likenesses in other films.


“It’s never about if, it’s just when”

In Cardoza’s view, AI is “giving voices to creators that otherwise never would have had the voice.”

But Cardoza isn’t wedded to AI. When an interviewer asked him whether he’d make a non-AI film if required to, he responded “Oh 100%.” Cardoza added that if he had the budget to do it now, “I’d probably still shoot it all live action.”

He acknowledged to me that there will be losers in the transition — “there’s always going to be changes” — but he compares the rise of AI to past technological developments in filmmaking, like the rise of visual effects, which created new jobs in digital effects but reduced jobs building elaborate physical sets.

Cardoza expressed interest in reducing the amount of job loss. In another interview, Cardoza said that for his film project, “we want to make sure we include as many people as possible,” not just actors, but sound designers and script editors and other specialized roles.

But he believes that eventually, AI will get good enough to do everyone’s job. “Like I say with tech, it’s never about if, it’s just when.”

Accetturo’s entry into AI was similar. He told me that he worked for 15 years as a filmmaker, “mostly as a commercial director and former documentary director.” During the pandemic, he “raised millions” for an animated TV series, but it got caught up in development hell.

AI gave him a new chance at success. Over the summer of 2024, he started playing around with AI video tools. He realized that he was in the sweet spot to take advantage of AI: experienced enough to make something good, but not so established that he was risking his reputation. After Google released Veo 3 in May, Accetturo released a fake medicine ad that went viral. His studio now produces ads for prominent companies like Oracle and Popeyes.

Accetturo says the backlash against him has subsided: “it truly is nothing compared to what it was.” And he says he’s committed to working on AI: “everyone understands that it’s the future.”

“Adapt like cockroaches”

Between the anti- and pro-AI extremes, there are a lot of editors and artists quietly using AI tools without disclosing it. Unsurprisingly, it’s difficult to find people who will speak about this on the record.

“A lot of people want plausible deniability right now,” according to Ryan Hayden, a Hollywood talent agent. “There is backlash about it.”

But if editors don’t use AI tools, they risk becoming obsolete. Hayden says that he knows a lot of people in the editing field trying to master AI because “there’s gonna be a massive cut” in the total number of editors. Those who know AI might survive.

As one comedy writer involved in an AI project told Wired, “We wanted to be at the table and not on the menu.”

Clandestine AI usage extends into the upper reaches of the industry. Hayden knows an editor who works with a major director who has directed $100 million films. “He’s already using AI, sometimes without people knowing.”

Some artists feel morally conflicted, but don’t think they can effectively resist. Vinny Dellay, a storyboard artist who’s worked on Marvel films and Super Bowl ads, released a video detailing his views on the ethics of using AI as a working artist. Dellay said that he agrees that “AI being trained off of art found on the Internet without getting permission from the artist, it may not be fair, it may not be honest.” But refusing to use AI products won’t stop their general adoption. Believing otherwise is “just being delusional.”

Instead, Dellay said that the right course is to “adapt like cockroaches after a nuclear war.” If they’re lucky, using AI in storyboarding workflows might even “let a storyboard artist pump out twice the boards in half the time without questioning all your life’s choices at 3:00 AM.”

Lines, moral and practical

Gille Klabin is an independent writer, director, and visual effects artist. (Photo by David Solorzano, courtesy of Gille Klabin)

Gille Klabin is an indie director and filmmaker currently working on a feature called “Weekend At The End Of The World.”

As an independent filmmaker, Klabin can’t afford to hire many people. There are many labor-intensive tasks — like making a pitch deck for his film — that he’d otherwise have to do himself. An AI tool “essentially just liberates us to get more done and have more time back in our life.”

But he’s careful to stick to his own moral lines. Any time he mentioned using an AI tool during our interview, he’d explain why he thought that was an appropriate choice. He said he was fine with AI use “as long as you’re using it ethically in the sense that you’re not copying somebody’s work and using it for your own.”

Drawing these lines can be difficult, however. Hayden, the talent agent, told me that as AI tools make low-budget films look better, it gets harder to make high-budget films, which employ the most people at the highest wage levels.

If anything, Klabin’s AI uptake is limited more by the current capabilities of AI models. Klabin is an experienced visual effects artist, and he finds AI products to generally be “not really good enough to be used in a final project.”

He gave me a concrete example. Rotoscoping is a process where you trace out the subject of a shot so you can edit the background independently. It’s very labor-intensive — you have to trace every frame individually — so Klabin has tried using Runway’s AI-driven rotoscoping. While it can make a decent first pass, the result is just too messy to use in a finished product.

Klabin sent me this GIF of a series of rotoscoped frames from his upcoming movie. While the model does a decent job of identifying the people in the frame, its boundaries aren’t consistent from frame to frame. The result is noisy.

AI-rotoscoped frames from “Weekend At The End Of The World.” (Courtesy of Gille Klabin)

Current AI tools are full of these small glitches, so Klabin only uses them for tasks that audiences don’t see (like creating a movie pitch deck) or in contexts where he can clean up the result afterwards.

The power of authenticity

Stephen Robles reviews Apple products on YouTube and other platforms. He uses AI in some parts of the editing process, such as removing silences or transcribing audio, but doesn’t see it as disruptive to his career.

Stephen Robles is a YouTuber, podcaster, and creator covering tech, particularly Apple. (Photo courtesy of Stephen Robles)

“I am betting on the audience wanting to trust creators, wanting to see authenticity,” he told me. AI video tools don’t really help him with that and can’t replace the reputation he’s sought to build.

Recently, he experimented with using ChatGPT to edit a video thumbnail (the image used to advertise a video). He got a couple of negative reactions about his use of AI, so he said he “might slow down a little bit” with that experimentation.

Robles didn’t seem as concerned about AI models stealing from creators like him. When I asked him about how he felt about Google training on his data, he told me that “YouTube provides me enough benefit that I don’t think too much about that.”

Professional thumbnail artist Antioch Hwang has a similarly pragmatic view towards using AI. Some channels he works with have audiences that are “very sensitive to AI images.” Even using “an AI upscaler to fix up the edges” can provoke strong negative reactions. For those channels, he’s “very wary” about using AI.

Antioch Hwang is a YouTube thumbnail artist. (Photo courtesy of Antioch Creative)

But for most channels he works for, he’s fine using AI, at least for technical tasks. “I think there’s now been a big shift in the public perception of these AI image generation tools,” he told me. “People are now welcoming them into their workflow.”

He’s still careful with his AI use, though, because he thinks that having human artistry helps in the YouTube ecosystem. “If everyone has all the [AI] tools, then how do you really stand out?” he said.

Recently, top creators have started using more rough-looking thumbnails for their videos. AI has made polished thumbnails too easy to create, so top creators are using what Hwang would call “poorly made thumbnails” to help videos stand out.


Exit strategies

Hwang told me something surprising: even as AI makes it easier for creators to make thumbnails themselves, business has never been better for thumbnail artists, even at the lower end. He said that demand has soared because “AI as a whole has lowered the barriers for content creation, and now there’s more creators flooding in.”

Still, Hwang doesn’t expect the good times to last forever. “I don’t see AI completely taking over for the next three-ish years. That’s my estimated timeline.”

Everyone I talked to had different answers to when — if ever — AI would meaningfully disrupt their part of the industry.

Some, like Hwang, were pessimistic. Actor Erik Passoja told me he thought the big movie studios — like Warner Brothers or Paramount — would be gone in three to five years.

But others were more optimistic. Tess Dinerstein, the vertical drama actor, said that “I don’t think that verticals are ever going to go fully AI.” Even if it becomes technologically feasible, she argued, “that just doesn’t seem to be what the people want.”

Gille Klabin, the independent filmmaker, thought there would always be a place for high-quality human films. If someone’s work is “fundamentally derivative,” then they are at risk. But he thinks the best human-created work will still stand out. “I don’t know how AI could possibly replace the borderline divine element of consciousness,” he said.

The people who were most bullish on AI were, if anything, the least optimistic about their own career prospects. “I think at a certain point it won’t matter,” Kavan Cardoza told me. “It’ll be that anyone on the planet can just type in some sentences” to generate full, high-quality videos.

This might explain why Accetturo has become something of an AI evangelist; his newsletter tries to teach other filmmakers how to adapt to the coming AI revolution.

AI “is a tsunami that is gonna wipe out everyone,” he told me. “So I’m handing out surfboards — teaching people how to surf. Do with it what you will.”

Waymo’s next five cities are all in red states

2025-11-19 04:12:58

Understanding AI will once again participate in the Tarbell Fellowship program in 2026. Fellows receive three months of journalism training (starting in June 2026) and then get placed with a news organization like Understanding AI. This year’s Tarbell fellow, Kai Williams, has done some amazing work already. The fellowship pays $60,000 to $80,000—or more for experienced journalists. The deadline to apply is January 7. Apply here!


On Tuesday, Waymo announced driverless testing in five cities: Dallas, Houston, San Antonio, Miami, and Orlando. Driverless testing begins immediately in Miami, while testing in the other four cities will begin “over the coming weeks.” Waymo says commercial service will launch in all five cities in 2026.

It’s a palpable acceleration of Waymo’s commercialization efforts.

Waymo launched its first fully driverless service in Phoenix in 2020. Commercial service began in San Francisco and Los Angeles in 2024, with Austin and Atlanta arriving in 2025. Now Waymo is signaling that it will add at least five more cities to its lineup in 2026.


And probably several other cities as well. Waymo has previously announced 2026 launch plans in six other US cities — Denver, Detroit, Las Vegas, Nashville, San Diego, and Washington DC — plus London. None of these cities has begun driverless testing yet. But if all goes according to plan, Waymo will be offering service in at least 17 cities by the end of next year — more than triple the number Waymo serves today.

I don’t think it’s a coincidence that today’s announcement comes on the heels of last week’s news that Waymo will begin commercial service on freeways. A taxi service that didn’t operate on freeways would have been a non-starter in famously sprawling cities like Dallas or Houston.

It’s also not a coincidence that all of the new driverless cities are in Texas or Florida, Republican-dominated states with favorable regulatory climates.

Waymo announced plans to expand to Washington DC way back in March. As a DC resident, I can confirm that the company has been testing in the city for months. But so far, all of that testing has been conducted with safety drivers. DC law doesn’t provide Waymo with a clear path to launch a driverless commercial service in the city. And the city council has been dragging its feet on legislation to change that.

The story is similar in a few other blue states:

  • Massachusetts law doesn’t provide a clear path to deploy robotaxis in Boston. Instead of trying to fix that, Boston officials have been considering legislation to explicitly ban driverless vehicles.

  • In Washington state, the law allows testing of autonomous vehicles but doesn’t offer a clear path toward commercialization. Legislation to authorize robotaxis in the state hasn’t gotten traction, and other legislators have proposed a bill that would ban the technology.

  • New York law requires a person to have “at least one hand on the wheel.” Legislation to change that hasn’t made much progress through the state legislature.

So although Waymo has done some preliminary testing in Seattle, Boston, Buffalo, and New York City, the company hasn’t been able to begin driverless testing — the last major step before commercialization. This has prompted Waymo to focus its testing efforts on red states with more favorable laws.


I hope Democratic lawmakers think carefully about the pattern that is emerging here.

In a couple of years, driverless taxis are going to be ubiquitous across the Republican-dominated Sun Belt. Left-leaning cities like Boston and New York like to think of themselves as progressive, cutting-edge jurisdictions. So it would be awkward if they blocked the introduction of a technology that has the potential to reduce crashes while saving passengers money.

It doesn’t have to be this way. Indeed, not all blue states have been hostile to autonomous vehicles. Under the leadership of Governor Gavin Newsom, California has allowed Waymo to deploy its technology in San Francisco and Los Angeles. Last week, Waymo announced it would extend its robotaxi service from San Francisco all the way down the peninsula to San Jose. Waymo plans to launch in San Diego next year.

The situation is similar in Colorado, where Governor Jared Polis has supported autonomous vehicles. Earlier this year, Polis vetoed Teamster-backed legislation that would have banned large driverless trucks in Colorado. Waymo is planning to launch in Denver in 2026.

Three purple states — Michigan, Nevada, and Pennsylvania — also have favorable laws. This may allow Waymo to expand to Las Vegas and Detroit in 2026 and Philadelphia soon after that.

But policymakers in some blue states — as well as the District of Columbia, where I live — seem happy to let Republican-dominated states adopt the technology first.

Waymo is finally ready for freeway service

2025-11-13 06:55:03

In May 2023, Waymo expanded its service area to cover a larger chunk of the Phoenix metro area. Soon afterwards, YouTuber Lorraine decided to pit Waymo and Tesla against one another in a head-to-head race across this expanded territory.

By taking Interstate 10, the Tesla vehicle was able to complete the 21-mile trip in about 26 minutes. Waymo avoided freeways, which meant it took nearly 55 minutes — more than twice as long — to reach the same destination.

“I’m not sure why Waymo is restricted from using freeways, but as their map grows, freeway support will become necessary,” Lorraine said.

Waymo operates mainly in sprawling southern cities like Phoenix, Los Angeles, and Atlanta. In places like this, the inability to access freeways is a huge handicap. Even in San Francisco, the most urban place Waymo operates, many trips take longer because Waymo doesn’t use Interstate 280.

Now all that is about to change. On Wednesday, Waymo announced it would begin offering freeway service to customers in Phoenix, Los Angeles, and the San Francisco Bay Area. It’s a big technological leap that transforms Waymo’s service from a novelty to a serious rival to Uber and Lyft.

There are several other driverless taxi companies, including Tesla and Amazon’s Zoox. But none of them have begun offering driverless freeway service. Tesla still has a human safety operator in every commercial vehicle. Zoox offers driverless rides, but I haven’t seen any reports of Zoox providing freeway service to the general public.

Waymo says it will route customers onto freeways “when a freeway route is meaningfully faster.” Its vehicles will obey the speed limit except in rare situations. Given how common speeding is on freeways, this could mean that Waymo vehicles wind up moving significantly slower than the flow of traffic.

In this piece I want to answer two questions: Why did it take so long for Waymo to begin offering driverless freeway service? And what’s next for the company?

Why freeway service is so difficult

A Waymo vehicle on a freeway in Los Angeles. (Photo courtesy of Waymo)

In many ways, freeways are less complex than surface streets. A vehicle just needs to stay in its lane and maintain a safe distance from the vehicle ahead. It is unlikely to encounter pedestrians, cyclists, or complicated four-way intersections.

This is why plenty of self-driving projects — including Tesla’s — experimented first with freeways before moving on to surface streets.

By 2013, Waymo — then called the Google self-driving car project — had already developed technology that could navigate on freeways under the supervision of a Google employee.

But while it’s fairly easy to build a freeway driving system that operates under human supervision, it’s very difficult to dispense with the driver.


AI ads are going mainstream

2025-11-13 01:58:00

On October 6, the Internet turned on Taylor Swift. As part of the marketing campaign for her new album, Swift’s team announced a series of 12 short videos, accessible via QR codes hidden around the world. These videos, hosted on Taylor Swift’s YouTube account, appeared to be AI-generated. Certain visual glitches — like garbled text or inconsistent lighting — clued fans in.

Many fans were mad. Some pointed out that Swift had previously called out AI deepfakes in her 2024 endorsement of Kamala Harris. Self-described Swiftie Alyssa Yung told Rolling Stone, “The most disappointing aspect of this is how utterly hypocritical the use of AI is on Taylor’s project.”

“She’s a BILLIONAIRE. She can afford to pay artists and shoot short videos like those,” one person tweeted. The hashtag #SwiftiesAgainstAI started trending, and several outlets covered the controversy.

However, no one was able to definitively prove that the videos used AI. Swift’s representatives did not reply to media requests, and the story soon died down.

Using a new Google tool called SynthID, I was able to confirm that at least one of Swift’s videos, the “Berlin” video, was made with Google’s video generation tools. The tool works by detecting a watermark that Google hides in AI content generated by its tools. So a positive result is strong evidence that the video was made using a Google AI product. (Swift’s publicist did not respond to a request for comment).

If Taylor Swift — arguably the most famous musician in the world — is using AI to promote an album, the technology has clearly entered the mainstream.

And Swift isn’t alone. AI-generated ads have become more and more prevalent in recent months. Companies large and small have released AI-generated ads — some without even disclosing it.

Others plan to release AI ads soon. A survey by the Interactive Advertising Bureau found that around 30% of digital video ads this year will be made or enhanced using generative AI. The survey predicted that the number will rise to 39% next year.


Some people think it’ll happen even faster.

“I think, realistically, we could be three to four years away to where every ad you see on television is created by AI,” said Kavan Cardoza, a filmmaker who co-founded an AI-based studio, PhantomX.

AI allows companies to generate ads at a fraction of the cost of traditional methods. And while early AI-generated ads sparked some controversy, that backlash doesn’t seem to have been strong enough to stop the trend toward ever more AI-generated ads.

A brief history of AI-generated advertising

Admakers have been experimenting with AI tools for close to a decade, but until recently those experiments were pretty limited:

  • In 2016, IBM trained a custom machine learning model to select clips for a movie trailer that a human subsequently edited together.

  • Also in 2016, a Japanese ad agency developed an “AI creative director” to help design ads. Its first project: an ad for Clorets mint tabs.

  • In 2019, Lexus released an ad whose script had been written by an AI model trained on previous car ads.

In these cases, the use of AI was a gimmick that helped drive interest in the ad. Also, the AI was doing conceptual work — choosing clips, developing concepts, or writing scripts — but it wasn’t generating the actual video content. It’s only in the last couple of years that AI has become sophisticated enough to do that.

One key step came in April 2023, when the Republican National Committee released an attack ad against Joe Biden. Generative AI tools were used to generate still images that appeared in the ad. These images look pretty fake: Biden’s face resembles a wax figure in the frame I screenshotted below. Still, the ad racked up over 350,000 views on YouTube.

The RNC didn’t use AI to generate any video clips because the technology wasn’t ready, as illustrated by this horrifying clip of Will Smith eating spaghetti:

Image and video generation have improved dramatically since then.

In June 2025, the first major viral AI ad aired during the NBA Finals. The ad for the prediction market Kalshi depicted a series of zany scenarios that users could bet on. The ad only cost around $2,000 to make.

That ad was made using Veo 3, a Google-made model released in May 2025. Veo 3 offered substantially improved generation quality and controllability compared to earlier models.

Most consumers don’t seem to mind AI ads

The first few AI-generated ads were controversial. For example, when Coca-Cola released an AI-generated ad last year, it touched a nerve with some artists. Alex Hirsch, the creator of the animated series Gravity Falls, tweeted that “Coca-Cola is ‘red’ because it’s made from the blood of out-of-work artists!”

Undeterred, Coca-Cola released another ad last week. While it also generated some criticism, the reaction seemed quieter this year.

Google’s use of AI-generated ads has attracted even less negative attention. In late October, Google released an ad showing an animated turkey leaving a farm to escape Thanksgiving.

Google didn’t disclose that the ad was AI-generated. Robert Wong, the co-founder of Google’s in-house marketing team, told the Wall Street Journal that consumers don’t care whether an ad was made with AI or not.

Wong might be right. The Wall Street Journal claimed that the turkey ad was Google’s first completely AI-generated ad. In fact, according to SynthID, Google used AI to generate most of the visuals for an earlier ad about using Google as a study tool (if you look closely at the first frame below, you can see that the writing on the notebook looks like gobbledygook). But outside of a few commenters on YouTube, no one seemed to notice.

SynthID result for the first frame of Google’s “Feeling stuck? Just ask Google.” ad. The lack of detection in the middle of the shot may be due to later editing, or to a pattern too simple to embed the watermark. Almost every frame in the ad contains the SynthID watermark.

All this points to a world where AI-generated ads become an increasingly important part of companies’ marketing strategies, even without being disclosed. Given how much faster and cheaper AI can be — Coca-Cola’s CMO told the Wall Street Journal it only took them a month to make their most recent ad compared to a year for a traditional ad — it may be that AI-generated ads will almost completely replace human-filmed ads.

But it might take a while. One reason is that some companies will try to stand out by touting their use of traditional techniques.

For example, BMW recently released a series of ads contrasting its cars with AI-generated slop. One starts with a video of a pigeon skateboarding before revealing that the clip is fake. The voiceover continues, “In a world where it’s hard to tell what’s genuine, it’s nice to know you can trust a BMW certified vehicle.”

Other brands may take a similar approach, trying to position themselves as authentic by not using AI.


The nuts and bolts of AI admaking

Another reason the AI transition won’t happen overnight: AI tools still have significant limitations.

At the low end of the market, some people are probably using fully AI-generated ads—including ads that promote scams. But fully AI-generated ads don’t yet meet the needs of mainstream advertising clients.

AI can generate short video clips that can be hard to distinguish from real footage. But it takes significant human effort to turn a series of these clips into a polished ad.

I talked with PJ “Ace” Accetturo in early October. He runs Genre AI, the AI ad agency that created the Kalshi ad. He told me that his studio uses “the standard production process, just truncated and there’s no live-action shoot.”

After an ad concept has been approved by the client, a writer still writes a script and a director still directs it. However, instead of a human artist making sketches for the storyboard, an “AI cinematographer” uses AI tools to create a sequence of still images. Once the storyboard has been approved, an animator uses AI tools to turn each image into a video clip. A human editor then edits those clips together, does sound design, adds music, and so forth. There’s no need for the writer, director, or editor to be “AI native.”

This process isn’t universal, of course. Another filmmaker who works on AI ads, Kavan Cardoza, said that he sometimes works on “hybrid” ads. In one recent ad, another studio filmed real actors, and then Cardoza used AI for visual effects work.

Human involvement is shifting rapidly. Accetturo told me that Sora 2 is “extremely disruptive” to his business model because it can do an okay imitation of a complete clip with barely any human involvement. For a certain part of the ad market, the whole ad process might become something as simple as a single person typing in a prompt, generating a bunch of outputs, and choosing the best to publish.

In the meantime, AI studios like Accetturo’s are disrupting traditional ad agencies.

The limitations

As we’ve covered before, AI models still make mistakes and model reality incorrectly. Fans identified the Taylor Swift ads as AI-generated because of small glitches in text and the physics of the layout.

Consistency is a particular challenge. Dino Burbido created this graphic of all the different ways that the most recent Coca-Cola ad depicts the company’s trucks. Of the ten clips including trucks, there are eight unique wheel configurations!

To combat this, good ads can take hundreds or thousands of individual generations to produce compelling content — Accetturo called this dynamic “slot-machine pulls.” A high level of human involvement helps to deliver a more consistent vision. But even a large number of generations doesn’t ensure quality — the Coca-Cola ad apparently required more than 70,000 video clips.

Over time, better AI video generation models will probably take care of many of these issues. Already, some of the best AI-generated ads are almost impossible to distinguish from reality, such as this Nike spec ad by Karsten Winegeart.

But even as the quality level rises, making small revisions to generated content may continue to be a challenge. Gille Klabin, an independent filmmaker and director, told me that advertisers are exacting clients. They’ll ask for small tweaks, like rotating the product by ten degrees in a shot, which are impossible to do with current AI tools.

Klabin says “the level of specificity is not there, and the level of hallucinations is a lot.” Even if you can get the model to rotate the product, maybe it gets the logo wrong.

Accetturo acknowledges that this is a challenge. He says he often tells clients, “Stop nitpicking this. This is an AI ad. It costs less.”

But Accetturo also says that this dynamic influences what types of ads his team will make. His studio specializes in comedy ads, because they require less specificity — and don’t need to feel as authentic. With comedy ads, Accetturo says that audiences think “You’re giving me my dopamine. I’ll watch the ad, this is hilarious.”

Even if a lack of steerability continues for a while, it may not matter if AI-generated content is cheap enough.


Context rot: the emerging challenge that could hold back LLM progress

2025-11-11 00:33:51

Many people believe that the next frontier for large language models is task length. A March study from the research organization METR documented that large language models have steadily gotten better at performing software engineering tasks that require significant time when performed by a human being. If anything, progress seems to be accelerating this year. Here’s an updated version of their chart:

If this trend continues, in a few years LLMs will be able to complete tasks that take human programmers multiple days. Maybe a few years after that it’ll be weeks, and then months. If the trend continues long enough, we could wind up with models that can take over large-scale software engineering projects, putting many human programmers out of work and further accelerating AI progress.

I don’t doubt that the trend toward longer task lengths still has some room to run. But I suspect that relatively soon, we’re going to bump up against fundamental limitations of the attention mechanism underlying today’s leading LLMs.

With attention, an LLM effectively “thinks about” every token in its context window before generating a new token. That works fine when there are only a few thousand tokens in the context window. But it gets more and more unwieldy as the number of tokens grows into the hundreds of thousands, millions, and beyond.

An analogy to the human brain helps to illustrate the problem. As I sit here writing this article, I’m not thinking about what I ate for breakfast in 2019, the acrimonious breakup I had in 2002, or the many episodes of Star Trek I watched in the 1990s. If my brain were constantly thinking about these and thousands of other random topics, I’d be too distracted to write a coherent essay.

But LLMs do get distracted as more tokens are added to their context window — a phenomenon that has been dubbed “context rot.” Anthropic researchers explained it in a September blog post:

Context must be treated as a finite resource with diminishing marginal returns. Like humans, who have limited working memory capacity, LLMs have an “attention budget” that they draw on when parsing large volumes of context. Every new token introduced depletes this budget by some amount, increasing the need to carefully curate the tokens available to the LLM.

This attention scarcity stems from architectural constraints of LLMs. LLMs are based on the transformer architecture, which enables every token to attend to every other token across the entire context. As its context length increases, a model’s ability to capture these pairwise relationships gets stretched thin, creating a natural tension between context size and attention focus.

The blog post went on to discuss context engineering, a suite of emerging techniques for helping LLMs stay focused by removing extraneous tokens from their context windows.

Those techniques are fine as far as they go. But I suspect they can only mitigate the underlying problem. If we want LLMs to reason effectively over much longer contexts, we may have to fundamentally rethink how LLMs work.

Structure matters

In college I paid my rent by working as a web programmer for the University of Minnesota. One of my first projects was to build a simple web application powered by a relational database. It worked fine in testing, but it became glacially slow with real user data. I didn’t understand why.

When I asked a more experienced programmer about it, his first question was “did you add an index to the database?”

“What’s an index?” I asked.

I soon learned that a database index works a lot like the index of a book.

Suppose you’re trying to find the first page in a history book that mentions Abraham Lincoln. If the book has no index, you’ll have to scan every page. This might take several minutes if it’s a long book. But if there is an index, its alphabetical structure will allow you to find the right page in a few seconds.

A database index has the same basic function: organize information so it’s easy to find. As I learned the hard way, an index becomes more and more necessary as data is added to a database.
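A quick way to see the difference an index makes is SQLite’s query planner, which will tell you whether a query scans the whole table or uses an index. This is a hypothetical example (the `users` table and `idx_users_email` index are made up for illustration):

```python
import sqlite3

# Hypothetical example: a table of user records, queried by email address.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany(
    "INSERT INTO users (email) VALUES (?)",
    [(f"user{i}@example.com",) for i in range(100_000)],
)

query = "EXPLAIN QUERY PLAN SELECT id FROM users WHERE email = ?"
target = ("user99999@example.com",)

# Without an index, SQLite scans every row -- like reading a book
# page by page to find one name.
plan = conn.execute(query, target).fetchone()[-1]
print(plan)  # the plan reports a full scan of the users table

# With an index on the email column, SQLite jumps straight to the row.
conn.execute("CREATE INDEX idx_users_email ON users (email)")
plan = conn.execute(query, target).fetchone()[-1]
print(plan)  # the plan now reports a search using idx_users_email
```

The data is identical before and after; only the organization changes, and the lookup goes from touching every row to touching a handful.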

This kind of scaling analysis is fundamental to any computer science curriculum. As a computer science major, I learned how to determine whether a computer program will scale gracefully or — like my database with no index — choke when applied to large data sets.

So when I started to study how large language models work, I was shocked to learn that one of the foundational concepts, the attention mechanism, has terrible scaling properties. Before an LLM generates a new token, it compares the most recent token to every previous token in its context window. This means that an LLM consumes more and more computing power — per token — as its context window grows.

If there are 101 previous tokens, it takes 100 attention operations to generate the next token. If there are 1,001 previous tokens, it takes 1,000 attention operations. And these costs are per token, so a session with 10 times more tokens takes about 100 times more computing power.1
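This quadratic growth is easy to check with back-of-the-envelope arithmetic (a deliberate simplification — real systems batch and optimize these operations, but the scaling holds):

```python
# Count attention operations, assuming one comparison per
# (new token, earlier token) pair as described above.
def attention_ops(num_tokens: int) -> int:
    # The i-th token attends to the i - 1 tokens before it.
    return sum(i - 1 for i in range(1, num_tokens + 1))

for n in (1_000, 10_000, 100_000):
    print(f"{n:>7} tokens -> {attention_ops(n):,} pairwise comparisons")
```

Going from 1,000 to 10,000 tokens multiplies the comparison count by roughly 100, not 10.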

Good programmers try to avoid using algorithms like this. Unfortunately, nobody has found a viable alternative to attention.

So AI companies have tried to overcome the problem with engineering muscle instead. They’ve developed clever algorithms like FlashAttention that minimize the computational cost of each attention operation. And they’ve built massive data centers optimized for attention calculations. For a while, these efforts had impressive results: context windows grew from 4,096 tokens in 2022 to a million tokens in early 2024.

Industry leaders hope to continue this trend with even more engineering muscle. In a July interview with Alex Kantrowitz, Anthropic CEO Dario Amodei said that “there’s no reason we can’t make the context length 100 million words today, which is roughly what a human hears in their lifetime.”

I don’t doubt that Anthropic could build an LLM with a context window of 100 million tokens if it really wanted to — though using it might be stupendously expensive. But I don’t think anyone will be happy stopping at 100 million tokens.

For one thing, that 100 million figure seems like an underestimate for the number of tokens humans “process” over a lifetime. Studies show the average adult speaks around 15,000 words per day — which works out to around 400 million words over a lifetime. Presumably, most people hear a similar number of words, and read a lot of words as well. They also experience a lot of images, sounds, smells, and other sensations. If we represent all of those experiences as tokens, I bet the total would comfortably exceed 1 billion.

Moreover, AI companies aren’t just trying to match human performance, they’re trying to dramatically exceed it. That could easily require models to process a lot more.

More context, more problems

But there’s also a deeper problem. Today’s leading LLMs don’t effectively use the million-token context windows they already have. Their performance predictably degrades as more information is included in the context window.

In November 2023, OpenAI released GPT-4 Turbo, the first model with 128,000 tokens of context. Later that same month, Anthropic released Claude 2.1, the first model with 200,000 tokens of context.

Greg Kamradt was one of the first people to perform a needle-in-a-haystack test on these models. He took a long document and randomly inserted a “needle” sentence like “The best thing to do in San Francisco is eat a sandwich and sit in Dolores Park on a sunny day.”

Then he’d ask an LLM “what is the best thing to do in San Francisco?” and see if it could answer. He found that both GPT-4 Turbo and Claude 2.1 performed worse on this task as the context length increased — especially if the “needle” was in the middle of the document:
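The basic setup can be sketched in a few lines. This is a hypothetical harness, not Kamradt’s actual code: the filler text is invented, and `ask_llm` is a stub standing in for a real model API call.

```python
# A minimal sketch of a needle-in-a-haystack harness.
NEEDLE = ("The best thing to do in San Francisco is eat a sandwich "
          "and sit in Dolores Park on a sunny day.")
FILLER = "This sentence is unrelated background text."
QUESTION = "What is the best thing to do in San Francisco?"

def build_haystack(num_filler: int, depth: float) -> str:
    """Bury the needle at a fractional depth (0.0 = start, 1.0 = end)."""
    sentences = [FILLER] * num_filler
    sentences.insert(int(depth * num_filler), NEEDLE)
    return " ".join(sentences)

def ask_llm(prompt: str) -> str:
    # Stub: a real harness would call a model API here and grade the answer.
    return "(model response)"

prompt = build_haystack(num_filler=500, depth=0.5) + "\n\n" + QUESTION
response = ask_llm(prompt)
```

Sweeping `num_filler` (context length) and `depth` (needle position) and grading the answers produces exactly the kind of heatmap Kamradt published.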

Frontier labs worked hard to improve performance on this kind of task. By the time Anthropic released Claude 3 in March 2024, needle-in-a-haystack performance was a lot better. But this is the simplest possible test of long-context performance. What about harder problems?

In February 2025, a team of researchers at Adobe published research on a more difficult variant of the needle-in-a-haystack test. Here the “needle” was a sentence like “Yuki lives next to the Semper Opera House,” and the model would be asked “Which character has been to Dresden?”

To answer this question, you need to know that the Semper Opera House is in Dresden. Leading language models do know this, so if you give them this challenge in a short prompt (a small “haystack”) they tend to get it right more than 90% of the time. But if you give them the same challenge in a larger “haystack” — for example, a 32,000-token prompt — accuracy drops dramatically:

  • GPT-4o goes from 99% to 70%

  • Claude 3.5 Sonnet goes from 88% to 30%

  • Gemini 2.5 Flash goes from 94% to 48%

  • Llama 4 Scout goes from 82% to 22%

Long-context performance dropped even further when the researchers asked “which character has been to the state of Saxony.” This question required the model to recognize that the Semper Opera House is in Dresden and that Dresden is in Saxony. The longer the context got, the worse models tended to do on questions like this that required two reasoning “hops”:

So not only do LLMs perform worse as more tokens are added to their context, they exhibit more severe performance degradation on more complex tasks.2 I think this bodes poorly for getting LLMs to do the kind of work that takes human workers days, weeks, or even months. These tasks will not only require a lot of tokens, they’re also far more complex than contrived needle-in-a-haystack benchmarks.

The curse of context rot

Photo by Serhii Luzhevskyi via iStock Editorial / Getty Images Plus.

And indeed, technologists have noticed that LLM performance on real-world tasks tends to decline as contexts get longer.

In June, a Hacker News commenter coined the phrase “context rot” to describe the phenomenon where LLMs become less effective as the size of their context grows. The startup Chroma published a widely read study on the phenomenon in July.




No one fully understands how LLMs work, so it’s hard to say exactly why context rot happens. But here’s how I think about it.
