By Timothy B. Lee, a tech reporter with a master’s in computer science who covers AI progress and policy.

Waymo and Tesla’s self-driving systems are more similar than people think

2025-12-18 06:01:17

The transformer architecture underlying large language models is remarkably versatile. Researchers have found many use cases beyond language, from understanding images to predicting the structure of proteins to controlling robot arms.

The self-driving industry has jumped on the bandwagon too. Last year, for example, the autonomous vehicle startup Wayve raised $1 billion. In a press release announcing the round, Wayve said it was “building foundation models for autonomy.”

“When we started the company in 2017, the opening pitch in our seed deck was all about the classical robotics approach,” Wayve CEO Alex Kendall said in a November interview. That approach was to “break down the autonomy problem into a bunch of different components and largely hand-engineer them.”

Wayve took a different approach, training a single transformer-based foundation model to handle the entire driving task. Wayve argues that its network can more easily adapt to new cities and driving conditions.

Tesla has been moving in the same direction.


“We used to work on an explicit, modular approach because it was so much easier to debug,” said Tesla AI chief Ashok Elluswamy at a recent conference. “But what we found out was that codifying human values was really difficult.”

So a couple of years ago, Tesla scrapped its old code in favor of an end-to-end architecture. Here’s a slide from Elluswamy’s October presentation:

Conventional wisdom holds that Waymo has a dramatically different approach. Many people — especially Tesla fans — believe that Tesla’s self-driving technology is based on cutting-edge, end-to-end AI models, while Waymo still relies on a clunky collection of handwritten rules.

But that’s not true — or at least it greatly exaggerates the differences.

Last year, Waymo published a paper on EMMA, a self-driving foundation model built on top of Google’s Gemini.

“EMMA directly maps raw camera sensor data into various driving-specific outputs, including planner trajectories, perception objects, and road graph elements,” the researchers wrote.

Although the EMMA model was impressive in some ways, the Waymo team noted that it “faces challenges for real-world deployment,” including poor spatial reasoning ability and high computational costs. In other words, the EMMA paper described a research prototype — not an architecture that was ready for commercial use.

But Waymo kept refining this approach. In a blog post last week, Waymo pulled back the curtain on the self-driving technology in its commercial fleet. It revealed that Waymo vehicles today are controlled by a foundation model that’s trained in an end-to-end fashion — just like Tesla and Wayve vehicles.

For this story, I read several Waymo research papers and watched presentations by (and interviews with) executives at Waymo, Wayve, and Tesla. I also had a chance to talk to Waymo co-CEO Dmitri Dolgov. Read on for an in-depth explanation of how Waymo’s technology works, and why it’s more similar to rivals’ technology than many people think.

Thinking fast and slow

Some driving scenarios require complex, holistic reasoning. For example, suppose a police officer is directing traffic around a crashed vehicle. Navigating this scene not only requires interpreting the officer’s hand signals, it also requires reasoning about the goals and likely actions of other vehicles as they navigate a chaotic situation. The EMMA paper showed that LLM-based models can handle these complex situations much better than a traditional modular approach.

But foundation models like EMMA also have real downsides. One is latency. In some driving scenarios, a fraction of a second can make the difference between life and death. The token-by-token reasoning style of models like Gemini can mean long and unpredictable response times.

Traditional foundation models are also not very good at geometric reasoning. They can’t always judge the exact locations of objects in an image. They might also overlook objects or hallucinate ones that aren’t there.

So rather than relying entirely on an EMMA-style vision-language model (VLM), Waymo placed two neural networks side by side. Here’s a diagram from Waymo’s blog post:

Let’s start by zooming in on the lower-left of the diagram:

VLM here stands for vision-language model — specifically Gemini, the Google AI model that can handle images as well as text. Waymo says this portion of its system was “trained using Gemini” and “leverages Gemini’s extensive world knowledge to better understand rare, novel, and complex semantic scenarios on the road.”

Compare that to EMMA, which Waymo described as maximizing the “utility of world knowledge” from “pre-trained large language models” like Gemini. The two approaches are very similar — and both are similar to the way Tesla and Wayve describe their self-driving systems.

“Milliseconds really matter”

But the model in today’s Waymo vehicles isn’t just an EMMA-like vision-language model — it’s a hybrid system that also includes a module called a sensor fusion encoder that is depicted in the upper-left corner of Waymo’s diagram:

This module is tuned for speed and accuracy.

“Imagine a latency-critical safety scenario where maybe an object appears from behind a parked car,” Waymo co-CEO Dmitri Dolgov told me. “Milliseconds really matter. Accuracy matters.”

Whereas the VLM (the blue box) considers the scene as a whole, the sensor fusion module (the yellow box) breaks the scene into dozens of individual objects: other vehicles, pedestrians, fire hydrants, traffic cones, the road surface, and so forth.

It helps that every Waymo vehicle has lidar sensors that measure the distance to nearby objects by bouncing lasers off of them. Waymo’s software matches these lidar measurements to the corresponding pixels in camera images — a process called sensor fusion. This allows the system to precisely locate each object in three-dimensional space.
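To get a feel for what that matching involves, here is a minimal geometric sketch in Python. It is not Waymo’s pipeline; the transform and intrinsics matrices are placeholders, and a real system handles calibration, timing, and lens distortion far more carefully.

```python
import numpy as np

def project_lidar_to_image(points_lidar, T_cam_from_lidar, K):
    """Project 3D lidar points (N, 3) into 2D pixel coordinates.

    T_cam_from_lidar: 4x4 rigid transform from the lidar frame to the camera frame.
    K: 3x3 camera intrinsic matrix.
    (Both are placeholders standing in for real calibration data.)
    """
    # Convert to homogeneous coordinates and move the points into the camera frame.
    ones = np.ones((points_lidar.shape[0], 1))
    points_h = np.hstack([points_lidar, ones])               # (N, 4)
    points_cam = (T_cam_from_lidar @ points_h.T).T[:, :3]    # (N, 3)

    # Keep only points in front of the camera.
    points_cam = points_cam[points_cam[:, 2] > 0]

    # Perspective projection: apply the intrinsics, then divide by depth.
    pixels_h = (K @ points_cam.T).T                          # (N, 3)
    pixels = pixels_h[:, :2] / pixels_h[:, 2:3]              # (N, 2) pixel coordinates
    return pixels, points_cam[:, 2]                          # pixels plus their depths
```

Once a lidar point is associated with a pixel, the detection that pixel belongs to inherits an accurate depth, which is what gives the system precise 3D positions.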

In early self-driving systems, a human programmer would decide how to represent each object. For example, the data structure for a vehicle might record the type of vehicle, how fast it’s moving, and whether it has a turn signal on.
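To make the contrast with learned representations concrete, here is a hypothetical sketch of that kind of hand-engineered data structure. The fields are invented for illustration, not drawn from any company’s actual code.

```python
from dataclasses import dataclass

@dataclass
class TrackedVehicle:
    """A hypothetical hand-engineered representation of a nearby vehicle.

    Every attribute was chosen in advance by a human programmer; anything
    not listed here is simply thrown away.
    """
    vehicle_type: str                        # e.g. "sedan", "truck", "fire_truck"
    position_m: tuple[float, float, float]   # (x, y, z) relative to the robot, in meters
    speed_mps: float                         # current speed, meters per second
    heading_rad: float                       # direction of travel, radians
    turn_signal_on: bool                     # is a blinker flashing?
```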

But a hand-coded system like this is unlikely to be optimal. It will save some information that isn’t very useful while discarding other information that might be crucial.

“The task of driving is not one where you can just enumerate a set of variables that are sufficient to be a good driver,” Dolgov told me. “There’s a lot of richness that is very hard to engineer.”

Waymo co-CEO Dmitri Dolgov. (Image courtesy of Waymo)

So instead, Waymo’s model learns the best way to represent each object through a data-driven training process. Waymo didn’t give me a ton of information about how this works, but I suspect it’s similar to the technique described in the 2024 Waymo paper called “MoST: Multi-modality Scene Tokenization for Motion Prediction.”

The system described in the MoST paper still splits a driving scene up into distinct objects as in older self-driving systems. But it doesn’t capture a set of attributes chosen by a human programmer. Rather, it computes an “object vector” that captures information that’s most relevant for driving — and the format of this vector is learned during the training process.

“Some dimensions of the vector will likely indicate whether it’s a fire truck, a stop sign, a tree trunk, or something else,” I wrote in an article last year. “Other dimensions will represent subtler attributes of objects. If the object is a pedestrian, for example, the vector might encode information about the position of the pedestrian’s head, arms, and legs.”

There’s an analogy here to LLMs. An LLM represents each token with a “token vector” that captures the information that’s most relevant to predicting the next token. In a similar way, the MoST system learns to capture the information about objects that are most relevant for driving.
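Here is a toy sketch of what a learned object encoder might look like, assuming the raw per-object input is some fused bundle of camera and lidar features. The layer sizes and names are invented, and a MoST-style tokenizer is far more sophisticated, but the principle is the same: the training process, not a programmer, decides what each dimension of the output vector means.

```python
import torch
import torch.nn as nn

class ObjectEncoder(nn.Module):
    """Map raw per-object features to a learned 'object vector'.

    A toy stand-in for a MoST-style scene tokenizer: the network, not a
    human programmer, determines what information each dimension encodes.
    """
    def __init__(self, raw_dim: int = 256, embed_dim: int = 128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(raw_dim, 512),
            nn.ReLU(),
            nn.Linear(512, embed_dim),
        )

    def forward(self, raw_object_features: torch.Tensor) -> torch.Tensor:
        # raw_object_features: (num_objects, raw_dim) fused camera + lidar features
        # returns: (num_objects, embed_dim) learned object vectors
        return self.mlp(raw_object_features)
```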

I suspect that when Waymo says its sensor fusion module outputs “objects, sensor embeddings” in the diagram above, this is a reference to a MoST-like system.

How does the system know which information to include in these object vectors? Through end-to-end training, of course!

That brings us to the third and final module of Waymo’s self-driving system: the world decoder.

It takes inputs from both the sensor fusion encoder (the fast-thinking module that breaks the scene into individual objects) and the driving VLM (the slow-thinking module that tries to understand the scene as a whole). Based on information supplied by these modules, the world decoder tries to decide the best action for a vehicle to take.

During training, information flows in the opposite direction. The system is trained on data from real-world situations. If the decoder correctly predicts the actions taken in the training example, the network gets positive reinforcement. If it guesses wrong, then it gets negative reinforcement.

These signals are then propagated backward to the other two modules. If the decoder makes a good choice, signals are sent back to the yellow and blue boxes encouraging them to continue doing what they’re doing. If the decoder makes a bad choice, signals are sent back to change what they’re doing.

Based on these signals, the sensor fusion module learns which information is most helpful to include in object vectors — and which information can be safely left out. Again, this is closely analogous to LLMs, which learn the most useful information to include in the vectors that represent each token.
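Here is a minimal PyTorch-style sketch of that idea: a network built from separate modules that is nonetheless trained end-to-end from a single loss. Everything about it is invented for illustration (the shapes, the layers, the imitation-learning loss); the point is only that one backward pass sends gradient signals into all three modules at once.

```python
import torch
import torch.nn as nn

# A toy three-module driving network, trained end-to-end with a single loss.
# The module names mirror Waymo's diagram, but the internals are made up.

class SensorFusionEncoder(nn.Module):        # the "fast" yellow box
    def __init__(self):
        super().__init__()
        self.net = nn.Linear(256, 128)       # per-object features -> object vectors
    def forward(self, objects):              # objects: (num_objects, 256)
        return self.net(objects)

class SceneVLM(nn.Module):                   # stand-in for the "slow" blue box
    def __init__(self):
        super().__init__()
        self.net = nn.Linear(1024, 128)      # whole-scene features -> scene summary
    def forward(self, scene):                # scene: (1024,)
        return self.net(scene)

class WorldDecoder(nn.Module):               # the purple box: picks a trajectory
    def __init__(self):
        super().__init__()
        self.net = nn.Linear(128 + 128, 10)  # 10 numbers describing a trajectory
    def forward(self, object_vecs, scene_vec):
        pooled = object_vecs.mean(dim=0)     # crude pooling over objects
        return self.net(torch.cat([pooled, scene_vec]))

encoder, vlm, decoder = SensorFusionEncoder(), SceneVLM(), WorldDecoder()
params = list(encoder.parameters()) + list(vlm.parameters()) + list(decoder.parameters())
optimizer = torch.optim.Adam(params, lr=1e-4)

# One imitation-learning step on a (made-up) logged driving example.
objects = torch.randn(12, 256)               # 12 objects in the scene
scene = torch.randn(1024)
expert_trajectory = torch.randn(10)          # what the human (or expert planner) did

predicted = decoder(encoder(objects), vlm(scene))
loss = nn.functional.mse_loss(predicted, expert_trajectory)

loss.backward()      # gradients flow through the decoder *and* both encoders
optimizer.step()
```

The key line is loss.backward(): because the decoder’s error is differentiable with respect to the outputs of both encoders, a single training signal teaches the sensor fusion module which object information is worth encoding, without anyone hand-specifying it.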


Modular networks can be trained end-to-end

Leaders at all three self-driving companies portray this as a key architectural difference between their self-driving systems. Waymo argues that its hybrid system delivers faster and more accurate results. Wayve and Tesla, in contrast, emphasize the simplicity of their monolithic end-to-end architectures. They believe that their models will ultimately prevail thanks to the Bitter Lesson — the insight that the best results often come from scaling up simple architectures.

In a March interview, podcaster Sam Charrington asked Waymo’s Dragomir Anguelov about the choice to build a hybrid system.

“We’re on the practical side,” Anguelov said. “We will take the thing that works best.”

Anguelov pointed out that the phrase “end-to-end” describes a training strategy, not a model architecture. End-to-end training just means that gradients are propagated all the way through the network. As we’ve seen, Waymo’s network is end-to-end in this sense: during training, error signals propagate backward from the purple box to the yellow and blue boxes.

“You can still have modules and train things end-to-end,” Anguelov said in March. “What we’ve learned over time is that you want a few large components, if possible. It simplifies development.” However, he added, “there is no consensus yet if it should be one component.”

So far, Waymo has found that its modular approach — with three modules rather than just one — is better for commercial deployment.

Waymo co-CEO Dmitri Dolgov told me that a monolithic architecture like EMMA “makes it very easy to get started, but it’s wildly inadequate to go to full autonomy safely and at scale.”

I’ve already mentioned latency and accuracy as two major concerns. Another issue is validation. A self-driving system doesn’t just need to be safe; the company making it needs to be able to prove it’s safe with a high level of confidence. This is hard to do when the system is a black box.

Under Waymo’s hybrid architecture, the company’s engineers know what function each module is supposed to perform, which allows them to be tested and validated independently. For example, if engineers know what objects are in a scene, they can look at the output of the sensor fusion module to make sure it identifies all the objects it’s supposed to.

These architectural differences seem overrated

My suspicion is that the actual differences are smaller than either side wants to admit. It’s not true that Waymo is stuck with an outdated system based on hand-coded rules. The company makes extensive use of modern AI techniques, and its system seems perfectly capable of generalizing to new cities.

Indeed, if Waymo deleted the yellow box from its diagram, the resulting model would be very similar to those at Tesla and Wayve. Waymo supplements this transformer-based model with a sensor fusion module that’s tuned for speed and geometric precision. But if Waymo finds the sensor fusion module isn’t adding much value, it can always remove it. So it’s hard to imagine the module puts Waymo at a major disadvantage.

At the same time, I wonder if Wayve and Tesla are downplaying the modularity of their own systems for marketing purposes. Their pitch to investors is that they’re pioneering a radically different approach than incumbents like Waymo — one that’s inspired by frontier labs like OpenAI and Anthropic. Investors were so impressed by this pitch that they gave Wayve $1 billion last year, and optimism about Tesla’s self-driving project has pushed up the company’s stock price in recent years.

For example, here’s how Wayve depicts its own architecture:

At first glance, this looks like a “pure” end-to-end architecture. But look closer and you’ll notice that Wayve’s model includes a “safety expert sub-system.” What’s that? I haven’t been able to find any details on how this works or what it does. But in a 2024 blog post, Wayve wrote about its effort to train its models to have an “innate safety reflex.”

According to Wayve, the company uses simulation to “optimally enrich our Emergency Reflex subsystem’s latent representations.” Wayve added that “to supercharge our Emergency Reflex, we can incorporate additional sources of information, such as other sensor modalities.”

This sounds at least a little bit like Waymo’s sensor fusion module. I’m not going to claim that the systems are identical or even all that similar. But any self-driving company has to address the same basic problem as Waymo: that large, monolithic language models are slow, error-prone, and difficult to debug. I expect that as it gets ready to commercialize its technology, Wayve will need to supplement the core end-to-end model with additional information sources that are easier to test and validate — if it isn’t doing so already.

The best Chinese open-weight models — and the strongest US rivals

2025-12-15 23:58:12

DeepSeek’s release of R1 in January shocked the world. It came just four months after OpenAI announced its first reasoning model, o1. The model parameters were released openly. And DeepSeek R1 powered the first consumer-facing chatbot to show the full chain of thought before answering.

The effect was electric.

The DeepSeek app briefly surpassed ChatGPT as the top app in the iOS App Store. Nvidia’s stock dropped almost 20% a few days later. Chinese companies rushed to use the model in their products.

DeepSeek’s success with R1 sparked a renaissance of open-weight efforts from Chinese companies. Before R1, Meta’s Llama models were the most prominent open-weight models. Today, Qwen, from the e-commerce firm Alibaba, is the leading open model family. But it faces stiff competition from DeepSeek, Moonshot AI, Z.AI, and other (primarily Chinese) companies.

American companies have also released a number of notable open-weight models. OpenAI released open-weight models in August. IBM officially released its well-regarded Granite 4 models in October. Google, Microsoft, Nvidia, and the Allen Institute for AI have all released new open models this year — and so has the French startup Mistral. But none of these models have been as good as the top Chinese models.

With so many releases, which ones are worth paying attention to?

In this piece, I’ll cover 13 of the most significant open-weight model developers, starting with the models that deliver the most bang for the buck. For each company, I’ll list a few models worth paying attention to, using Intelligence Index scores from Artificial Analysis as a rough approximation of model quality.

A key inspiration for this article has been Nathan Lambert, a researcher at the Allen Institute for Artificial Intelligence and the author of the excellent Interconnects newsletter. Lambert is concerned about the slow progress of American open-weight models and has been trying to rally support for building a new generation of open-weight models in the US. You can read about that effort here.

1. Qwen from Alibaba

Takeaway: There’s a very good Qwen model at basically every size through 235 billion parameters. The fact that Qwen is Chinese might be the biggest barrier for the average US firm.

Models:

  • Qwen3 4B Thinking:

    • Released April 26, 2025

    • Intelligence Index: 43

  • Qwen3 VL 32B:

    • Released October 19, 2025

    • Intelligence Index: 52

  • Qwen3 Next 80B:

    • Released September 9, 2025

    • Intelligence Index: 54

  • Qwen3 235B A22B 2507:1

    • Released July 25, 2025

    • Intelligence Index: 57

The Qwen family of open-weight models is made by Alibaba, an e-commerce and cloud services tech company.

Qwen models come in many sizes. As Nathan Lambert noted in a talk at the PyTorch conference, “Qwen alone is roughly matching the entire American open model ecosystem today.”

Enterprises often need to execute a series of simple tasks as part of a larger data pipeline. Open models — especially Qwen models — tend to work well here. The company has excelled at producing small models that run on cheap hardware.
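For a sense of what that looks like in practice, here is a rough sketch of running a small open-weight model locally with the Hugging Face transformers library. The model ID is an assumption (check Hugging Face for the current Qwen3 names), and a production pipeline would more likely sit behind a serving stack such as vLLM.

```python
# Minimal sketch: run a small open-weight model locally for a simple pipeline task.
# The model ID below is an assumption; the API calls are standard transformers usage.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen3-4B"  # hypothetical/representative small Qwen model

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [{"role": "user",
             "content": "Classify this support ticket: 'My order never arrived.'"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=200)
# Decode only the newly generated tokens, not the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```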

The Qwen series faces stiffer open-weight competition at the large end of the spectrum. And the largest Qwen model — Qwen3-Max — is not open-weight.

There’s a robust community around Qwen models. According to an analysis of Hugging Face data by the ATOM Project, Qwen is now the most downloaded model family in the world.

There have also been whispers of American companies adopting Qwen models. In October, Airbnb CEO Brian Chesky caused a stir by telling Bloomberg that the company is “relying a lot on Alibaba’s Qwen model” because it is fast, cheap, and performant enough.

But I spoke with several people whose organizations could not use Qwen (and other Chinese open models) for branding or compliance reasons.

This is one of the biggest barriers to Qwen’s adoption, as Lambert wrote in May:

People vastly underestimate the number of companies that cannot use Qwen and DeepSeek open models because they come from China. This includes on-premise solutions built by people who know the fact that model weights alone cannot reveal anything to their creators.

Lambert argues that many companies worry about the output of Chinese models being compromised. With current techniques, it’s impossible to rule this out without access to the training data — though Lambert believes the models are probably safe.

2. Kimi K2 from Moonshot

Takeaway: Kimi K2 Thinking is arguably the best open model in the world, but it’s difficult to run locally.

Models:

  • Kimi K2 0905 (1 trillion parameters):

    • Released September 2, 2025

    • Intelligence Index: 50

  • Kimi K2 Thinking (1T):

    • Released November 4, 2025

    • Intelligence Index: 67

Moonshot AI is a Chinese startup founded in March 2023. Kimi K2 is their flagship large language model.

Kimi K2 Thinking is arguably the best open model in the world by benchmark score. Artificial Analysis currently ranks it as the strongest model not made by OpenAI, Google, or Anthropic. Epoch’s Capabilities Index ranks it as the second best open model, and 14th overall.

Beyond the benchmarks, most reactions to Kimi have been positive.

  • Many people have praised K2’s writing abilities, both as a reasoning model and not. Rohit Krishnan, an entrepreneur and Substacker, tweeted that “Kimi K2 is remarkably good at writing, and unlike all others thinking mode hasn’t degraded its writing ability more.”

  • Nathan Lambert noted that Kimi K2 Thinking is one of the first open-weight models to be able to make long strings of tool calls, several hundred at a time. This makes agentic workflows possible.

  • On a podcast, Chamath Palihapitiya, a venture capitalist and former Facebook executive, said that his company 8090 has “directed a ton of our workloads to Kimi K2 on Groq because it was really way more performant and frankly just a ton cheaper than OpenAI and Anthropic.”

Some Reddit commenters have noted that K2 Thinking is not quite as strong at agentic coding as other open-weight models like Qwen coding models or Z.AI’s GLM, but it’s still a solid option and a stronger all-around model.

Kimi K2 also uses a lot of tokens; of all the models listed in this piece, K2 Thinking uses the second most tokens on Artificial Analysis’s benchmark suite.

And good luck trying to run this on your own computer. K2 Thinking has more than one trillion parameters, and the Hugging Face download is over 600 GB. One Redditor managed to get a quantized version running on a personal computer (with a GPU attached) at the speed of … half a token per second.

So in practice, using Kimi requires Moonshot’s API service, a third-party inference provider, or your own cluster.

3. gpt-oss from OpenAI

Takeaway: The gpt-oss models are excellent at reasoning tasks, and very fast. But they are weaker outside of pure reasoning tasks.

Models:

  • gpt-oss-20b:

    • Released August 4, 2025

    • Intelligence Index: 52 on high reasoning, 44 on low reasoning

  • gpt-oss-120b:

    • Released August 4, 2025

    • Intelligence Index: 61 on high reasoning, 48 on low reasoning

In August, OpenAI released two open-weight models, gpt-oss-120b and gpt-oss-20b. The 120b version is almost certainly the most capable American open model.

Both models are optimized for reasoning and agentic tasks — OpenAI claimed they were at “near-parity with OpenAI o4-mini on core reasoning benchmarks.” This includes math; the fourth-best-performing entry in the Kaggle competition to solve IMO-level problems — which currently has a $2.5 million prize pot — is based on gpt-oss-120b. (It’s unclear what models are used by entries one through three.)

The gpt-oss models have a generally solid reputation. “When I ask people who work in these spaces, the impression has been very positive,” Nathan Lambert noted in a recent talk on the state of open models.

They are also very fast. A Reddit commenter benchmarking various models was able to run gpt-oss-20b locally at 224 tokens per second, faster than the GPT-5.1, Gemini, or Claude APIs. And according to Artificial Analysis, some inference providers can run the 120B variant at over 3,000 tokens per second.

However, the gpt-oss models aren’t as good outside of coding, math, and reasoning. In particular, they have little factual knowledge. On SimpleQA, gpt-oss-120b only gets 16.8% right and gpt-oss-20b gets a mere 6.7% right. (Gemini 3 Pro gets 70% right, while GPT-5.1 gets 50% right). And when they get stumped by a SimpleQA question, the gpt-oss models almost always hallucinate an answer.

It’s also unclear whether OpenAI will release a follow-up to these two models. In the meantime, they remain a solid choice for reasoning and coding tasks if you need an American model you can run locally.

4. The DeepSeek models

Takeaway: DeepSeek releases strong models, particularly in math. Its most recent release, V3.2, is solid but not exceptional. Future releases might be a big deal.

Models:

  • DeepSeek R1 0528 (685 billion parameters):

    • Released May 28, 2025

    • Intelligence Index: 52

  • DeepSeek V3.2 (685B):

    • Released December 1, 2025

    • Intelligence Index: 66

  • DeepSeek V3.2 Speciale (685B):

    • Released December 1, 2025

    • Intelligence Index: 59

DeepSeek is an AI company owned by the Chinese hedge fund HighFlyer. DeepSeek explicitly aims to develop artificial general intelligence (AGI) through open models. As I mentioned in the introduction, the success of DeepSeek R1 in January inspired many open-weight efforts in China.

At the beginning of December, DeepSeek released V3.2 and V3.2 Speciale. These models have impressive benchmark numbers: Artificial Analysis rates V3.2 as the second best open model on their index, while V3.2 Speciale tops all models — open or closed — in the MathArena benchmark for final answer competitions.

Still, DeepSeek’s recent releases haven’t seemed to catch the public’s attention. Substack writer Zvi Mowshowitz summed up V3.2 as “okay and cheap but slow.” Mowshowitz noted there had not been much public adoption of the model.

It’s also probably a good idea to use DeepSeek’s products through an American provider or on your own hardware. In February, a security firm found that DeepSeek’s website was passing information to a Chinese state-owned company. (It’s unclear whether this is still happening.)

Regardless of how V3.2 fares, DeepSeek will remain a lab to watch. Their next major model release (rumored to be in February 2026) might be a big deal.

5. Olmo 3 from the Allen Institute for AI

Takeaway: Olmo 3 models are open-source, not just open-weight. Their performance isn’t too far behind Qwen models.

Models:

  • Olmo 3 7B:

    • Released November 20, 2025

    • Intelligence Index: 32 for thinking, 22 for instruct

  • Olmo 3.1 32B Think:

    • Released December 12, 2025

    • Intelligence Index: Not yet benchmarked, but Olmo 3 32B Think was 36

The Allen Institute for Artificial Intelligence (Ai2) is a nonprofit research institute founded in 2014. The Olmo series is one of Ai2’s main research products.

Olmo 3, released in November, is probably the best open-source model in the world. Every other developer I discuss here (except Nvidia) releases only open-weight models, where the final set of model parameters is available but not the code and data used for training. Ai2 not only released training code and data, but also several model checkpoints from midway through the training process.

This lets researchers learn from Olmo’s development process, take advantage of the open datasets, and use the Olmo models in their experiments.

Olmo’s openness can also be helpful for enterprise. One of the project’s co-leaders, Hanna Hajishirzi, told me that having several Olmo checkpoints gives companies more flexibility.

Companies can train an earlier Olmo 3 checkpoint to ensure that the model ends up effectively learning from the data for their use case. Hajishirzi said that she hears a lot of people say fine-tuning doesn’t work, but that’s because they’re only training on the “final snapshot of the model.”

For instance, if a model has already gone through reinforcement learning to improve its coding skills, it may have weaker capabilities elsewhere. So if a company wants to fine-tune a model to be good at a non-coding skill — like giving feedback on writing — they are better off choosing an earlier checkpoint.

Still, out of the box, the Olmo 3 models perform a little worse than the best open-weight models of their size, which are the Qwen models. And they’re certainly much weaker than large open-weight models like Kimi K2 Thinking.

In any event, the Allen Institute is an organization to watch. It recently received a $150 million grant from the National Science Foundation and Nvidia to create open models for scientists.

6. GLM 4.6 from Z.AI

Takeaway: The GLM 4.6 models are solid, particularly for coding.

Models:

  • GLM 4.6V-Flash (10B):

    • Released December 8, 2025

    • Intelligence Index score has not been released

  • GLM 4.6 (357B):

    • Released September 29, 2025

    • Intelligence Index: 56

  • GLM 4.6V (108B):

    • Released December 8, 2025

    • Intelligence Index score has not been released

Z.AI (formerly Zhipu AI) is a Chinese AI startup founded in 2019. In addition to its flagship GLM series of LLMs, it also releases some non-text models.

Unlike many of the other startups that form the Six Chinese Tigers, Z.AI was popular in China even before DeepSeek came to prominence. Z.AI released the first version of GLM all the way back in 2021, albeit not as an open model. A market survey in mid-2024 found that Z.AI was the third most popular enterprise LLM provider in China, after Alibaba and SenseTime.

But until recently, Z.AI struggled to attract attention outside of China. Now that is starting to change after two strong releases: GLM 4.5 in July and GLM 4.6 in late September. In November, the South China Morning Post reported that Z.AI had 100,000 users of its API, a “tenfold increase over two months,” and over 3 million chatbot users.

GLM 4.6 is probably not as strong as the best Qwen models, nor Kimi, but it is a very solid option, particularly for coding.

Z.AI may not continue releasing open-weight models if its market position changes.

The company’s product director, Zixuan Li, recently told ChinaTalk that “as a Chinese company, we need to really be open to get accepted by some companies because people will not use your API to try your models.” Z.AI is a startup without a strong pre-existing brand or capital in reserve. Gaining adoption is crucial to the company’s survival. Releasing a model’s weights allows enterprises to try it without having to worry about the data security challenges of using a Chinese API.

Z.AI “only gets maybe 5% or 10% of all the services related to GLM,” according to Li. For now, that’s enough revenue for the company. But if its economic incentives change, Z.AI might go back to releasing closed models.

7. Nemotron from Nvidia

Takeaway: Nvidia is an underrated open-weight developer. The Nemotron models are solid and look to be expanded soon.

Read more

Google and Anthropic approach LLMs differently

2025-12-05 02:49:41

On Monday, OpenAI CEO Sam Altman declared a “code red” in the face of rising competition.

The biggest threat was Google; monthly active users for Google’s Gemini chatbot grew from 450 million in July to 650 million in November (ChatGPT had 800 million weekly active users in October). Meanwhile, the Wall Street Journal reports, “OpenAI is also facing pressure from Anthropic, which is becoming popular among business customers.”

Google ratcheted up the pressure on OpenAI two weeks ago with the release of Gemini 3 models, which set new records on a number of benchmarks. The next week, Anthropic released Claude Opus 4.5, which achieved even higher scores on some of the same benchmarks.

Over the last two weeks, I’ve been trying to figure out the best way to cover these new releases. I used to subject each new model to a battery of bespoke benchmarks and write about the results. But recent models have gotten good enough to easily solve most of these problems. They do still fail on a few simple tasks (like telling time on an analog clock) but I fear those examples are increasingly unrepresentative of real-world usage.

In the future, I hope to write more about the performance of these new Google and Anthropic models. But for now, I want to offer a more qualitative analysis of these models. Or rather, I want to highlight two pieces that illustrate the very different cultures at Google and Anthropic — cultures that have led them to take dramatically different approaches to model building.

Engineering excellence at Google

Jeff Dean, a legendary engineer who has worked at Google since 1999, has led a number of AI projects inside the company. (Photo by THOMAS SAMSON/AFP via Getty Images)

Last week, the newsletter Semianalysis published a deep dive on the success of tensor processing units (TPUs), Google’s alternative to Nvidia GPUs. “Gemini 3 is one of the best models in the world and was trained entirely on TPUs,” the Semianalysis authors wrote. Notably, Claude Opus 4.5 was also trained on TPUs.

Google has employed TPUs for its own AI needs for a decade. But recently Google has made a serious effort to sell TPUs to other companies. The Semianalysis team argues that Google is “the newest and most threatening merchant silicon challenger to Nvidia.”

In October, Anthropic signed a deal to use up to one million TPUs. In addition to purchasing cloud services from Google, Semianalysis reported, “Anthropic will deploy TPUs in its own facilities, positioning Google to compete directly with Nvidia.”

Recent generations of the TPU were respectable chips, but Semianalysis argues Google’s real strength is the overall system architecture. Modern AI training runs require thousands of chips wired together for rapid communication. Google has designed racks and networking systems that squeeze maximum performance out of every chip.

This is one example of a broader principle: Google is fundamentally an engineering-oriented company, and it has approached large language models as an engineering problem.1 Engineers have worked hard to train the largest possible models at the lowest possible cost.

For example, Gemini 2.5 Flash-Lite costs 10 cents for a million input tokens. Anthropic’s cheapest model, Claude Haiku 4.5, costs 10 times as much. Google was also the first company to release an LLM with a million-token context window.

Another place Google’s engineering prowess has paid off is in pretraining. Google released this chart showing Gemini 3 crushing other models at SimpleQA, a benchmark that measures a model’s ability to recall obscure facts.

As a perceptive Reddit commenter points out, this likely reflects Google’s ability to deploy computing hardware on a large scale.

“My read is that Gemini 3 Pro’s gains in SimpleQA show that it’s a massive model, absolutely huge, with tons of parametric knowledge,” wrote jakegh. “Google uses its own TPU hardware to not only infer but also train so they can afford to do it.”

So Gemini 3 continues the Google tradition of building solid, affordable models. Public reaction to the new model has been broadly positive; the model seems to perform as well in real-world applications as it does on benchmarks.

The new model doesn’t seem to have much personality, but this may not matter. Billions of people already use Google products, so Google may be able to win the AI race simply by adding a good-but-not-amazing model like Gemini 3 to products like search, Gmail, and the Google Workspace suite.

Anthropic: thinking deeply about models

Philosopher Amanda Askell described her work at Anthropic in a recent 60 Minutes interview.

Last week’s release of Claude Opus 4.5 also got a positive reception, but the vibes were different.

Read more

Help some of the poorest people in Rwanda

2025-12-02 20:06:13

It’s Giving Tuesday, and Matt Yglesias—my former colleague at Vox and now author of the excellent newsletter Slow Boring—has organized a consortium of Substack writers to raise money for GiveDirectly. This non-profit organization does exactly what it sounds like: give cash directly to poor people in low-income countries.

This year the group is aiming to raise at least $1 million to help people in rural Rwanda. I’m hoping that Understanding AI readers will contribute at least $20,000 to that total—please use this special link if you’d like to be counted as an Understanding AI reader.

If donations from readers total at least $20,000, my wife and I will donate an additional $10,000.

There are lots of charities out there that try to help poor people in various ways, such as delivering food, building infrastructure, or providing education and health care. Such efforts are praiseworthy, but it can sometimes be difficult to tell how much good they are doing — or whether it would be better to spend the money on something else.

The insight of GiveDirectly is that we can just give cash directly to people in need and let them decide how to spend it. Here’s how Matt describes it:

The organization works in low-income countries, including Kenya, Malawi, Mozambique, Rwanda, and Uganda, to identify villages where a large majority of the population is very poor by global standards. They then enroll the entire population of the village in the program, using mobile banking to transfer approximately $1,100 to each household in town.

This transfer boosts recipients’ short-term living standards, minimizes logistical complications and perverse incentives, and, optimistically, is a kind of shot in the arm to the local economy. After all, one problem with being desperately poor and also surrounded by other desperately poor people is that even when you have useful goods or services to sell, no one can afford to buy them.

Some recipients spend the money on immediate needs like food or medicine. Others use the money in ways that have longer-term benefits, such as buying equipment, starting a business, or sending children to school. In either case, the money will go a lot farther in Rwanda than it would here in a rich country like the United States.

Understanding AI has more than 100,000 readers, and 98 percent of you are on the “free” list. If you’ve found my newsletter useful, a donation to GiveDirectly would be a great way to say thanks.

Six reasons to think there’s an AI bubble — and six reasons not to

2025-11-25 21:03:31

I’m excited to publish this post co-authored with one of my favorite writers, Derek Thompson. Derek recently left the Atlantic to launch his own Substack covering business, technology, science, and politics. It’s one of the few newsletters I read as soon as it hits my inbox, and I bet a lot of Understanding AI readers would enjoy it.


In the last few weeks, something’s troubled and fascinated us about the national debate over whether artificial intelligence is a bubble. Everywhere we look and listen, experts are citing the same small number of statistics, factoids, and studies. The debate is like a board game with a tiny number of usable pieces. For example:

  • Talk to AI bears, and they’ll tell you how much Big Tech is spending.

  • Talk to AI bulls, and they’ll tell you how much Big Tech is making.

  • Talk to AGI believers, and they’ll quote a study on “task length” by an organization called METR.

  • Talk to AGI skeptics, and they’ll quote another study on productivity, also by METR.

Last week, we were discussing how one could capture the entire AI-bubble debate in about 12 statistics that people just keep citing and reciting — on CNBC, on tech podcasts, in Goldman Sachs Research documents, and at San Francisco AI parties. Since everybody seems to be reading and quoting from the same skinny playbook, we thought: What the hell, let’s just publish the whole playbook!

If you read this article, we think you’ll be prepared for just about every conversation about AI, whether you find yourself at a Bay Area gathering with accelerationists or a Thanksgiving debate with Luddite cousins. We think some of these arguments are compelling. We think others are less persuasive. So, throughout the article, we’ll explain both why each argument belongs in the discussion and why some arguments don’t prove as much as they claim. Read to the end, and you’ll see where each of us comes down on the debate.

Let’s start with the six strongest arguments that there is an AI bubble.

All about the Benjamins

When they say: Prove to me that AI is a bubble

You say: For starters, this level of spending is insane

When America builds big infrastructure projects, we often over-build. Nineteenth-century railroads? Overbuilt, bubble. Twentieth-century Internet? Overbuilt, bubble. It’s really nothing against AI specifically to suggest that every time US companies get this excited about a big new thing, they get too excited, and their exuberance creates a bubble.

Five of the largest technology giants — Amazon, Meta, Microsoft, Alphabet, and Oracle — had $106 billion in capital expenditures in the most recent quarter. That works out to almost 1.4% of gross domestic product, putting it on par with some of the largest infrastructure investments in American history.
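As a rough sanity check on that 1.4% figure (the GDP number below is an approximation we’re supplying, not one from the companies’ filings):

```python
# Back-of-the-envelope check: quarterly capex as a share of quarterly US GDP.
quarterly_capex = 106e9        # $106 billion across the five companies
annual_gdp = 30e12             # assume US GDP of roughly $30 trillion per year
quarterly_gdp = annual_gdp / 4

print(f"{quarterly_capex / quarterly_gdp:.1%}")  # ~1.4%
```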

This chart was originally created by Understanding AI’s Kai Williams, who noted, “not all tech capex is spent on data centers, and not all data centers are dedicated to AI. The spending shown in this chart includes all the equipment and infrastructure a company buys. For instance, Amazon also needs to pay for new warehouses to ship packages.”

Still, AI accounts for a very large share of this spending. Amazon’s CEO, for example, said last year that AI accounted for “the vast majority” of Amazon’s recent capex. And notice that the last big boom on the chart — the broadband investment boom of the late 1990s — ended with a crash. AI investments are now large enough that a sudden slowdown would have serious macroeconomic consequences.

Money for nothing

When they say: But this isn’t like the dot-com bubble, because these companies are for real

You say: I’m not so sure about that…

“It feels like there’s obviously a bubble in the private markets,” said Demis Hassabis, the CEO of Google DeepMind. “You look at seed rounds with just nothing being [worth] tens of billions of dollars. That seems a little unsustainable. It’s not quite logical to me.”

The canonical example of zillions of dollars for zilch in product has been Thinking Machines, the AI startup led by former OpenAI executive Mira Murati. This summer, Thinking Machines raised $2 billion, the largest seed round in corporate history, before releasing a product. According to a September report in The Information, the firm declined to tell investors or the public what they were even working on.

“It was the most absurd pitch meeting,” one investor who met with Murati said. “She was like, ‘So we’re doing an AI company with the best AI people, but we can’t answer any questions.’”

In October, the company launched a programming interface called Tinker. I guess that’s something. Or, at least, it better be something quite spectacular, because just days later, the firm announced that Murati was in talks with investors to raise another $5 billion. This would raise the value of the company to $50 billion—more than the market caps of Target or Ford.

When enterprises that barely have products are raising money at valuations rivaling 100-year-old multinational firms, it makes us wonder if something weird is going on.

Reality check

When they say: Well, AI is making me more productive

You say: You might be deluding yourself

One of the hottest applications of AI right now is programming. Over the last 18 months, millions of programmers have started using agentic AI coding tools such as Cursor, Anthropic’s Claude Code, and OpenAI’s Codex, which are capable of performing routine programming tasks. Many programmers have found that these tools make them dramatically more productive at their jobs.

But a July study from the research organization METR called that into question. They asked 16 programmers to tackle 246 distinct tasks. Programmers estimated how long it would take to complete each task. Then they were randomly assigned to use AI, or not, on a task-by-task basis.

On average, the developers believed that AI would allow them to complete their tasks 24% faster. Even after the fact, developers who used AI thought it had sped them up by 20%. But programmers who used AI took 19% longer, on average, than programmers who didn’t.

We were both surprised by this result when it first came out, and we consider it one of the strongest data points in favor of AI skepticism. While many people believe that AI has made them more productive at their jobs — including both of us — it’s possible that we’re all deluding ourselves. Maybe that will become more obvious over the next year or two and the hype around AI will dissipate.

But it’s also possible that programmers are just in the early stages of the learning process for AI coding tools. AI tools probably speed up programmers on some tasks and slow them down on others. Over time, programmers may get better at predicting which tasks fall into which category. Or perhaps the tools themselves will get better over time — AI coding tools have improved dramatically over the last year.

It’s also possible that the METR results simply aren’t representative of the software industry as a whole. For example, a November study examined 32 organizations that started to use Cursor’s coding agent in the fall of 2024. It found that programmer productivity increased by 26% to 39% as a result.

Infinite money glitch

When they say: But AI is clearly growing the overall economy

You say: Maybe the whole thing is a trillion-dollar ouroboros

Imagine Tim makes some lemonade. He loans Derek $10 to buy a drink. Derek buys Tim’s lemonade for $10. Can we really say that Tim has “earned $10” in this scenario? Maybe no: If Derek goes away, all Tim has done is move money from his left pocket to his right pocket. But maybe yes: If Derek loves the lemonade and keeps buying more every day, then Tim’s bet has paid off handsomely.

Artificial intelligence is more complicated than lemonade. But some analysts are worried that the circular financing scheme we described above is also happening in AI. In September, Nvidia announced it would invest “up to” $100 billion in OpenAI to support the construction of up to 10 gigawatts of data center capacity. In exchange, OpenAI agreed to use Nvidia’s chips for the buildout. The next day, OpenAI announced five new locations to be built by Oracle in a new partnership whose value reportedly exceeds $300 billion. The industry analyst Dylan Patel called this financial circuitry an “infinite money glitch.”

Bloomberg made this chart depicting the complex web of transactions among leading AI companies. This kind of thing sets off alarm bells for people who remember how financial shenanigans contributed to the 2008 financial crisis.

The fear is two-fold: first, that tech companies are shifting money around in a way that creates the appearance of new revenue that hasn’t actually materialized; and second, that if any part of this financial ouroboros breaks, everybody is going down.

In the last few months, OpenAI has announced four deals: with Nvidia, Oracle, and the chipmakers AMD and Broadcom. All four companies saw their market values jump by tens of billions of dollars the day their deals were announced. But, by that same logic, any wobble for OpenAI or Nvidia could reverberate throughout the AI ecosystem.

Something similar happened during the original dot-com bubble. The investor Paul Graham sold a company to Yahoo in 1998, so he had a front-row seat to the mania:

By 1998, Yahoo was the beneficiary of a de facto Ponzi scheme. Investors were excited about the Internet. One reason they were excited was Yahoo’s revenue growth. So they invested in new Internet startups. The startups then used the money to buy ads on Yahoo to get traffic. Which caused yet more revenue growth for Yahoo, and further convinced investors the Internet was worth investing in. When I realized this one day, sitting in my cubicle, I jumped up like Archimedes in his bathtub, except instead of “Eureka!” I was shouting “Sell!”

Are we seeing a similar dynamic with the data center boom? It doesn’t seem like a crazy theory.

Pay no attention to the man behind the curtain

When they say: The hyperscalers are smart companies and don’t need bubbles to grow

You say: So why are they resorting to financial trickery?

Some skeptics argue that big tech companies are concealing the actual cost of the AI buildout.

First, they’re shifting AI spending off their corporate balance sheets. Instead of paying for data centers themselves, they’re teaming up with private capital firms to create joint ventures known as special purpose vehicles (or SPVs). These entities build the facilities and buy the chips, while the spending sits somewhere other than the tech company’s books. This summer, Meta reportedly sought to raise about $29 billion from private credit firms for new AI data centers structured through such SPVs.

Meta isn’t alone. CoreWeave, the fast-growing AI cloud company, has also turned to private credit to fund its expansion through SPVs. These entities transfer risk off the balance sheets of Silicon Valley companies and onto the balance sheets of private-capital limited partners, including pension funds and insurance companies. If the AI bubble bursts, it won’t be just tech shareholders who feel the pain. It will be retirees and insurance policyholders.

To be fair, it’s not clear that anything shady is happening here. Tech companies have plenty of AI infrastructure on their own balance sheets, and they’ve been bragging about that spending in earnings calls, not downplaying it. So it’s not obvious that they are using SPVs in an effort to mislead people.

Second, skeptics argue that tech companies are underplaying the depreciation risk of the hardware that powers AI. Earlier waves of American infrastructure left us with infrastructure that held its value for decades: power lines from the 1940s, freeways from the 1960s, fiber optic cables from the 1990s. By contrast, the best GPUs are overtaken by superior models every few years. The hyperscalers spread their cost over five or six years through an accounting process called depreciation. But if they have to buy a new set of top-end chips every two years, they’ll eventually blow a hole in their profitability.

We don’t dismiss this fear. But the danger is easily exaggerated. Consider the A100 chip, which helped train GPT-4 in 2022. The first A100s were sold in 2020, which makes the oldest units about five years old. Yet they’re still widely used. “In a compute-constrained world, there is still ample demand for running A100s,” Bernstein analyst Stacy Rasgon recently wrote. Major cloud vendors continue to offer A100 capacity, and customers continue to buy it.

Of course, there’s no guarantee that today’s chips will be as durable. If AI demand cools, we could see a glut of hardware and early retirement of older chips. But based on what we know today, it’s reasonable to assume that a GPU purchased now will still be useful five years from now.

A changing debt picture

When they say: The hyperscalers are well-run companies that won’t use irresponsible leverage

You say: That might be changing

A common way for a bubble to end is with too much debt and too little revenue. Most of the Big Tech companies building AI infrastructure — including Google, Microsoft, and Meta — haven’t needed to take on much debt because they can fund the investments with profit. Oracle has been a notable exception to this trend, and some people consider it the canary in the coal mine.

Oracle recently borrowed $18 billion for data center construction, pushing the company’s total debt above $100 billion. The Wall Street Journal reports that “the company’s adjusted debt, a measure that includes what it owes on leases in addition to what it owes creditors, is forecast to more than double to roughly $300 billion by 2028, according to credit analysts at Morgan Stanley.”

At the same time, it’s not obvious that Oracle is going to make a lot of money from this aggressive expansion. There’s plenty of demand: in its most recent earnings call, Oracle said that it had $455 billion in contracted future revenue — a more than four-fold increase over the previous year. But The Information reports that in the most recent quarter, Oracle earned $125 million on $900 million worth of revenue from renting out data centers powered by Nvidia GPUs. That works out to a 14% profit margin. That’s a modest profit margin in a normal business, and it’s especially modest in a highly volatile industry like this one. It’s much smaller than the roughly 70% gross margin Oracle gets on more established services.

The worry for AI skeptics is that customer demand for GPUs could cool off as quickly as it heated up. In theory, that $455 billion figure represents firm customer commitments to purchase future computing services. But if there’s an industry-wide downturn, some customers might try to renegotiate the terms of these contracts. Others might simply go out of business. And that could leave Oracle with a lot of debt, a lot of idle GPUs, and not enough revenue to pay for it all.

And now, the very best arguments against an AI bubble

Read more

An AI “tsunami” is coming for Hollywood — here’s how artists are responding

2025-11-20 04:00:21

For a forthcoming piece I’m looking to talk to people using and building with open-weight models, whether that’s in startups, enterprises or other organizations. I’m happy to talk to people off the record. I’ve opened up some slots on my calendar tomorrow and Friday. If you’re willing to talk to me, please click here to grab a time.


In 2016, the legendary Japanese filmmaker Hayao Miyazaki was shown a bizarre AI-generated video of a misshapen human body crawling across a floor.

Miyazaki declared himself “utterly disgusted” by the technology demo, which he considered an “insult to life itself.”

“If you really want to make creepy stuff, you can go ahead and do it,” Miyazaki said. “I would never wish to incorporate this technology into my work at all.”

Many fans interpreted Miyazaki’s remarks as rejecting AI-generated video in general. So they didn’t like it when, in October 2024, filmmaker PJ Accetturo used AI tools to create a fake trailer for a live-action version of Miyazaki’s animated classic “Princess Mononoke.” The trailer earned him 22 million views on X. It also earned him hundreds of insults and death threats.

“Go generate a bridge and jump off of it,” said one of the funnier retorts. Another urged Accetturo to “throw your computer in a river and beg God’s forgiveness.”

Someone tweeted that Miyazaki “should be allowed to legally hunt and kill this man for sport.”

PJ Accetturo is a director and founder of Genre AI, an AI ad agency. (Photo courtesy of PJ Accetturo)

The development of AI image and video generation models has been controversial, to say the least. Artists have accused AI companies of stealing their work to build tools that put humans out of a job. Using AI tools openly is stigmatized in many circles, as Accetturo learned the hard way.

But as these models have improved, they have sped up workflows and afforded new opportunities for artistic expression. Artists without AI expertise might soon find themselves losing work.

Over the last few weeks, I’ve spoken to nine actors, directors, and creators about how they are navigating these tricky waters. Here’s what they told me.


The backlash to AI video generation, explained

Actors have emerged as a powerful force against AI. In 2023, SAG-AFTRA, the Hollywood actors’ union, had its longest-ever strike, partly to establish more protections for actors against AI replicas.

Actors have lobbied to regulate AI in their industry and beyond. One actor I talked with, Erik Passoja, has testified before the California legislature in favor of several bills, including for greater protections against pornographic deepfakes. SAG-AFTRA endorsed SB 1047, an AI safety bill regulating frontier models. The union also organized against the proposed moratorium on state AI bills.

A recent flashpoint came in September, when Deadline Hollywood reported that talent agencies were interested in signing “AI actress” Tilly Norwood.

Actors weren’t happy. Emily Blunt told Variety, “This is really, really scary. Come on agencies, don’t do that.”

Natasha Lyonne, star of “Russian Doll,” posted on an Instagram Story: “Any talent agency that engages in this should be boycotted by all guilds. Deeply misguided & totally disturbed.”

The backlash was partly specific to Tilly Norwood — Lyonne is no AI skeptic, having cofounded an AI studio — but it also reflects a set of concerns around AI common to many in Hollywood and beyond.

Here’s how SAG-AFTRA explained its position:

Tilly Norwood is not an actor, it’s a character generated by a computer program that was trained on the work of countless professional performers — without permission or compensation. It has no life experience to draw from, no emotion and, from what we’ve seen, audiences aren’t interested in watching computer-generated content untethered from the human experience. It doesn’t solve any “problem” — it creates the problem of using stolen performances to put actors out of work, jeopardizing performer livelihoods and devaluing human artistry.

This statement reflects three broad criticisms that come up over and over in discussions of AI art:

  • Content theft: Most of the leading AI video models have been trained on broad swathes of the Internet, including images and films made by artists. In many cases, companies have not asked artists for permission to use this content, nor compensated them. Courts are still working out whether this is fair use under copyright law. But many people I talked to consider AI companies’ training efforts to be theft of artists’ work.

  • Job loss: If AI tools can make passable video quickly or drastically speed up editing tasks, that potentially takes jobs away from actors or film editors. While past technological advancements have also eliminated jobs — the adoption of digital cameras drastically reduced the number of people cutting physical film — AI could have an even broader impact.

  • Artistic quality: A lot of people told me they just didn’t think AI-generated content could ever be good art. Tess Dinerstein stars in vertical dramas — episodic programs optimized for viewing on smartphones. She told me that AI is “missing that sort of human connection that you have when you go to a movie theater and you’re sobbing your eyes out because your favorite actor is talking about their dead mom.”

The concern about theft is potentially solvable by changing how models are trained. Around the time Accetturo released the “Princess Mononoke” trailer, he called for generative AI tools to be “ethically trained on licensed datasets.”

Some companies have moved in this direction. For instance, independent filmmaker Gille Klabin told me he “feels pretty good” using Adobe products because the company trains its AI models on stock images that it pays royalties for.

But the other two issues — job losses and artistic integrity — will be harder to finesse. Many creators — and fans — believe that AI-generated content misses the fundamental point of art, which is about creating an emotional connection between creators and viewers.

But while that point is compelling in theory, the details can be tricky.

Dinerstein, the vertical drama actress, told me that she’s “not fundamentally against AI” — she admits “it provides a lot of resources to filmmakers” in specialized editing tasks — but she takes a hard stance against it on social media.

“It’s hard to ever explain gray areas on social media,” she said, and she doesn’t want to “come off as hypocritical.”

Even though she doesn’t think that AI poses a risk to her job — “people want to see what I’m up to” — she does fear people (both fans and vertical drama studios) making an AI representation of her without her permission. And she has found it easiest to just say “You know what? Don’t involve me in AI.”

Others see it as a much broader issue. Actress Susan Spano told me it was “an issue for humans, not just actors.”

“This is a world of humans and animals,” she said. “Interaction with humans is what makes it fun. I mean, do we want a world of robots?”

How one director leaned into AI

It’s relatively easy for actors to take a firm stance against AI because they inherently do their work in the physical world. But things are more complicated for other Hollywood creatives, such as directors, writers, and film editors. AI tools can genuinely make them more productive, and they’re at risk of losing work if they don’t stay on the cutting edge.

So the non-actors I talked to took a range of approaches to AI. Some still reject it. Others have used the tools reluctantly and tried to keep their heads down. Still others have openly embraced the technology.

Kavan Cardoza is a director and AI filmmaker. (Photo courtesy of Phantom X)

Take Kavan Cardoza, for example. He worked as a music video director and photographer for close to a decade before getting his break into filmmaking with AI.

After the image model Midjourney was first released in 2022, Cardoza started playing around with image generation, and later video generation. Eventually, he “started making a bunch of fake movie trailers” for existing movies and franchises. In December 2024, he made a fan film in the Batman universe that “exploded on the Internet,” before Warner Brothers took it down for copyright infringement.

Cardoza acknowledges that he recreated actors in former Batman movies “without their permission.” But he insists he wasn’t “trying to be malicious or whatever. It was truly just a fan film.”

Whereas Accetturo received death threats, the response to Cardoza’s fan film was quite positive.

“Every other major studio started contacting me,” Cardoza said. He set up an AI studio, Phantom X, with several of his close friends. Phantom X started by making ads (where AI video is catching on quickest), but Cardoza wanted to focus back on films.

In June, Cardoza made a short film called “Echo Hunter,” a blend of “Blade Runner” and “The Matrix.” Some shots look clearly AI-generated, but Cardoza used motion capture technology from Runway to put the faces of real actors into his AI-generated world. Overall, the piece pretty much hangs together.

Cardoza wanted to work with real actors because their artistic choices can help elevate the script he’s written: “there’s a lot more levels of creativity to it.” But he needed SAG-AFTRA’s approval to make a film that blends AI techniques with the likenesses of SAG-AFTRA actors. To get it, he had to promise not to re-use the actors’ likenesses in other films.


“It’s never about if, it’s just when”

In Cardoza’s view, AI is “giving voices to creators that otherwise never would have had the voice.”

But Cardoza isn’t wedded to AI. When an interviewer asked him whether he’d make a non-AI film if required to, he responded “Oh 100%.” Cardoza added that if he had the budget to do it now, “I’d probably still shoot it all live action.”

He acknowledged to me that there will be losers in the transition (“there’s always going to be changes”), but he compares the rise of AI to past technological developments in filmmaking, like the rise of visual effects, which created new jobs making effects digitally but reduced jobs building elaborate physical sets.

Cardoza also expressed interest in limiting job losses. In another interview, he said that for his film project, “we want to make sure we include as many people as possible”: not just actors, but sound designers, script editors, and other specialized roles.

But he believes that eventually, AI will get good enough to do everyone’s job. “Like I say with tech, it’s never about if, it’s just when.”

Accetturo’s entry into AI was similar. He told me that he worked for 15 years as a filmmaker, “mostly as a commercial director and former documentary director.” During the pandemic, he “raised millions” for an animated TV series, but it got caught up in development hell.

AI gave him a new chance at success. Over the summer of 2024, he started playing around with AI video tools. He realized he was in the sweet spot to take advantage of AI: experienced enough to make something good, but not so established that he was risking his reputation. After Google released Veo 3 in May 2025, Accetturo released a fake medicine ad that went viral. His studio now produces ads for prominent companies like Oracle and Popeyes.

Accetturo says the backlash against him has subsided: “it truly is nothing compared to what it was.” And he says he’s committed to working on AI: “everyone understands that it’s the future.”

“Adapt like cockroaches”

Between the anti- and pro-AI extremes, there are a lot of editors and artists quietly using AI tools without disclosing it. Unsurprisingly, it’s difficult to find people who will speak about this on the record.

“A lot of people want plausible deniability right now,” according to Ryan Hayden, a Hollywood talent agent. “There is backlash about it.”

But if editors don’t use AI tools, they risk becoming obsolete. Hayden says that he knows a lot of people in the editing field trying to master AI because “there’s gonna be a massive cut” in the total number of editors. Those who know AI might survive.

As one comedy writer involved in an AI project told Wired, “We wanted to be at the table and not on the menu.”

Clandestine AI usage extends into the upper reaches of the industry. Hayden knows an editor who works with a major director of $100 million films. “He’s already using AI, sometimes without people knowing.”

Some artists feel morally conflicted but don’t think they can effectively resist. Vinny Dellay, a storyboard artist who’s worked on Marvel films and Super Bowl ads, released a video detailing his views on the ethics of using AI as a working artist. Dellay acknowledged that “AI being trained off of art found on the Internet without getting permission from the artist, it may not be fair, it may not be honest.” But refusing to use AI products won’t stop their general adoption; believing otherwise is “just being delusional.”

Instead, Dellay said that the right course is to “adapt like cockroaches after a nuclear war.” If they’re lucky, using AI in storyboarding workflows might even “let a storyboard artist pump out twice the boards in half the time without questioning all your life’s choices at 3:00 AM.”

Lines, moral and practical

Gille Klabin is an independent writer, director, and visual effects artist. (Photo by David Solorzano, courtesy of Gille Klabin)

Gille Klabin is an indie director and filmmaker currently working on a feature called “Weekend At The End Of The World.”

As an independent filmmaker, Klabin can’t afford to hire many people. There are many labor-intensive tasks — like making a pitch deck for his film — that he’d otherwise have to do himself. An AI tool “essentially just liberates us to get more done and have more time back in our life.”

But he’s careful to stick to his own moral lines. Any time he mentioned using an AI tool during our interview, he’d explain why he thought that was an appropriate choice. He said he was fine with AI use “as long as you’re using it ethically in the sense that you’re not copying somebody’s work and using it for your own.”

Drawing these lines can be difficult, however. Hayden, the talent agent, told me that as AI tools make low-budget films look better, it gets harder to make high-budget films, which employ the most people at the highest wage levels.

If anything, Klabin’s AI uptake is limited more by the current capabilities of AI models. Klabin is an experienced visual effects artist, and he finds AI products to generally be “not really good enough to be used in a final project.”

He gave me a concrete example. Rotoscoping is the process of tracing out the subject of a shot so the background can be edited independently. It’s very labor-intensive, since every frame has to be edited individually, so Klabin has tried Runway’s AI-driven rotoscoping. While it can make for a decent first pass, the result is just too messy to use in a final product.

Klabin sent me this GIF of a series of rotoscoped frames from his upcoming movie. While the model does a decent job of identifying the people in the frame, its boundaries aren’t consistent from frame to frame. The result is noisy.

AI-rotoscoped frames from “Weekend At The End Of The World.” (Courtesy of Gille Klabin)

Current AI tools are full of these small glitches, so Klabin only uses them for tasks that audiences don’t see (like creating a movie pitch deck) or in contexts where he can clean up the result afterwards.
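
Klabin’s complaint maps onto a simple technical point: most AI rotoscoping tools segment each frame independently, so the mask boundary jitters from frame to frame. Below is a minimal Python sketch of that workflow with a crude temporal-smoothing pass added; segment_subject is a hypothetical placeholder for whatever per-frame segmentation model a given tool uses, not Runway’s actual API.

```python
# Minimal sketch (not any specific product's pipeline) of why per-frame AI
# rotoscoping flickers, plus a crude temporal-smoothing fix.
# `segment_subject` is a hypothetical stand-in for a per-frame segmentation model.
import cv2
import numpy as np

def segment_subject(frame: np.ndarray) -> np.ndarray:
    """Hypothetical per-frame segmenter: returns a 0/255 mask of the subject."""
    raise NotImplementedError("plug a real segmentation model in here")

def rotoscope(video_path: str, smooth_window: int = 5) -> None:
    cap = cv2.VideoCapture(video_path)
    recent_masks = []  # sliding window of the most recent masks
    frame_idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        # Each frame is segmented independently, so mask edges can jitter.
        mask = segment_subject(frame)
        recent_masks.append(mask.astype(np.float32))
        if len(recent_masks) > smooth_window:
            recent_masks.pop(0)
        # Averaging recent masks and re-thresholding damps the jitter,
        # at the cost of smearing the boundary during fast motion.
        smoothed = (np.mean(recent_masks, axis=0) > 127).astype(np.uint8) * 255
        cv2.imwrite(f"mask_{frame_idx:05d}.png", smoothed)
        frame_idx += 1
    cap.release()
```

Averaging masks over a few frames damps the flicker but smears fast motion, which is roughly why a quick AI pass still needs the kind of manual cleanup Klabin describes.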

The power of authenticity

Stephen Robles reviews Apple products on YouTube and other platforms. He uses AI in some parts of the editing process, such as removing silences or transcribing audio, but doesn’t see it as disruptive to his career.

Stephen Robles is a YouTuber, podcaster, and creator covering tech, particularly Apple. (Photo courtesy of Stephen Robles)

“I am betting on the audience wanting to trust creators, wanting to see authenticity,” he told me. AI video tools don’t really help him with that and can’t replace the reputation he’s sought to build.

Recently, he experimented with using ChatGPT to edit a video thumbnail (the image used to advertise a video). He got a couple of negative reactions about his use of AI, so he said he “might slow down a little bit” with that experimentation.

Robles didn’t seem as concerned about AI models stealing from creators like him. When I asked him about how he felt about Google training on his data, he told me that “YouTube provides me enough benefit that I don’t think too much about that.”

Professional thumbnail artist Antioch Hwang has a similarly pragmatic view towards using AI. Some channels he works with have audiences that are “very sensitive to AI images.” Even using “an AI upscaler to fix up the edges” can provoke strong negative reactions. For those channels, he’s “very wary” about using AI.

Antioch Hwang is a YouTube thumbnail artist. (Photo courtesy of Antioch Creative)

But for most channels he works for, he’s fine using AI, at least for technical tasks. “I think there’s now been a big shift in the public perception of these AI image generation tools,” he told me. “People are now welcoming them into their workflow.”

He’s still careful with his AI use, though, because he thinks that having human artistry helps in the YouTube ecosystem. “If everyone has all the [AI] tools, then how do you really stand out?” he said.

Recently, top creators have started using rougher-looking thumbnails for their videos. AI has made polished thumbnails so easy to create that what Hwang would call “poorly made thumbnails” now help videos stand out.


Exit strategies

Hwang told me something surprising: even as AI makes it easier for creators to make thumbnails themselves, business has never been better for thumbnail artists, even at the lower end. He said that demand has soared because “AI as a whole has lowered the barriers for content creation, and now there’s more creators flooding in.”

Still, Hwang doesn’t expect the good times to last forever. “I don’t see AI completely taking over for the next three-ish years. That’s my estimated timeline.”

Everyone I talked to had different answers to when — if ever — AI would meaningfully disrupt their part of the industry.

Some, like Hwang, were pessimistic. Actor Erik Passoja told me he thought the big movie studios — like Warner Brothers or Paramount — would be gone in three to five years.

But others were more optimistic. Tess Dinerstein, the vertical drama actress, said, “I don’t think that verticals are ever going to go fully AI.” Even if it becomes technologically feasible, she argued, “that just doesn’t seem to be what the people want.”

Gille Klabin, the independent filmmaker, thought there would always be a place for high-quality human films. If someone’s work is “fundamentally derivative,” then they are at risk. But he thinks the best human-created work will still stand out. “I don’t know how AI could possibly replace the borderline divine element of consciousness,” he said.

The people who were most bullish on AI were, if anything, the least optimistic about their own career prospects. “I think at a certain point it won’t matter,” Kavan Cardoza told me. “It’ll be that anyone on the planet can just type in some sentences” to generate full, high-quality videos.

This might explain why Accetturo has become something of an AI evangelist; his newsletter tries to teach other filmmakers how to adapt to the coming AI revolution.

AI “is a tsunami that is gonna wipe out everyone,” he told me. “So I’m handing out surfboards — teaching people how to surf. Do with it what you will.”