Exponential View

By Azeem Azhar, an expert on artificial intelligence and exponential technologies.

🔮🇨🇳 Inside the Chinese AI labs where America’s AI controls created its toughest competition

2026-05-14 00:47:43

We spent a week on the ground in China, visiting AI and robotics labs and seeing how things operate firsthand. We traveled through Beijing, Hangzhou and Shanghai to meet representatives from 14 labs, including DeepSeek, MoonshotAI, MiniMax, Z.ai, ByteDance, 01.AI, Alibaba, Ant Group, Xiaomi, AInnovation, Galbot, Unitree, ModelScope, and RWKV, joined by the friends who made it all happen.

Our group at ModelScope

We participated in dozens of hours of discussions with researchers, founders, product leaders, and business owners across the infrastructure, hardware, models and application layers. Every lab is obsessed with ByteDance’s Doubao, and respectful of DeepSeek’s scientific process. Claude is the model of choice for coding, universally rated as the best thing out there. The researchers we met were humble, welcoming, focused purely on technical priorities and building the next big model. Researchers were also very young: at one lab in particular, the average age was 25.

But chip constraints are real. Everyone wanted Nvidia chips, and the constraint was showing up in longer and longer pre-training runs and iterative cycles.

We arrived in China and really wanted to understand how severely export controls were biting and how much they were harming AI development. The stock of AI compute in China is running two to three years behind that of the US. In the short term, it’s clear that the controls are making it harder.

But, as we discovered, it’s not as obvious over the long term.

The export controls have become capability-generating – labs in China are forced to be ruthlessly efficient. Despite the three-year compute handicap, Chinese open-source models are only six to eight months behind the US frontier.

By listening to the researchers and digging into the data, we have estimated quantitatively just how significant that efficiency advantage is. We reckon Chinese labs are extracting 4-7x as much intelligence per unit of compute as naive scaling predictions would suggest.
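A rough way to sanity-check that 4-7x figure is to treat capability gaps as compute gaps. This is our own sketch, not the authors' published method, and the growth rates are assumptions: if effective frontier training compute grows g-fold per year, then a lab holding the compute stock of ~2.5 years ago while shipping models only ~0.6 years behind the frontier must be squeezing roughly g^(2.5 − 0.6) more out of each unit of compute.

```python
# Back-of-envelope: implied efficiency multiplier for Chinese labs.
# Assumptions (ours, for illustration): effective frontier training compute
# grows g-fold per year; Chinese labs hold the compute stock of ~2.5 years
# ago but ship models only ~0.6 years behind the frontier.

def efficiency_multiplier(growth_per_year, compute_lag_years, capability_lag_years):
    """Extra intelligence-per-FLOP needed to reconcile a lab's compute lag
    with the (smaller) capability lag its models actually show."""
    return growth_per_year ** (compute_lag_years - capability_lag_years)

for g in (2.2, 2.5, 2.8):
    m = efficiency_multiplier(g, compute_lag_years=2.5, capability_lag_years=0.6)
    print(f"growth {g}x/yr -> implied efficiency ~{m:.1f}x")
```

With annual effective-compute growth anywhere in the 2.2-2.8x band, the implied multiplier lands between roughly 4.5x and 7x, consistent with the estimate above.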

On the day a sitting US president lands in Beijing for the first time in nearly a decade, we decided to publish our research into how and why the constraints have inadvertently created the conditions for the most formidable competitors to develop exactly the capabilities that will matter most in the coming years.

Subscribe now


The compute gap, the US lead

In every meeting with every Chinese lab, we heard a common refrain: we do not have enough compute. Less compute means fewer experiments and smaller models. This is a genuine constraint on research, development and deployment of AI.

That isn’t surprising. American researchers complain, too. So do business leaders: Microsoft and Anthropic, for example, have explicitly stated that the lack of compute capacity has cost them meaningful revenue.

But the compute constraint in China is different. It isn’t just that there is less capital around – China’s AI startups raised $12.4 billion in 2025 compared to $285 billion in the US. It is that export controls on chips, initiated by Joe Biden in October 2022 and subsequently relaxed and tightened at various times by President Trump, have all but choked off the supply of advanced chips to the Chinese market. The interesting thing to us was seeing firsthand how local labs have responded.

Let’s step back for a moment first to put into context the compute gap.

US labs trumpet securing large amounts of compute. In recent weeks and months, Anthropic alone has signed deals totaling over 10 gigawatts of capacity – with Amazon, Google, Microsoft, Nvidia and SpaceX. OpenAI committed to 10 gigawatts of Nvidia systems last September, backed by up to $100 billion in Nvidia investment. These orders are overwhelmingly for the latest, most powerful silicon: Nvidia’s Blackwell series (B200, B300, GB200) shipping today, the next-generation Vera Rubin platform arriving later this year, and increasingly Google TPUs and others.

These large orders simply are not options for Chinese hyperscalers and labs. Supply isn’t entirely dry: Chinese customers are still getting their hands on Nvidia’s H100s, B200s and B300s, coming by and large through Singapore, via shell companies that relabel shipments as tea or toys. But quantities are at least an order of magnitude below their US rivals’.

It is these top-tier American chips, especially the most recent vintages, that count. A single GB300 NVL72 rack (72 of Nvidia’s latest GPUs operating as one system) delivers 30x faster real-time inference than the equivalent H100 cluster from three years earlier, with 3.6x more memory per chip and 25x lower energy per inference. US labs are now ordering these systems by the gigawatt. Chinese labs cannot.

Chinese tech firms, notably Huawei, have made strides in building chips suited for AI. But even Huawei’s latest, the Ascend 950PR, launched in March, is roughly on par with the H100, released in 2022. And these systems ship in far smaller volumes. Nvidia is estimated to have shipped 7 million Hopper and Blackwell GPUs through October 2025 alone, and the rate is increasing; Huawei plans to ship 750,000 Ascend 950PR chips this year, around a tenth of what Nvidia shipped last year.

The result is that the US has a staggering lead in deployed AI compute capacity.

The lead is widening, not shrinking. In 2023, the US AI sector had triple the deployable compute – almost all of which would have been focused on training AI models. By the start of this year, that gap was closer to eightfold.

Put differently, by the end of 2025, Chinese labs could likely access roughly the same scale of compute that the US enjoyed two years earlier.

The difference is how that compute is used. In 2023, most American capacity was tied up in training, not serving customers. By contrast, in 2025, China’s compute stack, augmented by data centers in Malaysia and Singapore, was doing double duty – supporting model training and serving hundreds of millions of consumers, and a rapidly growing base of enterprises, through apps like WeChat, Doubao and Alipay.

It’s important to separate compute capacity for training AI models from serving customers. China has a huge AI market. Doubao alone reaches 100 million daily active users. Token volumes are equally vast. By February 2026, we estimate Chinese token volumes had reached ~9 quadrillion tokens a month – compared to ~4 quadrillion across the main US/Western providers.

Alongside datacenters in Malaysia and Singapore, a large part of Chinese compute infrastructure is going to serve customers through inference. If half of the compute is used to serve those customers, that reduces the available compute for training models. We might conclude, with low confidence, that by the end of 2025, Chinese labs had as much compute available for model training as American labs did in mid-2023.

By that logic, models from Chinese labs should be at least two years behind American ones if labs in both countries are using the same approach – more compute and more data to build better models.1 The framework treats capability as a function of compute, holding training efficiency roughly constant.

But we aren’t seeing a 2-3 year gap.

The headline is that Chinese models are three to six months behind the US on benchmark performance, according to DeepSeek, and 8 months according to the Center for AI Standards and Innovation, a US government agency.

In fact, Chinese labs appear to be keeping pace with US labs, or perhaps even narrowing the gap in some ways. The question for us then became: are the capability headlines wrong, or is something closing the gap that the compute numbers don’t capture?

There is an additional wrinkle: market structure. In the US, five key frontier labs dominate training compute: OpenAI, Anthropic, Google DeepMind, Meta and xAI. In China, a thousand flowers are blooming, and the big tech firms are developing their own frontier models.

And other large firms are coming in because they have specific data and expertise: Ant Financial with its Ling series, and Meituan, known for its on-demand delivery platform, which has also entered the LLM development market.

The impact of so many firms training their own models is that the pool of compute is being divided still further.


The efficiency moat strikes back

These labs are clearly finding efficiencies in training performant models, and those efficiencies are being passed through to inference: when the models are used to serve customers, they are much cheaper than roughly equivalent American models.

One shouldn’t put too much weight on AI benchmarks. They can be gamed, and they might not easily reflect how a model “feels” or works in practice. But they are one, inadequate, reference point. DeepSeek’s V4 Pro, their flagship model, is comparable in some ways to Claude’s Opus 4.6, which was released in Feb 2026 and is not Anthropic’s latest model. Cost-wise, though, you can see the difference: DeepSeek charges $0.43 per million input tokens and $0.87 per million output tokens, while Opus 4.6 is 11 times more costly on input and 28 times more expensive on output.
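To see what that multiplier means in practice, here is a quick sketch using only the per-token figures above. The 10M-input / 2M-output workload mix is an assumed example for illustration, not real usage data:

```python
# Price gap from the figures above (all prices per million tokens).
deepseek_in, deepseek_out = 0.43, 0.87   # DeepSeek V4 Pro list prices
opus_in = deepseek_in * 11               # Opus 4.6: 11x more on input
opus_out = deepseek_out * 28             # Opus 4.6: 28x more on output

def cost(p_in, p_out, m_in=10, m_out=2):
    """Cost of a workload of m_in million input and m_out million output tokens."""
    return p_in * m_in + p_out * m_out

print(f"DeepSeek: ${cost(deepseek_in, deepseek_out):.2f}")
print(f"Opus 4.6: ${cost(opus_in, opus_out):.2f}")
```

On this assumed mix the blended gap works out to roughly 16x, sitting between the 11x input and 28x output multipliers because output tokens carry the heavier premium.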

These are not promotional one-offs. Across the Chinese frontier, Kimi K2.6 sits at $0.95 input (among the cheapest models in the global top 10 by GPQA Diamond), and Alibaba’s Qwen models are priced in a similar band. The cost-to-serve inference is a function of three factors: the actual cost of serving the model, its compute complexity and energy costs, and the margin the provider is willing to give up.

Z.ai’s public status screen

The margins appear largely healthy. Z.ai serves its GLM-5 model at $1.00 per million input tokens, which is 3x cheaper than Claude Sonnet 4.6, and 5x cheaper on output. Despite this, it boasts a 50% gross margin, and MiniMax’s enterprise margins sit at 70%, though we don’t know if this holds across the board. DeepSeek, for its part, ran for years on internal funding alone, only turning to outside capital this month.

Finally, that efficiency shows up in how easily these models run on consumer hardware like laptops and phones. The leading local models in the world are almost all Chinese open-source, aggressively distilled down to smaller, lighter variants. A 5GB Qwen3-8B model runs on my Mac, as does DeepSeek R1’s 7B distilled variant, which has been pulled 85 million times on Ollama, the second-most-downloaded local model in the world. Cursor even built its Composer 2 model on top of MoonshotAI’s Kimi K2.5 model. The only US-based open-source model we run locally is Google’s Gemma 4.

How did they do it?

Read more

📈⏳ The broken bargain of Moore’s Law

2026-05-11 19:29:06

A couple of weeks ago, Bloomberg reported that TSMC had no plans to use ASML’s newest chipmaking machine – High-NA EUV (EUV stands for extreme ultraviolet) – through 2029, citing cost as the issue. This may be a big deal.

Moore’s Law was always two things: a physical observation about transistor density, and an economic bargain about cost. The physics has been slowing for years.

Is TSMC’s hesitation the first sign that the economics are reversing, too?

Source: ASML


I. The bargain breaks

Semiconductor manufacturing experienced one of the steepest learning curves ever recorded in any industry. For five decades and 10 generations of technology, each more expensive tool delivered cheaper chips – reliably, every 18 to 24 months.

The consequences are everywhere around us. The smartphone in your pocket has more raw compute than a 1990s supercomputer and costs less. The cloud infrastructure that runs modern AI exists because each successive generation of chips was cheaper per operation than the last. Software could eat the world because the hardware kept getting cheaper.

The cost per transistor – the broadest measure of chip economics – stopped falling in 2011. The narrower measure that tracks what lithography itself delivers, transistors per wafer-dollar, kept improving for another decade, helped by the industry’s shift to extreme ultraviolet light (EUV) around 2019, a shorter wavelength that let chipmakers print finer features and restored the cost-down curve. Now it has reversed too. High-NA should be the next step in the bargain, but if the best customer won’t take it, the bargain doesn’t hold.
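The bargain can be sketched as a Wright's-law learning curve: unit cost falls by a fixed fraction (the learning rate) with each doubling of cumulative output. The numbers below are illustrative, not industry data; the point is that when the learning rate approaches zero, more expensive tools no longer buy cheaper transistors.

```python
# Wright's-law sketch of the Moore's Law bargain (illustrative numbers only).
# Unit cost falls by the learning rate with each doubling of cumulative output:
#   cost(d doublings) = c0 * (1 - LR) ** d

def unit_cost(c0, learning_rate, doublings):
    """Unit cost after a given number of doublings of cumulative output."""
    return c0 * (1 - learning_rate) ** doublings

# Bargain holding: ~30% cheaper per doubling. Bargain broken: LR ~ 0,
# so each new (pricier) tool generation delivers no cost-down at all.
for lr in (0.30, 0.0):
    trajectory = [round(unit_cost(100, lr, d), 1) for d in range(4)]
    print(f"LR={lr:.0%}: {trajectory}")
```

Under a 30% learning rate, a $100 unit cost falls to about $34 after three doublings; at a zero learning rate it stays flat, which is what a reversal of the wafer-dollar curve would look like.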

II. What changed

Read more

🔮 Exponential View #573: Are the AI labs building for an intelligence explosion?

2026-05-10 11:05:00

One of the best ways to understand AI capability curves. — Mark C., a paying member



Hi all,

Back home after three weeks on the road and easing back in.

In this week’s issue:

First, AI self-improvement. One argument holds there is a real chance a frontier model trains its successor by 2028. If that is true, what should we already be seeing in how the labs hire, spend and build?

Then, jobs and AI. In some of the occupations most exposed to AI, postings are rising.

Finally, what will more capable AI agents mean for token budgets? Exponential View members get access to my interactive model.

Let’s jump in!


The signs of self-improvement

Anthropic’s Jack Clark has argued that there is a 60% chance that a frontier model will train its successor by 2028. It is an exciting claim, perhaps revolutionary, perhaps frightening: the prospect of a recursive intelligence explosion.

There are plenty of reasons to read Jack’s essay and conclude that the picture might not be so clean.

One objection is that frontier training now looks less like a pure research challenge and more like an industrial scaling problem. The bottlenecks are not only about optimizing CUDA kernels. They are about negotiating land leases in Wyoming, securing power infrastructure, obtaining chips, and hiring the electricians to wire it all together. Over a three-year horizon, those physical constraints may matter more than algorithmic advances.

So what can we infer from the revealed preferences of frontier labs? If automated R&D were truly likely by 2028, what would we expect their behavior to look like now?

First, hiring would change. Labs would still want elite researchers, but the profile would shift towards people who can make research agents useful. Fewer pure researchers, more research multipliers, people who can build an automated research factory.

Second, labs would overinvest in compute before the automation arrives. Because of those physical constraints, they would want more GPUs, more memory, more power, more data centers, more inference capacity, and better internal tooling.

If a lab believes the R&D loop is about to accelerate, then waiting becomes expensive. You would expect it to tolerate ugly near-term cash burn in order to secure the pre-commitments it needs.

Read more

📈 Data to start your week: AI boom, nowhere near the ceiling

2026-05-04 19:15:50

Hi all,

As we wrote three months ago, we are in the midst of a compute crunch – demand is running ahead of supply and our position has stayed the same:

The real risk isn’t that we’ve invested too much in AI. It’s that we haven’t invested nearly enough.

Today’s quick look at the data shows that much of the supply remains latent, waiting on enterprise funding. When firms start spending serious money, the compute crunch will get crunchier.



On the supply side:

Nvidia B200 GPU rental prices grew 114% in six weeks. Frontier model releases pull demand toward the newest chips; the premium customers pay per hour to get a B200 instead of an H200 has grown by more than 6x.

Infrastructure provider Lightning AI says that some forty of its customers are seeking 400,000 GPUs, ten times the current fleet of 40,000. Providers are already rationing GPUs by customer size. Microsoft now requires Blackwell customers to lock in at least 1,000 chips for a year and is cutting off smaller customers whose servers sit idle.

Read more

🔮 Exponential View #572: AI’s moats, myths and moral loopholes

2026-05-03 10:25:52

Hi all,

Over the past week, I have been in China, meeting AI and robotics teams including Zhipu and MiniMax (the two publicly listed foundation model companies), as well as Kimi, Alibaba, Xiaomi, ByteDance and others.

Demand is booming. Zhipu, for example, is serving 5.5 trillion tokens per day. Developers are rushing to the platform, joining at about ten a minute. Keeping up with inference loads is a struggle in China, as it is in the US. Teams universally acknowledged compute constraints, particularly the shortage of Nvidia chips. But those constraints aren’t stopping innovation.

Researchers and developers are free to use whichever model they want. Anthropic’s Claude was the preferred choice for technical teams, but it was clear that “dog-fooding” is commonplace.

I’m on the flight back to London right now. Hannah is continuing on to Shenzhen to check out more hardware firms. We’ll write something in more depth in the next couple of weeks. For now, here’s a short video from one of the many demos we saw at Unitree:

Huge thanks to our local hosts who were generous with their time, hospitality and swag. And a shout-out to Rintoul, who pulled together this amazing trip with expertise and great humor.

Azeem



We can’t stop, we won’t stop

Jasmine gets to grips with that odd Silicon Valley paradox: many engineers think the very thing they are creating, AI, will mean that the “median person is screwed”. What’s more, those same builders profess to having no clue how to stop this deracination.

Jasmine was in China with me this week. She “gets” the people in American AI labs and the culture surrounding them better than anyone. Her stellar essay in The New York Times is revealing:

In general, tech industry sources expressed more extreme concern about the labor market impacts of A.I. in private conversation — but suddenly became optimists once I turned on the mic.

I don’t agree with the depth of fatalism coming from the labs. Diffusion of the technology will be slower than flicking a light switch, even if it proves faster than the rollout of electricity.

But that fatalism is itself a fact worth taking seriously. The beliefs might become behaviors. An example: junior hiring might slow down, not because AI can do juniors’ jobs well, but because the labs believe it can and persuade employers that it can. A hiring freeze is an almost inevitable consequence of the patterns of belief.

The danger is that the theory of displacement becomes the logic of our responses. The point should be that we live in a state of uncertainty, as argued later in this issue. That uncertainty means we should be skeptical of preordained outcomes, however smart and well-paid those who make them are.

See also:



Silicon philosophers

A new paper out this week tested seven frontier models on their ability to role-play professional philosophers and simulate expert judgment. Some models looked passable on average, especially on questions where philosophers already agreed. But real philosophers agree far, far less than the AI suggests: the variance of human views was two to four times higher than the models’.

Read more

🔮 Exponential View #571: DeepSeek shows the future, again; drones on a learning curve; solar goes up, LLM pixels & tennis robots++

2026-04-26 10:40:51


War on a learning curve

When I spoke with Ukrainian veteran drone pilot Jack de Santis a year ago, he said that soldiers who leave the front for rehabilitation and return after eight or nine months need full retraining, because drone warfare changes that fast. Since then, Ukraine has accelerated the pace of iteration to seven days. I was so struck by this number that I asked the team to dig into it and prepare a detailed briefing for members:



DeepSeek shows us the future, again

When GPT-5.5 rolled out this week, OpenAI’s reasoning research lead Noam Brown tweeted:

with today’s AI models, intelligence is a function of inference compute. Comparing models by a single number hasn’t made sense since 2024. What matters is intelligence per token or per $.

With the compute crunch, doing more with less compute could be a winning strategy.

And it runs counter to the culture American AI labs enjoyed for the past few years. The mantra was “moarrr compute, better benchmarks.” Chinese labs, with less capital and minimal access to cutting-edge compute, didn’t have that luxury. So they adapted, asking in effect: how much real-world capability can we afford to deploy per token, per user?

DeepSeek’s new V4 model is marginally worse than GPT-5.4, but it is 4x cheaper, reflecting, in part, lower compute costs. As inference costs approach 10% of total engineering headcount spend, that line item begins to matter.

One analysis argues further:

In China, compute is more than an expense line. It is a strategic constraint shaped by export controls, chip supply, cloud capacity, domestic hardware readiness, and inference economics. […] DeepSeek is turning compute scarcity into a set of design specifications.

I am on my way to Beijing and will be looking to understand exactly how this plays out on the ground.


A MESSAGE FROM OUR SPONSOR

The developer conference of the year. On demand on Salesforce+

Agentic AI is reshaping how software gets built, and Agentforce is out in front.

Catch the TDX sessions on Salesforce+ to see what this looks like in practice: deep dives on Agentforce, Data 360, the core Salesforce platform, vibe coding, Slack, and more – all available to stream for free.

By watching TDX on Salesforce+ you can:

  • See what’s coming next: Hear directly from the leaders defining the roadmap for AI-native development.

  • Go beyond the main stage: Access broadcast‑only segments and candid interviews you won’t find elsewhere.

Start with the main keynote to experience how software is being reinvented, then explore the full programme at your own pace.

Watch now


Where humans matter most

In 2018, I argued that “automated perfection is going to be common… What is going to be scarce is human imperfection.” I returned to it a few months ago:

When I first wrote about artisanal cheese, I imagined this shift unfolding more slowly, alongside the automation of routine office work and putting more robots on assembly lines. I didn’t anticipate that, nine years later, I could build custom software in an hour or produce work that once required entire teams while walking through customs.

An economist wrote an excellent essay, a vivid case study of how automation reshapes where humans matter most. When the internet entered the economy, 60% of travel agents lost their jobs; the routine tasks of searching, comparing and reserving moved from the agents’ computers to ours. What happened to the remaining 40%? They were hardly fighting over scraps – what we wanted from travel changed in tandem, becoming more upmarket, higher-end, curated. As a result, travel agent salaries grew from 87% of the private-sector wage average in 2000 to 99% by 2025.

I like the term “the relational sector” for this destination. It’s work whose value is inseparable from the human providing it.

My next luxury holiday

Solar overtakes nuclear

In 2025, the world’s solar panels generated nearly as much electricity as the world’s nuclear reactors. So far in 2026, Ember’s new data shows that solar is starting to overtake nuclear on a 12-month rolling basis.1 Nuclear is still vital, of course. Last year was its best year ever in absolute terms, but its share of global electricity has fallen from a 1996 peak of 17.5% to under 9%.

The main way we explain this is that energy is becoming a technology, not a commodity. Commodities get scarcer and pricier as you extract them. Technologies are on a learning curve, getting cheaper as you make more. Solar module prices have fallen over 90% since 2010, and in 2025, solar met three-quarters of all new electricity demand on the planet. As solar rides learning curves, new markets open and make possible what was previously impossible. We call this the solar supercycle.
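A back-of-envelope shows what that 90% price decline implies for the learning curve. The six doublings of cumulative capacity since 2010 is our own assumption for illustration:

```python
# Implied learning rate from the >90% solar module price decline since 2010,
# assuming (our assumption, illustrative) cumulative capacity doubled ~6 times.
price_ratio = 0.10   # prices today at ~10% of the 2010 level
doublings = 6

# Wright's law: price_ratio = (1 - LR) ** doublings, solved for LR.
learning_rate = 1 - price_ratio ** (1 / doublings)
print(f"implied learning rate ~{learning_rate:.0%} per doubling")  # ~32%
```

A learning rate in the low thirties per doubling is what puts solar on the technology curve, not the commodity curve, as described above.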

Read more