2026-05-15 17:17:16
Cerebras Systems, which makes massive, beautiful chips for AI workloads, hit the Nasdaq yesterday. At one point, the stock was up 157%, before settling for a more muted 107% return.
First-day IPO pops bring out the bears and cynics, and for good reason. The dotcom heyday was full of them. Calico Commerce, VA Linux, TheGlobe.com: each up 300%, 605% and 697.5% on day one, respectively. Remarkable performances, long-term disasters. VA Linux, the “best” of them, lost about 98% of its value. TheGlobe.com and Calico were effectively wiped out.
What did they have in common? Low revenues, fast growth rates and a booming, booming, booming market.
So when Cerebras opened 75% above the expected IPO price, it was tempting to file it away as another overhyped stock riding the frenzy around a new technology, itself bundled in layers of hype.
What I saw instead was a sign that the market is finally starting to grasp the demand for AI inference.
I’ve been following Cerebras since 2018, when I first spoke with Andrew Feldman, the founder. Last year, I visited their headquarters in Sunnyvale.
2026-05-14 00:47:43
We spent a week on the ground in China, visiting AI and robotics labs and seeing how things operate firsthand. We traveled through Beijing, Hangzhou and Shanghai to meet representatives from 14 labs, including DeepSeek, MoonshotAI, MiniMax, Z.ai, ByteDance, 01.AI, Alibaba, Ant Group, Xiaomi, AInnovation, Galbot, Unitree, ModelScope and RWKV, joined by the friends who made it all happen.
We participated in dozens of hours of discussions with researchers, founders, product leaders, and business owners across the infrastructure, hardware, models and application layers. Every lab is obsessed with ByteDance’s Doubao, and respectful of DeepSeek’s scientific process. Claude is the model of choice for coding, universally rated as the best thing out there. The researchers we met were humble, welcoming, focused purely on technical priorities and building the next big model. Researchers were also very young: at one lab in particular, the average age was 25.
But chip constraints are real. Everyone wanted Nvidia chips, and the constraint was showing up in longer and longer pre-training runs and iterative cycles.
We arrived in China wanting to understand how severely export controls were biting and how much they were harming AI development. The stock of AI compute in China is running two to three years behind that of the US. In the short term, it’s clear that the controls are making things harder.
But, as we discovered, it’s not as obvious over the long term.
The export controls have become capability-generating – labs in China are forced to be ruthlessly efficient. Despite the three-year compute handicap, Chinese open-source models are only six to eight months behind the US frontier.
By listening to the researchers and digging into the data, we have estimated quantitatively just how significant that efficiency capability is. We reckon Chinese labs are extracting 4-7x as much intelligence per unit of compute as naive scaling predictions would suggest.
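One way to see where a multiplier in that range can come from is a back-of-envelope sketch. The figures below – a 3x annual growth rate in frontier compute, an 8x raw-compute gap and a 7-month capability lag – are illustrative assumptions for the sketch, not our measured inputs.

```python
import math

# Back-of-envelope sketch, not the full methodology. Assumptions:
# frontier training compute grows ~3x per year, the US raw-compute
# lead is ~8x, and Chinese open models trail the frontier by ~7 months.
growth_per_year = 3.0
raw_compute_gap = 8.0
observed_lag_months = 7.0

# Lag that an 8x raw-compute gap would imply under naive scaling:
naive_lag_months = 12 * math.log(raw_compute_gap) / math.log(growth_per_year)

# Effective compute gap consistent with a 7-month capability lag:
effective_gap = growth_per_year ** (observed_lag_months / 12)

# Efficiency multiplier needed to reconcile the two:
efficiency = raw_compute_gap / effective_gap

print(f"naive lag: {naive_lag_months:.0f} months")  # ~23 months
print(f"implied efficiency: {efficiency:.1f}x")     # ~4.2x
```

Under these assumptions, an 8x raw-compute deficit “should” mean a lag of nearly two years; reconciling it with a 7-month lag requires roughly a 4x efficiency multiplier, the low end of the 4-7x range.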
On the day a sitting US president lands in Beijing for the first time in nearly a decade, we decided to publish our research into how and why the constraints have inadvertently created the conditions for the most formidable competitors to develop exactly the capabilities that will matter most in the coming years.
In every meeting with every Chinese lab, we heard a common refrain: we do not have enough compute. Less compute means fewer experiments and smaller models. This is a genuine constraint on research, development and deployment of AI.
That isn’t surprising. American researchers complain, too. So do business leaders: Microsoft and Anthropic, for example, have explicitly stated that a lack of compute capacity has cost them meaningful revenue.
But the compute constraint in China is different. It isn’t just that there is less capital around – China’s AI startups raised $12.4 billion in 2025 compared to $285 billion in the US. It is that export controls on chips, initiated by Joe Biden in October 2022 and subsequently relaxed and tightened at various times by President Trump, have all but choked off the supply of advanced chips to the Chinese market. The interesting thing to us was seeing firsthand how local labs have responded.
Let’s first step back to put the compute gap into context.
US labs trumpet securing large amounts of compute. In recent months, Anthropic alone has signed deals totaling over 10 gigawatts of capacity – with Amazon, Google, Microsoft, Nvidia and SpaceX. OpenAI committed to 10 gigawatts of Nvidia systems last September, backed by up to $100 billion in Nvidia investment. These orders are now exclusively for the latest, most powerful silicon: Nvidia’s Blackwell series (B200, B300, GB200) shipping today, the next-generation Vera Rubin platform arriving later this year and, increasingly, Google TPUs and others.
These large orders simply are not an option for Chinese hyperscalers and labs. Supply isn’t entirely dry: Chinese customers are still getting their hands on Nvidia’s H100s, B200s and B300s, largely routed through Singapore via shell companies that relabel shipments as tea or toys. But quantities are at least an order of magnitude below those of their US rivals.
It is these top-tier American chips, especially the most recent vintages, that count. A single GB300 NVL72 rack (72 of Nvidia’s latest GPUs operating as one system) delivers 30x faster real-time inference than the equivalent H100 cluster from three years earlier, with 3.6x more memory per chip and 25x lower energy per inference. US labs are now ordering these systems by the gigawatt. Chinese labs cannot.
Chinese tech firms, notably Huawei, have made strides in building chips suited for AI. But even Huawei’s latest, the Ascend 950PR, launched in March, is roughly on par with the H100, released in 2022. And the systems are shipping in far smaller volumes. Nvidia is estimated to have shipped 7 million Hopper and Blackwell GPUs through October 2025 alone, and the rate is increasing; Huawei plans to ship 750,000 Ascend 950PR chips this year, around a tenth of what Nvidia shipped last year.
The result is that the US has a staggering lead in deployed AI compute capacity.
The lead is widening, not shrinking. In 2023, the US AI sector had triple the amount of deployable compute – almost all of it focused on training AI models. By the start of this year, that gap was closer to eightfold.
Put differently, by the end of 2025, Chinese labs could likely access roughly the same scale of compute that the US enjoyed two years earlier.
The difference is how that compute is used. In 2023, most American capacity was tied up in training, not serving customers. By contrast, in 2025, China’s compute stack, augmented by data centers in Malaysia and Singapore, was doing double duty – supporting model training and serving hundreds of millions of consumers, and a rapidly growing base of enterprises, through apps like WeChat, Doubao and Alipay.
It’s important to separate compute capacity used to train AI models from capacity used to serve customers. China has a huge AI market. Doubao alone reaches 100 million daily active users. Token volumes are equally vast. By February 2026, we estimate Chinese token volumes had reached ~9 quadrillion tokens a month – compared to ~4 quadrillion across the main US/Western providers.
Alongside data centers in Malaysia and Singapore, a large part of Chinese compute infrastructure goes to serving customers through inference. If half of that compute is used to serve customers, the compute available for training models shrinks accordingly. We might conclude, with low confidence, that by the end of 2025, Chinese labs had as much compute available for model training as American labs did in mid-2023.
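The arithmetic behind that low-confidence conclusion can be sketched in a few lines. The 8x lead is the 2025 estimate above; the inference share and the annual growth rate of US compute are assumptions for illustration.

```python
import math

# Rough sketch of the training-compute lag estimate (low confidence).
# Assumptions: the US deployed-compute lead is ~8x, half of Chinese
# compute is occupied serving customers, and US compute grows ~3x/year.
us_lead = 8.0             # US total compute / Chinese total compute
inference_share = 0.5     # fraction of Chinese compute tied up in inference
growth_per_year = 3.0     # assumed annual growth of US compute

# Lead in *training* compute once inference is carved out:
effective_lead = us_lead / (1 - inference_share)   # 16x

# Years of US compute growth that lead represents:
lag_years = math.log(effective_lead) / math.log(growth_per_year)
print(f"~{lag_years:.1f} years")  # ~2.5 years: end-2025 China ≈ mid-2023 US
```

Carving half of Chinese compute out for inference doubles the effective training-compute gap, and under the assumed growth rate a 16x gap corresponds to roughly two and a half years of US progress.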
By that logic, models from Chinese labs should be at least two years behind American ones, if labs in both countries are using the same approach – more compute and more data to build better models. This framework treats capability as a function of compute, holding training efficiency roughly constant.
But we aren’t seeing a 2-3 year gap.
The headline is that Chinese models are three to six months behind the US on benchmark performance, according to DeepSeek, and eight months behind according to the Center for AI Standards and Innovation, a US government agency.
In fact, Chinese labs appear to be keeping pace with US labs, and perhaps even narrowing the gap in some areas. The question for us then became: are the capability headlines wrong, or is something closing the gap that the compute numbers don’t capture?
There is an additional wrinkle: market structure. In the US, five key frontier labs dominate training compute: OpenAI, Anthropic, Google DeepMind, Meta and xAI. In China, a thousand flowers are blooming, and the big tech firms are developing their own frontier models.
And firms from adjacent sectors are coming in because they have specific data and expertise: Ant Group, with its Ling series, and Meituan, known for its on-demand delivery-retail platform, have both entered the LLM development market.
The impact of so many firms training their own models is that the pool of compute is being divided still further.
These labs are clearly finding efficiencies in training performant models, and those efficiencies are being passed through to inference: when the models are serving customers, they are much cheaper to run than roughly equivalent American models.
One shouldn’t put too much weight on AI benchmarks. They can be gamed, and they don’t easily reflect how a model “feels” or works in practice. But they are a reference point, however inadequate. DeepSeek’s V4 Pro, the lab’s flagship model, is in some ways comparable to Anthropic’s Claude Opus 4.6, which was released in February 2026 and is not Anthropic’s latest model. Cost-wise, though, you can see the difference. DeepSeek charges $0.43 per million input tokens and $0.87 per million output tokens; Opus 4.6 is 11 times more costly on input and 28 times more expensive on output.
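Those multiples can be turned into a blended per-call comparison. The implied Opus prices below are backed out from the quoted ratios, and the 3:1 input-to-output token mix is an assumption for illustration.

```python
# Back out implied Opus 4.6 prices from the multiples quoted above,
# then compare a blended per-token cost for a typical call.
deepseek_in, deepseek_out = 0.43, 0.87   # $ per million tokens
opus_in = deepseek_in * 11               # ~$4.73 per million input
opus_out = deepseek_out * 28             # ~$24.36 per million output

def blended(price_in, price_out, in_ratio=3):
    """Cost per million tokens for a mix of in_ratio input tokens per output token."""
    return (in_ratio * price_in + price_out) / (in_ratio + 1)

ratio = blended(opus_in, opus_out) / blended(deepseek_in, deepseek_out)
print(f"~{ratio:.0f}x")  # ~18x more expensive on a blended basis
```

On this assumed mix, the headline 11x/28x gaps translate into a roughly 18x blended price difference.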
These are not promotional one-offs. Across the Chinese frontier, Kimi K2.6 sits at $0.95 per million input tokens (among the cheapest models in the global top 10 by GPQA Diamond), and Alibaba’s Qwen models are priced in a similar band. An inference price is a function of three things: the model’s compute complexity, the energy and hardware cost of serving it, and the margin the provider is willing to give up.
The margins appear largely healthy. Z.ai serves its GLM-5 model at $1.00 per million input tokens, which is 3x cheaper than Claude Sonnet 4.6, and 5x cheaper on output. Despite this, it boasts a 50% gross margin, and MiniMax enterprise margins sit at 70%, though we don’t know if this holds across the board. DeepSeek, for its part, ran for years on internal funding alone, only turning to outside capital this month.
Finally, that efficiency shows up in how easily these models run on consumer hardware like laptops and phones. The leading local models in the world are almost all Chinese open-source models, aggressively distilled into smaller, lighter variants. A 5GB Qwen3-8B model runs on my Mac, as does DeepSeek R1’s 7B distilled variant, which has been pulled 85 million times on Ollama, making it the second-most-downloaded local model in the world. Cursor even built its Composer 2 model on top of MoonshotAI’s Kimi K2.5. The only US-based open-source model we run locally is Google’s Gemma 4.
2026-05-11 19:29:06
A couple of weeks ago, Bloomberg reported that TSMC had no plans to use ASML’s newest chipmaking machine – High-NA EUV (EUV stands for extreme ultraviolet) – through 2029, citing cost as the issue. This may be a big deal.
Moore’s Law was always two things: a physical observation about transistor density, and an economic bargain about cost. The physics has been slowing for years.
Is TSMC’s hesitation the first sign that the economics are reversing, too?
Semiconductor manufacturing experienced one of the steepest learning curves ever recorded in any industry. For five decades and 10 generations of technology, each more expensive tool delivered cheaper chips – reliably, every 18 to 24 months.
The consequences are everywhere around us. The smartphone in your pocket has more raw compute than a 1990s supercomputer and costs less. The cloud infrastructure that runs modern AI exists because each successive generation of chips was cheaper per operation than the last. Software could eat the world because the hardware kept getting cheaper.
The cost per transistor – the broadest measure of chip economics – stopped falling in 2011. The narrower measure that tracks what lithography itself delivers, transistors per wafer-dollar, kept improving for another decade, helped by the industry’s shift to extreme ultraviolet light (EUV) around 2019, a shorter wavelength that let chipmakers print finer features and restored the cost-down curve. Now it has reversed too. High-NA should be the next step in the bargain, but if the best customer won’t take it, the bargain doesn’t hold.
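To see how the bargain breaks, here is a toy calculation with entirely made-up numbers: density can keep improving at the next node, but if wafer-processing costs rise faster than density (as the High-NA price tag threatens), transistors per wafer-dollar falls.

```python
# Toy numbers (entirely illustrative) for two successive process nodes.
old_transistors_per_wafer = 1.0e12
old_wafer_cost = 10_000.0           # USD, assumed

new_transistors_per_wafer = 2.0e12  # density still doubles...
new_wafer_cost = 23_000.0           # ...but wafer cost rises 2.3x

old_tpd = old_transistors_per_wafer / old_wafer_cost  # transistors per dollar
new_tpd = new_transistors_per_wafer / new_wafer_cost

change = new_tpd / old_tpd
print(f"{change:.2f}")  # 0.87: ~13% FEWER transistors per dollar
```

In that regime, each new generation delivers more transistors but at a worse price per transistor: Moore’s Law the observation survives while Moore’s Law the bargain dies.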
2026-05-10 11:05:00
One of the best to understand AI capability curves. — Mark C., a paying member
Hi all,
Back home after three weeks on the road and easing back in.
In this week’s issue:
First, AI self-improvement. One senior figure at Anthropic thinks there is a real chance a frontier model trains its successor by 2028. If that is true, what should we already be seeing in how the labs hire, spend and build?
Then, jobs and AI. In some of the occupations most exposed to AI, postings are rising.
Finally, what will more capable AI agents mean for token budgets? Exponential View members get access to my interactive model.
Let’s jump in!
Anthropic’s Jack Clark has argued that there is a 60% chance that a frontier model will train its successor by 2028. It is an exciting claim, perhaps revolutionary, perhaps frightening: the prospect of a recursive intelligence explosion.
There are plenty of reasons to read Jack’s essay and conclude that the picture might not be so clean.
One objection is that frontier training now looks less like a pure research challenge and more like an industrial scaling problem. The bottlenecks are not only about optimizing CUDA kernels. They are about negotiating land leases in Wyoming, securing power infrastructure, obtaining chips, and hiring the electricians to wire it all together. Over a three-year horizon, those physical constraints may matter more than algorithmic advances.
So what can we infer from the revealed preferences of frontier labs? If automated R&D were truly likely by 2028, what would we expect their behavior to look like now?
First, hiring would change. Labs would still want elite researchers, but the profile would shift towards people who can make research agents useful. Fewer pure researchers, more research multipliers, people who can build an automated research factory.
Second, labs would overinvest in compute before the automation arrives. Because of those physical constraints, they would want more GPUs, more memory, more power, more data centers, more inference capacity, and better internal tooling.
If a lab believes the R&D loop is about to accelerate, then waiting becomes expensive. You would expect it to tolerate ugly near-term cash burn in order to secure the pre-commitments it needs.
2026-05-04 19:15:50
Hi all,
As we wrote three months ago, we are in the midst of a compute crunch – demand is running ahead of supply and our position has stayed the same:
The real risk isn’t that we’ve invested too much in AI. It’s that we haven’t invested nearly enough.
Today’s quick look at the data shows that much of the supply remains latent, waiting on enterprise funding. When firms start spending serious money, the compute crunch will get crunchier.
Nvidia B200 GPU rental prices grew 114% in six weeks. Frontier model releases pull demand toward the newest chips; the premium customers pay per hour to get a B200 instead of an H200 has grown by more than 6x.
Infrastructure provider Lightning AI says that some forty of its customers are seeking 400,000 GPUs, ten times the current fleet of 40,000. Providers are already rationing GPUs by customer size. Microsoft now requires Blackwell customers to lock in at least 1,000 chips for a year and is cutting off smaller customers whose servers sit idle.
2026-05-03 10:25:52
Hi all,
Over the past week, I have been in China with Hannah, meeting AI and robotics teams including Zhipu and MiniMax (the two publicly listed foundation model companies), as well as Kimi, Alibaba, Xiaomi, ByteDance and others.
Demand is booming. Zhipu, for example, is serving 5.5 trillion tokens per day, and developers are rushing to its platform at a rate of about ten a minute. Keeping up with inference loads is a struggle in China, as it is in the US. Teams universally acknowledged compute constraints, particularly the shortage of Nvidia chips. But the constraints aren’t stopping innovation.
Researchers and developers are free to use whichever model they want. Anthropic’s Claude was the preferred choice for technical teams, but it was clear that “dog-fooding” is commonplace.
I’m on the flight back to London right now. Hannah is continuing on to Shenzhen to check out more hardware firms. We’ll write something in more depth in the next couple of weeks. For now, here’s a short video from one of the many demos we saw at Unitree:
Huge thanks to our local hosts who were generous with their time, hospitality and swag. And a shout-out to Rintoul, who pulled together this amazing trip with expertise and great humor.
Azeem
Jasmine gets to grips with that odd Silicon Valley paradox: many engineers think the very thing they are creating, AI, will mean that the “median person is screwed”. What’s more, those same builders profess to having no clue how to stop this deracination.
Jasmine was in China with me this week. She “gets” the people in American AI labs and the culture surrounding them better than anyone. Her stellar essay in The New York Times is revealing:
In general, tech industry sources expressed more extreme concern about the labor market impacts of A.I. in private conversation — but suddenly became optimists once I turned on the mic.
I don’t agree with the depth of fatalism coming from the labs. Diffusion of the technology will be slower than flicking a light switch, even if it proves faster than the rollout of electricity.
But that fatalism is itself a fact worth taking seriously. The beliefs might become behaviors. An example: junior hiring might slow down, not because AI can do juniors’ jobs well, but because the labs believe it can and persuade employers that it can. A hiring freeze is an almost inevitable consequence of the patterns of belief.
The danger is that the theory of displacement becomes the logic of our responses. The point should be that we live in a state of uncertainty, as argued later in this issue. That uncertainty means we should be skeptical of preordained outcomes, however smart and well-paid those who predict them are.
See also:
Another way to frame the moment is Dostoevskian: intelligent people convince themselves they have discovered a truth so important that ordinary ethical constraints no longer apply to them. See, Palantir.
This week, Chinese courts have ruled that replacing someone with AI is not, by itself, a lawful reason to fire that person. It is one of the first such decisions worldwide.
The US government has classified the grid supply chain as a national defense bottleneck.
A new paper out this week tested seven frontier models on their ability to role-play professional philosophers and simulate expert judgment. Some models looked passable on average, especially on questions where philosophers already agreed. But real philosophers agree far, far less than the AI does: the variance of views among humans was two to four times higher.