2026-03-24 01:56:17
Hi all,
The Iran War has created a chokepoint in the supply of helium, a byproduct of natural gas processing and LNG production that’s used in more than 20 steps of semiconductor fabrication. Headlines are making the helium situation sound apocalyptic for chips, but the reality is more nuanced.
Here’s what’s going on:
A third of the world’s helium is trapped. Qatar is the source of 34% of global helium supply.
One-sixth destroyed. Last week, Iran struck the world’s largest concentrated source of helium, the Ras Laffan facility near Doha. QatarEnergy has reported “extensive” damage, which will cut 14% of annual helium exports and take 3-5 years to repair.1
The two-month clock. Liquid helium ships in insulated containers that stay viable for 35 to 48 days. After that, it warms, turns to gas, and is lost for good.
Memory manufacturing is exposed. South Korea sourced 64% of its helium from Qatar last year. Its fabs use helium to produce approximately 80% of the world’s High Bandwidth Memory, which is essential for leading AI chips.
But the leading edge is secure. South Korea’s SK Hynix says that “there is almost no chance that the company will be affected” thanks to supply diversification, and another Korean memory player, Samsung, introduced the industry’s first helium reuse system before the war started.
Helium is a rounding error in chip costs. Helium accounts for an estimated 0.5-1% of total semiconductor manufacturing cost. Fabs will likely pay up to secure supply.
At the front of the queue. Helium suppliers tend to prioritize allocation to critical sectors like semiconductors and MRI. Balloons and welding, which together make up 25% of US demand, will be cut first.2
So what’s likely to happen? Even in a severe scenario where Qatar takes years to fully restore helium supply, the direct cost impact on semiconductors remains modest. Helium accounts for roughly 0.5-1% of fab costs; even a tripling of prices only lifts this to around 1.5-3%. For high-value chips, particularly AI accelerators, this is easily absorbed or passed through. If there is a constraint, it is immediate logistics. SK Hynix’s procurement team may have to work longer hours over the next few weeks, but we expect no significant threats to the AI economy in either the short or long term.
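The cost arithmetic above can be sketched in a few lines. The baseline shares (0.5-1%) and the 3x price shock are the estimates from this note; holding all non-helium costs constant is my simplifying assumption:

```python
# Back-of-envelope: helium's share of total fab cost after a price shock,
# assuming every other input cost stays fixed.
def helium_cost_share(base_share: float, price_multiplier: float) -> float:
    """New helium share of total fab cost after a helium price change."""
    other = 1.0 - base_share                # non-helium share of cost
    helium = base_share * price_multiplier  # helium cost after the shock
    return helium / (other + helium)

for base in (0.005, 0.01):                  # 0.5% and 1% baseline shares
    print(f"{base:.1%} baseline -> {helium_cost_share(base, 3.0):.2%} after a 3x price rise")
```

Running this gives roughly 1.5% and 2.9%, consistent with the “around 1.5-3%” range above.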
Thanks for reading!
1. Inference based on LNG repair timelines.
2. Other countries, like South Korea, likely have different demand profiles.
2026-03-22 14:50:10
Hi all,
Happy Sunday. I’m en route to NYC for a week of meetings and speaking. If you’re travelling this week too, it’s a great moment to catch up on my latest podcast episode, where I explain why I changed my mind about Apple’s AI outlook:
Let’s go!
We’re about to discover, the hard way, that most organizations are thinking about AI in entirely the wrong way.
In a training-first world, it made some sense to see AI as a capital project. But we don’t live in that world any more, as Jensen Huang made clear at NVIDIA GTC this week. We’ve shifted to an inference-first economy, where the dominant activity is running models – continuously, at scale, across millions of workflows and, increasingly, AI agents.
In that world, tokens are not an IT line item. They are a productive input, as fundamental to knowledge work as electricity, office space, or salary. I write about this at some length in my latest essay:
Energy is the main weapon in the Iran war. As we wrote a couple of weeks ago, this is the old world of energy scarcity on full display. Several import-dependent Asian economies, including Bangladesh, Pakistan and the Philippines, have taken emergency measures to manage the energy shock.
But one country on that list, Pakistan, is faring better than the others. In 2024, we were among the first to show the silent energy transition Pakistan had undergone over the previous two years, importing 17GW of solar panels in a single year against a total generation capacity of 46GW. This transformation, we argued at the time, would be massive for Pakistan’s energy security:
92% of countries would be more energy secure under a renewable paradigm than under a fossil one. Many countries used fossil fuels as a geopolitical weapon. From OPEC’s oil embargoes to Russia’s manipulation of natural gas, control over these resources gave them significant power. With renewables, this power dissipates. Once you have the solar panel, you have the energy – OPEC can’t block the sun.
Pakistan has so far vindicated that argument. The solar transformation reduced its exposure to this year’s fossil-fuel price rises by at least $6.3 billion, equivalent to 1.7% of the country’s GDP.
The underappreciated case for solar has to do with sovereignty. Renewables localize power, as we wrote in our 2024 essay on the dawn of the distributed age; Pakistanis are increasingly responsible for their own energy security. And it’s not just sovereignty that matters here but its compounding effects. Our position is that solar as a technology is undergoing a supercycle: the declining cost of solar panels will keep opening new markets, which means that every year Pakistan continues to replace fossil fuels with solar, it builds a durable technology stack for participating in the new economy.
Last weekend, I spotted that the AI legend Andrej had vibe-coded an overview of AI labor market risk. It was rapidly spreading across X. As I strolled down to my studio, a three-minute mosey, I asked my OpenClaw agent, R Mini Arnold, to do the same for more than just the US.
After about 15 minutes, it came back with an interactive tool covering 25 countries. For each country, it grabbed employment by job category, as defined by the relevant statistical authority. Then it scored 1.4 billion jobs across several hundred categories in those two dozen-and-change countries. You can see the screenshot of the version I built below. I’ve obfuscated some numbers.
My first reaction: in the old world, this kind of first draft would’ve cost a major consultancy seven figures and taken months of work. My second reaction was that I wanted to share it with the Exponential View community. But I took three deep breaths and decided it would be irresponsible. Andrej deleted his v1 after pushback: he said people were misinterpreting the data. The second version is live here.
I won’t get bogged down in whether Andrej was right or wrong to publish this exploratory tool the way he did. But I will tell you why I didn’t publish my casual vibe-coded map of jobs at risk. In simplest terms, information and analysis matter. They carry consequences for how people understand their role in the world.
2026-03-21 18:06:21
Jensen Huang delivered a stunning performance at this year’s GTC. I want to reflect on one thing Jensen said that tells us a lot about the near future of AI:
Every company needs an OpenClaw strategy.
OpenClaw is the most exciting piece of technology I have seen since the web browser in 1992. Jensen reached the same conclusion independently, from the other end of a trillion-dollar supply chain. Today, I want to explain what the shift Jensen described means for the AI economy, for your organization and for the window that is already closing.
When ChatGPT arrived in late 2022, the underlying economics of AI were dominated by training. Building a large language model required astronomical compute – 10²³, 10²⁴ floating point operations or more. GPT-4 reportedly cost over $100 million to train. Meta’s Llama series consumed thousands of NVIDIA chips running for months. Data centers were configured for maximum parallelism, maximum throughput, everything optimized for the one-time act of building a model.
But training is, by definition, a one-time event. A model is trained once and then run — inferred against, in the technical vocabulary — millions or billions of times by actual users. And inference has completely different economics from training.
This is what makes Groq worth $20 billion to NVIDIA. When you send a prompt to a language model, two things happen. First comes the prefill phase, when the model reads and processes your input tokens in parallel. This is enormously compute-intensive, and it’s where GPUs shine – they were built for exactly this kind of massively parallel work. Then comes the decode phase, the generation of the response, word after word, like a slow teletype. The model produces tokens one at a time, each depending on the previous one. This cannot be parallelized; it is structurally sequential. The bottleneck is memory bandwidth: how fast can you stream the model’s weights from memory for each individual token step? Thousands of GPU cores sit largely idle, waiting on memory reads, just to produce one token at a time.
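The sequential structure of decode can be seen in a toy sketch. The “model” below is a stand-in hash, not a real transformer; the point is the dependency chain: token N+1 cannot be computed until token N exists.

```python
# Toy illustration of autoregressive decode. Each generated token is
# appended to the context before the next one can be produced, so the
# loop cannot be parallelized across steps.
VOCAB = 50_257  # illustrative vocabulary size

def toy_next_token(context: list[int]) -> int:
    # Stand-in for a full forward pass. In a real model, this step must
    # stream every weight from memory once per generated token, which is
    # why decode is memory-bandwidth bound rather than compute bound.
    return (sum(context) * 31 + 7) % VOCAB

def decode(prompt: list[int], n_new: int) -> list[int]:
    tokens = list(prompt)
    for _ in range(n_new):          # strictly one token per iteration
        tokens.append(toy_next_token(tokens))
    return tokens

out = decode([1, 2, 3], n_new=5)    # prompt of 3 tokens, generate 5 more
```

Prefill, by contrast, processes the whole prompt in one shot, which is why the two phases stress hardware so differently.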
GPUs were not built for this. As the AI economy shifts from building models to running them constantly at scale – for billions of users and, crucially, for millions or billions of agents – GPU inefficiency in the decode phase becomes a serious structural problem. Groq’s chips are designed specifically for inference throughput. The combined Vera Rubin and Groq architecture, due later this year, will deliver a 35-fold improvement in throughput per megawatt versus NVIDIA’s current Blackwell chips. That’s a signal about what NVIDIA believes the inference market is about to become.
How large is large? As users shifted from simple chatbot interactions to reasoning models and now to agentic systems that run complex, multi-step workflows, compute applied per user interaction increased by a factor of 10,000. At the same time, the number of users deploying AI systems at scale increased by a factor of 100. The outcome is a million-fold expansion in inference demand over roughly two years.
A million times. In two years.
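The multiplication behind that number is worth making explicit. Both growth factors are the estimates from the passage above, not measurements of mine:

```python
# Compounding growth in inference demand over roughly two years.
per_interaction_growth = 10_000   # chatbots -> reasoning -> agentic workflows
adoption_growth = 100             # users deploying AI systems at scale
total_inference_growth = per_interaction_growth * adoption_growth
print(f"{total_inference_growth:,}x")
```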
In summer 2024, I was applying perhaps 100,000 to 150,000 tokens per day – heavy usage by the standards of the time. Two weeks ago, my AI chief-of-staff R. Mini Arnold had reached 100 million tokens per day on average. That’s three orders of magnitude in less than two years.
On Monday, working with R. Mini Arnold and its four sub-agents – R. Veblen for book research, R. Simmons for portfolio management, R. Bradlee for editorial analysis, R. Gulbenkian for AI economy frameworks – I applied 870 million tokens in a single day. Close enough to a billion. My average for the past week has been well above 200 million per day, seven days a week. And this is what work looks like when you have the right harness, OpenClaw.
The standard debate about AI investment has mostly been about the wrong thing. On one side, the boosters and the sober-minded researchers at frontier labs argue that capability improvements are nowhere near a wall. On the other are the skeptics, who point to diminishing returns on training compute and say the AI wave is speculative. Both camps are measuring the performance curve and missing the diffusion curve. And the diffusion curve is the one that determines whether the trillion-dollar infrastructure bet is justified.
Consider the internal combustion engine. A mechanical engineer with access to a bare engine can do something useful – connect it to a generator, build a hedge trimmer, strap it to a bicycle. Impressive. But what makes the internal combustion engine transform an economy is the harness: the automobile. The car is not a clever engine. It’s a chassis, seats, a windscreen, a steering system, and yes, a terrible hi-fi. But that harness is what makes people use the engine constantly, for decades, at scale.
Would you rather have a race-tuned Ferrari engine sitting on your lawn, or a ten-year-old petrol engine actually in a car you can drive somewhere useful? The harness wins every time.
For AI, the harness moment happened at the tail end of 2025. Claude Code began to work reliably enough that you could leave it running overnight and trust what it had done in the morning. Not perfectly, but reliably enough. And that threshold, that “I can leave it to its own devices” threshold, changed everything. It changed what users asked AI to do. It changed how long tasks ran. It changed the token usage profile of every organization that crossed it.
Now, OpenClaw is the harness for the next layer.
The practical question is what all this means for a firm trying to make sensible decisions in 2026. And here I think the consensus response – more pilots, careful evaluation, measured rollout – is dangerously behind the curve.
Tokens are not an IT line item. They are a productive input, as fundamental to knowledge work as electricity or office space. The organizational error most large companies are making right now is treating token budgets as an IT function – a cost center to be rationalized and kept away from the business units. This is wrong. It is the equivalent of requiring engineers to book time on the engine through the facilities department.
2026-03-21 02:51:09
Today’s live is all about the shift from AI training to the inference economy – how running AI agents at scale is the defining business and hardware challenge of 2026, with Nvidia’s $1 trillion order book and my own 870 million daily tokens as evidence.
2026-03-19 22:01:23
I hold my opinions firmly, but when the facts change, so do I. Today I want to tell you why I changed my mind on Apple and AI.
Apple has been conspicuously slow to deliver on AI. Unlike its peers, it isn’t spending hundreds of billions on data centers. Its capital expenditure has been relatively flat. Siri, the feature we all love to turn off, hasn’t meaningfully improved in a decade. No one expects major breakthroughs from Apple in AI research or applied AI in the near future.
John Gruber, the most widely read Apple analyst, called a WWDC demo “a concept video.” Analyst Ben Thompson said Apple was nowhere near the cutting edge. I was in this group.
But what I missed was that every single day, as I was hammering away at ChatGPT or Claude, switching between models, I was doing it through an Apple device. The model I used changed constantly but the device did not. This only really hit me once OpenClaw arrived.
As soon as it launched, I started running OpenClaw agents. I put my first OpenClaw on the small Mac Mini I keep at home. Within a week, I was running it so hard the audio system and our CCTV cameras had stopped working reliably. The agent was consuming everything the machine had. So I bought a new Mac Mini specifically for the agent, R Mini Arnold. This week, Jensen Huang called OpenClaw “the new computer”. He didn’t mean a Mac Mini but for now, that’s what it runs on.
Slowly, delivery times for new Mac Minis stretched. A machine that arrived in three days when I bought mine in early February was now showing a seven-to-eight-week wait if you configured it with 64 gigabytes of RAM. The Mac Studio had similarly extended from two or three weeks to six or eight. Best Buy shelves ran empty.
At Exponential View, we bought a Mac Mini for a team member who runs his own agents, and a tricked-out Mac Studio that hosts the team’s shared agents. If a company of eight buys two machines for AI infrastructure, the equivalent for a company of 100,000 is 25,000 new computers added to its estate.
There is a structural reason behind all this. Demand for AI inference is extremely high. Data center capacity is constrained, utilization is high and demand is growing faster than chips can roll off the production lines. When the frontier is rationed, the relative appeal of running a model locally changes. The Mac Mini shortage is a demand story, driven by people and organizations doing the same calculation I did.
Apple’s chips use unified memory that is shared across the CPU, GPU and Neural Engine, with extraordinarily high bandwidth for a consumer device. The Neural Engine, which runs nearly 40 trillion operations per second, is optimized specifically for matrix multiplication. That matters because matrix multiplication is what transformer models, the architecture behind every major AI system, actually do. The whole stack was built for something else, but it turns out to be almost perfectly suited for running AI locally.
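The workload in question is easy to illustrate: a transformer layer is, at its core, repeated matrix multiplication, which is what a Neural Engine’s matrix units accelerate. The dimensions and values below are arbitrary, chosen only to keep the sketch readable:

```python
# Minimal sketch of the core operation behind transformer inference:
# multiplying a token's activation vector by a weight matrix.
def matmul(a, b):
    """Multiply an (m x k) matrix by a (k x n) matrix, as nested lists."""
    k, n = len(b), len(b[0])
    return [[sum(row[i] * b[i][j] for i in range(k)) for j in range(n)]
            for row in a]

x = [[1.0, 2.0]]                  # one token's activations (1 x 2)
w = [[0.5, -1.0, 0.0],
     [0.25, 1.0, 2.0]]            # an illustrative weight matrix (2 x 3)
y = matmul(x, w)                  # -> [[1.0, 1.0, 4.0]]
```

At inference time, the weight matrices are billions of entries, so the speed of streaming them through these multiply-accumulate steps is what unified memory bandwidth buys you.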
But the silicon is only one layer. Apple also controls the OS, the App Store and a privacy architecture embedded in the secure enclaves, the software and the operating system. And because of that, Apple has something rare – genuine consumer trust. Think about everything you own, from your socks to your wallet. What do you touch the most? Your wedding band if you have one, your glasses if you wear them, and then an Apple device. That is the degree of consumer intimacy Apple has built.

The consequence is a capture mechanism that Apple has been running for years without anyone calling it an AI strategy. Every third-party model ends up going through something Apple controls, including the App Store. Apple captures the value of those relationships even when it doesn’t own the model.
When television manufacturers hit 4K resolution, a few pushed further. 8K sets appeared in showrooms, some people bought them. But the human eye can’t resolve detail above roughly 4K at normal viewing distances and the industry found its natural ceiling. No one is seriously building 16K TVs.
I think AI is approaching the same thing – not at the frontier, but for the tasks that actually fill most people’s days. Dario Amodei describes a “country of geniuses in a data centre”, Nobel Laureate-quality intelligence applied to every problem. That capability matters enormously for drug discovery, for scientific research, for genuinely hard reasoning tasks. But most of what we ask of AI is not that. For most daily tasks, you don’t need GPT-19.6. You need something closer to GPT-5.8, and through model distillation and efficiency improvements, that level of capability is already arriving on local devices.
When local is good enough for the task, the question is no longer which model but whose device. And Apple has been building that device – the unified memory, the Neural Engine, the privacy enclave, the consumer trust – for two decades.
Watch my full commentary on YouTube or listen here:
2026-03-17 02:30:29
Hi all,
Here’s your Monday round-up of data driving conversations this week in less than 250 words.
Let’s go!
Bricks and bytes ↑ Data center spending overtook office spending in December 2025.
Silicon gravity ↑ Global semiconductor sales hit $82.5 billion in January 2026, up 46.1% year-on-year.
Doctor’s orders ↑ Over 80% of US physicians now use AI at work.