Tomasz Tunguz

I’ve been a venture capitalist since 2008. Before that, I was a PM on the Ads team at Google and worked at Appian.

Rss preview of Blog of Tomasz Tunguz

The Minimill of AI

2026-06-05 08:00:00

A laptop on my desk now handles 78% of my AI work, with the rest sent to the cloud. The shift came out of my skill distillation work.

Here’s how it works.

I create tasks in Asana. An agent sees the task (scheduling, email triage, research, a CRM update) & classifies it as easy or hard. If it’s straightforward, a local model on my Mac handles it in seconds. If it’s complex, the same model routes it to a cloud model.
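A minimal sketch of that routing step, under stated assumptions: the `Task` fields, the `route` function, & the three toy callables are hypothetical stand-ins for the real Asana task, the local classifier, & the two model calls. The rule that research counts as hard is invented for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Task:
    title: str
    kind: str  # e.g. "scheduling", "email triage", "research", "CRM update"


def route(task: Task,
          classify: Callable[[Task], str],
          run_local: Callable[[Task], str],
          run_cloud: Callable[[Task], str]) -> tuple:
    """Return (lane, result). `classify` is expected to return "easy" or "hard"."""
    if classify(task) == "easy":
        return "local", run_local(task)
    # Anything not clearly easy falls through to the cloud lane.
    return "cloud", run_cloud(task)


# Toy stand-ins for the real model calls (hypothetical).
def toy_classify(task: Task) -> str:
    return "hard" if task.kind == "research" else "easy"


def toy_local(task: Task) -> str:
    return f"local model handled: {task.title}"


def toy_cloud(task: Task) -> str:
    return f"cloud model handled: {task.title}"


if __name__ == "__main__":
    for t in [Task("Book dinner", "scheduling"), Task("Map the GPU market", "research")]:
        print(route(t, toy_classify, toy_local, toy_cloud))
```

Passing the classifier & the two runners in as callables keeps the router itself trivial, so swapping the local or cloud model does not touch the routing logic.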

Local router replacing a single queue with a two-lane scheduler

Over the last seven days, the daily local share peaked at 88%.

Daily share of model route decisions handled locally, May 29 to June 4

As the workload grew, the two-lane design paid off. Throughput jumped about 25%, average task duration fell from 47 seconds to 19, & queue age dropped from 73 seconds to four. Nothing about the work changed. Small, fast tasks simply stopped waiting behind big, slow ones.
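The head-of-line effect can be illustrated with a toy calculation. This is a sketch with made-up durations, not the measurements above, & it assumes each lane has its own worker that serves tasks one at a time in arrival order.

```python
def mean_wait(durations):
    """Mean time each task waits before starting, served one at a time in order."""
    clock, waits = 0, []
    for d in durations:
        waits.append(clock)
        clock += d
    return sum(waits) / len(waits) if waits else 0.0


def two_lane_mean_wait(durations, threshold):
    """Mean wait when tasks at or under `threshold` seconds get their own lane."""
    if not durations:
        return 0.0
    fast = [d for d in durations if d <= threshold]
    slow = [d for d in durations if d > threshold]
    total_wait = mean_wait(fast) * len(fast) + mean_wait(slow) * len(slow)
    return total_wait / len(durations)


if __name__ == "__main__":
    tasks = [60, 5, 5, 60, 5, 5]  # seconds, invented for illustration
    print("single queue:", round(mean_wait(tasks), 1))
    print("two lanes:   ", round(two_lane_mean_wait(tasks, threshold=10), 1))
```

In this toy case the short tasks no longer sit behind the 60-second ones, so the average wait falls even though no individual task runs faster.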

The task factory that uses distilled skills is now humming along with 25% more throughput, queue age down 94%, & a much more responsive system. For now, the cloud handles the hard fifth. The Mac handles the rest.

It’s the minimill of agentic work. Nucor’s minimills¹ started small, capital-light, & close to demand; within a generation they outflanked the integrated steel giants.

Every laptop, phone, & edge device with enough memory to host a distilled model becomes its own minimill : routing locally, paying cloud rates only for the hard fifth. Tens of millions of these will proliferate inside companies in the next few years, each one quietly absorbing much of the work that today shows up on a hyperscaler invoice.


  1. Nucor began in the 1960s by melting scrap steel in electric-arc furnaces rather than smelting iron ore in giant integrated blast-furnace mills. Each minimill was a fraction of the size & cost of an integrated plant, sited near regional demand, & ran on flexible, lower-cost labor. The integrated mills dismissed minimills as fit only for low-grade products like rebar. Over the next thirty years Nucor moved up-market into sheet steel & structural beams, & by 2014 had become the largest steel producer in the United States, while most of the integrated giants (Bethlehem, LTV, National) had gone bankrupt. Clayton Christensen used the story as the canonical example of disruptive innovation in The Innovator’s Dilemma↩︎

Intelligence Per Dollar

2026-06-03 08:00:00

Yesterday Microsoft added a new metric to a model release card, one that will likely become a standard.¹

Average token usage.

In the first row, the Microsoft model hits 71.6 on SWE-Bench Verified using about a third of the tokens Claude Haiku 4.5 burns.

Benchmarks are now measured on two dimensions : overall performance & the cost to achieve it.

This is yet another sign that the era of subsidies², tokenmaxxing³, & all-out performance for many use cases is over.

Even the most valuable companies in the world cannot afford state-of-the-art intelligence for every conceivable use case.⁴ Uber capped employee AI spending after blowing through its budget in four months.⁵ Salesforce is spending $300M on Anthropic tokens & has frozen engineering hires.⁶

This new dual benchmark answers the buyer’s only question : what is my intelligence per dollar?

Artificial Analysis already benchmarks this.⁷ GPT 5.5 & Claude Opus 4.8 land within a point of each other on the Intelligence Index, around 60. Running the index costs $3,357 on GPT 5.5 & $4,685 on Opus 4.8. Same answer, 40% more expensive.
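The arithmetic behind that comparison, as a short sketch. The two costs are the figures quoted above; the shared score of 60 is an assumption, since the post only says both models land around 60 & within a point of each other. The function names are illustrative.

```python
# Costs quoted in the post for running the Intelligence Index.
COSTS_USD = {"GPT 5.5": 3357, "Claude Opus 4.8": 4685}

# Assumption: treat both models as scoring exactly 60.
ASSUMED_SCORE = 60


def points_per_thousand_dollars(score, cost_usd):
    """Index points bought per $1,000 of benchmark spend."""
    return score / cost_usd * 1000


def cost_premium(cost, baseline):
    """Fractional extra cost relative to the baseline."""
    return cost / baseline - 1


if __name__ == "__main__":
    for model, cost in COSTS_USD.items():
        print(model, round(points_per_thousand_dollars(ASSUMED_SCORE, cost), 2))
    premium = cost_premium(COSTS_USD["Claude Opus 4.8"], COSTS_USD["GPT 5.5"])
    print(f"premium: {premium:.0%}")
```

At roughly equal scores, intelligence per dollar reduces to the inverse of the cost ratio, which is why the 40% figure carries the whole comparison.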

Model companies must now compete on both dimensions. The application layer will compete one level up, on dollars per outcome : what a closed ticket, a shipped PR, or a resolved support case actually costs.

Every layer in the stack now has to price the same way the customer thinks : per result, not per token.



  1. Introducing MAI-Code-1-Flash — Microsoft announces a new coding model with average token usage on the release card. ↩︎

  2. The Unsustainable Subsidy — The era of AI subsidies is ending. ↩︎

  3. Tokenmaxxing — Models that game benchmarks with extra tokens are losing their edge. ↩︎

  4. Microsoft cancels Claude Code licenses, shifting developers to GitHub Copilot CLI — Microsoft cancelled Claude Code licenses across its Experiences and Devices division (Windows, Microsoft 365, Outlook, Teams, Surface) after engineering usage outran budgets. ↩︎

  5. Uber caps employee AI spending after blowing through budget in 4 months. ↩︎

  6. Salesforce Spends $300M on AI, Freezes Engineering Hires. ↩︎

  7. AI Model & API Providers Analysis — Independent analysis of AI model costs. ↩︎

The Thriving Ecosystem of Open Models

2026-06-02 08:00:00

Competition is a discovery procedure. — Friedrich Hayek

And developers are discovering the value of open models.

OpenRouter offers a useful view into the model market.¹ It is not the whole AI economy. But it is close to the API frontier, where developers can switch models quickly, compare price-performance daily, & route each request to the best available option.

Stacked chart of open versus closed model token share on OpenRouter

Since 2025, open models have grown sharply on OpenRouter. In the latest model-level snapshot, open-weight models generated 69.1% of named open-versus-closed token volume. Closed models produced 30.9%.

Open-model demand jumps with launches

New models attract developer attention & large-scale testing, after which token use surges. Each new clustered release of different models sustains a new plateau of token volume.

Open-model leadership keeps changing hands

Just as in the closed-model ecosystem, the competition among open models means rapid innovation & leaderboard changes.

DeepSeek’s early lead gave way to MiniMax & Kimi models in late 2025 & early 2026. Later, launches from MiMo, Qwen (Alibaba’s open-weight model family), Hy3 (Tencent’s open-weight model release), & DeepSeek reshuffled share again.

Arcee, a US lab, has recently made a strong appearance.

Open models still represent a fraction of overall inference, but the thriving competition, increasing usage, & surge of experimentation suggest developers are increasingly willing to route production traffic to them.


  1. Source data: OpenRouter rankings & usage data, analyzed from weekly token-volume snapshots in the OpenRouter analysis dataset. ↩︎

The AI Skepticism Map

2026-06-01 08:00:00

With Michael Burry¹ & Leopold Aschenbrenner² placing heavy short trades on AI, questions about GPU depreciation, & the Saaspocalypse, how negative is the financial market on AI?

We can look at the percentage of shares sold short, a bet the stock will decline.

AI shorts have edged higher

Across software, semiconductor, neocloud, data center, & hyperscaler stocks, the median short interest (short shares / total shares) has increased by about 24% in the last quarter.
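For readers who want the formula spelled out, here is a small sketch of the calculation. The share counts & the two snapshots are invented for illustration; they are not the data behind the charts.

```python
from statistics import median


def short_interest(short_shares, total_shares):
    """Fraction of a company's shares currently sold short."""
    return short_shares / total_shares


def median_change(previous, current):
    """Relative change in median short interest between two snapshots."""
    return median(current) / median(previous) - 1


if __name__ == "__main__":
    # Hypothetical three-stock basket, one quarter apart.
    last_quarter = [0.04, 0.05, 0.10]
    this_quarter = [0.05, 0.062, 0.11]
    print(f"{short_interest(5_000_000, 100_000_000):.1%} of shares short")
    print(f"median change: {median_change(last_quarter, this_quarter):+.0%}")
```

Using the median rather than the mean keeps a few heavily shorted small caps from dominating the headline number.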

AI cloud and neoclouds have the gloomiest sky

One segment stands out for gloomy skies in the cloud: the GPU data center businesses, whose shorted shares have grown 60% in the last year³. AI cloud and neocloud companies have the highest current median short interest at 16.8% of float.

The negative sentiment for SaaS & Dev Tools is a more abrupt & recent phenomenon. Developer tools and infrastructure software follow at 9.5%. Enterprise SaaS and AI apps sit at 8.9%.

Hyperscalers are at the other end of the spectrum. Their median short interest is 1.1%. NVIDIA, the defining AI infrastructure stock, is also lightly shorted: 1.2%.

Enterprise AI apps saw the sharpest rise

Semiconductor stocks saw a decrease in short-selling. With memory makers like Micron up 742% this year⁴, & many ecosystem CEOs pointing to memory & storage as the limiting factor, the newest trillion-dollar companies are all memory.

The stocks with the most actively bearish bettors? Most of these are small or mid-cap companies. The updated chart below adds market capitalization to each company label. The largest AI winners are mostly absent.

SoundHound AI is 36.3% short. C3.ai is 32.2%. BigBear.ai is 29.4%. Applied Digital is 28.0%. UiPath is 22.0%. TeraWulf is 21.3%.

Small AI names dominate the short book

This is the market’s current AI skepticism map.

The skepticism is concentrated in companies whose AI exposure still depends on future capital access, future demand, or future operating leverage.

That distinction matters. If short interest were rising uniformly across AI semiconductors, hyperscalers, and software, the message would be broad fatigue with the AI trade. Instead, the data suggest a more specific view: memory has become critical & in short supply; software & devtools businesses need to prove their worth post-AI; & businesses reselling GPUs have more than their fair share of doubters about current prices versus long-term value.

Skill Distillation

2026-05-29 08:00:00

I’ve been using state-of-the-art models to teach small models running on my computer how I work.

My personal agent, based on Pi, runs my inbox, my deal pipeline, my blog publishing, my calendar, & my research. It looks less like a chatbot & more like a small operating system.

The Pi Agent architecture : QMD procedural memory, SKILL.md playbooks, & the agent loop with tools & MCP

The first layer is QMD, a local markdown knowledge base of about eighty workflow files in ~/memories. Before answering any procedural question, the agent searches QMD for the right playbook.
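To make the lookup concrete, here is a deliberately naive keyword-overlap search. It is a stand-in for illustration only & is not how QMD itself works; the playbook names & contents are hypothetical, & the playbooks are passed in as a dictionary rather than read from ~/memories.

```python
import re


def tokenize(text):
    """Lowercase word set, ignoring punctuation."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def find_playbook(query, playbooks):
    """Return the name of the playbook sharing the most words with the query, or None."""
    query_words = tokenize(query)
    best_name, best_score = None, 0
    for name, body in playbooks.items():
        score = len(query_words & tokenize(body))
        if score > best_score:
            best_name, best_score = name, score
    return best_name


if __name__ == "__main__":
    playbooks = {
        "publish-blog.md": "Steps to publish a blog post: draft, review, deploy.",
        "crm-update.md": "Update the CRM record after each founder meeting.",
    }
    print(find_playbook("How do I publish a blog post?", playbooks))
```

A real knowledge base would likely rank with something stronger than raw word overlap, but the contract is the same: a procedural question goes in, the name of one playbook comes out.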

The second layer is Skills, atomic SKILL.md files that describe one job each. The skills are written by a frontier model. So are the evaluations that grade them. The same system writes, tests, and rewrites each skill until accuracy converges. It also checks recall against QMD, so the right keywords always surface the right skill.
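The write, test, rewrite cycle can be sketched as a small loop. Everything here is an assumption made for illustration: the `refine_skill` name, the fixed accuracy target standing in for "converges", & the toy evaluator & rewriter that take the place of frontier-model calls.

```python
def refine_skill(draft, evaluate, rewrite, target=0.95, max_rounds=5):
    """Rewrite `draft` until `evaluate` reaches `target` or the rounds run out."""
    skill, score = draft, evaluate(draft)
    for _ in range(max_rounds):
        if score >= target:
            break
        skill = rewrite(skill, score)
        score = evaluate(skill)
    return skill, score


# Toy evaluator: fraction of required steps the skill text mentions (hypothetical).
REQUIRED_STEPS = ["check calendar", "propose times", "send invite"]


def toy_evaluate(skill):
    return sum(step in skill for step in REQUIRED_STEPS) / len(REQUIRED_STEPS)


# Toy rewriter: append the first missing step (a frontier model would do this for real).
def toy_rewrite(skill, score):
    missing = next(step for step in REQUIRED_STEPS if step not in skill)
    return skill + "\n- " + missing


if __name__ == "__main__":
    final_skill, final_score = refine_skill("- check calendar", toy_evaluate, toy_rewrite)
    print(final_score)
    print(final_skill)
```

The cap on rounds matters in practice: if a skill never reaches the target, the loop returns the best attempt instead of spinning forever.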

The third layer is the Agent Loop, a model running Plan → Tool Call → Observe → Refine, calling out to seventeen Rust APIs & a handful of MCP integrations.
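A stripped-down version of that loop, as a sketch. The `plan` callable stands in for the model, the `tools` dictionary stands in for the Rust APIs & MCP integrations, & all names here are hypothetical.

```python
def agent_loop(goal, plan, tools, max_steps=5):
    """Run plan -> tool call -> observe until `plan` returns None or steps run out.

    `plan(goal, observations)` returns (tool_name, argument) or None when done.
    Because it sees every prior observation, each call also acts as the refine step.
    """
    observations = []
    for _ in range(max_steps):
        step = plan(goal, observations)            # Plan / Refine
        if step is None:
            break
        tool_name, argument = step
        result = tools[tool_name](argument)        # Tool Call
        observations.append((tool_name, result))   # Observe
    return observations


# Toy planner: call the calendar tool once, then stop (hypothetical).
def toy_plan(goal, observations):
    if not observations:
        return ("calendar", goal)
    return None


if __name__ == "__main__":
    toy_tools = {"calendar": lambda query: f"free slots for {query!r}: Tue 10am"}
    print(agent_loop("coffee with a founder", toy_plan, toy_tools))
```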

Skill distillation : a frontier model authors SKILL.md files that smaller local models execute

One of the techniques I’ve started to use is skill distillation. A frontier model (Opus 4.7, GPT-5.1, or Gemini 3 Pro) authors & refines the skill files. A smaller model (Qwen 35B or Gemma 26B running locally) executes them. The teacher transfers procedural knowledge to the student through markdown. The skill is inspectable, versionable, & hot-swappable.

This is fundamentally different from classical knowledge distillation, which compresses a big model’s soft probability outputs into a smaller model’s weights. It’s different from instruction tuning, which bakes behavior into weights through prompt-response pairs. It’s different from RAG, which retrieves facts.

Skill distillation retrieves procedures. The smaller model doesn’t have to know how to evaluate a company. It just has to know how to follow the steps.
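Here is a sketch of what "follow the steps" could look like in code. The SKILL.md layout (a heading plus a numbered list) is an assumption, as are the `parse_steps` & `execute_skill` helpers; the `student` callable stands in for the local model.

```python
import re

# Hypothetical skill file; the real SKILL.md format may differ.
SKILL_MD = """# Skill: triage-email
1. Read the subject and sender.
2. Label the message as urgent, routine, or ignore.
3. Draft a one-line reply for urgent messages.
"""


def parse_steps(skill_md):
    """Extract the numbered steps from a markdown skill file, in order."""
    return re.findall(r"^\d+\.\s+(.*)$", skill_md, flags=re.MULTILINE)


def execute_skill(skill_md, task, student):
    """Feed each step to the student in order, carrying its output forward."""
    context = task
    for step in parse_steps(skill_md):
        context = student(step, context)
    return context


if __name__ == "__main__":
    # Toy student that just records which step it was asked to perform.
    def toy_student(step, context):
        return context + f"\n[done] {step}"

    print(execute_skill(SKILL_MD, "Email from a founder about a term sheet", toy_student))
```

Because the procedure lives in markdown rather than in weights, changing the student model means changing only the `student` callable.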

Every night a system runs through historical logs to understand what new skills should be generated, mirroring the loop that Pete Koomen described at Y Combinator earlier this week.
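One simple way such a nightly pass might pick candidates is by frequency: task types that recur in the logs but have no skill yet. This heuristic, the log record shape, & the `skill_candidates` name are all assumptions for illustration, not a description of the actual system.

```python
from collections import Counter


def skill_candidates(log_entries, existing_skills, min_count=3):
    """Task types seen at least `min_count` times that have no skill yet."""
    counts = Counter(entry["task_type"] for entry in log_entries)
    return sorted(
        task_type
        for task_type, n in counts.items()
        if n >= min_count and task_type not in existing_skills
    )


if __name__ == "__main__":
    # Hypothetical log records.
    logs = (
        [{"task_type": "expense-report"}] * 4
        + [{"task_type": "email-triage"}] * 9
        + [{"task_type": "one-off-research"}]
    )
    print(skill_candidates(logs, existing_skills={"email-triage"}))
```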

The frontier model becomes a teacher. The library becomes the company’s institutional knowledge. The student becomes whichever model happens to be cheapest this quarter.

Security in the Age of AI Agents: Office Hours with Jonathan Jaffe

2026-05-28 08:00:00

When security practitioners become engineers, the mission changes from managing people to architecting the automated policies that govern an agentic world.

Jonathan Jaffe, CISO at Lemonade, joined me on Office Hours to discuss what this means for how we build, secure, & operate AI systems when both sides are automated.

 

AI is just as powerful for defenders as it is for attackers. The fear narrative underestimates this fact. Defenders harden everywhere, simultaneously, because every vendor in the stack is also racing to ship.

“There are tens of thousands of attack targets out there. The chances that you’re going to be one of those is small. At the same time, all of the vendors that you use will also have access to this to improve their services.”

The window of exploitability is narrowing. Yes, AI will write more vulnerable code. But AI-written code also gets reviewed, pen-tested, & patched faster than any human pipeline. Plus, the total number of bugs within a particular piece of software is finite. As bugs get resolved faster, software will become far more resilient.

Security teams are becoming engineering teams. At Lemonade, every security person is an engineer. They built their own AI platform with agents on top of it. One agent reads threat intel. Another checks whether the vulnerable method is actually called in production code.

“Automation is the only way you can deal with the scale of what’s coming at us now.”
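As an illustration of the second agent's job, here is a small static check using Python's standard `ast` module. It is a sketch, not Lemonade's implementation: it only handles Python source, matches on the method name alone, & cannot tell which library the call belongs to. The `yaml.load` example is hypothetical.

```python
import ast


def calls_method(source, method_name):
    """True if `source` contains a call to a function or method named `method_name`."""
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            func = node.func
            if isinstance(func, ast.Attribute):
                name = func.attr                   # e.g. yaml.load(...)
            else:
                name = getattr(func, "id", None)   # e.g. load(...)
            if name == method_name:
                return True
    return False


if __name__ == "__main__":
    uses_it = "import yaml\nconfig = yaml.load(data)\n"
    avoids_it = "import yaml\nconfig = yaml.safe_load(data)\n"
    print(calls_method(uses_it, "load"), calls_method(avoids_it, "load"))
```

A check like this narrows the list of advisories that need attention: a vulnerable method that is never called is a lower priority than one on a hot path.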

Every agent needs an identity. On a single endpoint, we could be running 200 or 10,000 agents, but each one of them needs to be numbered and then governed by policy at the point of action.

“Every agent needs to have an identity, and more than that, you need a way to control policy for all of these agents in a much more complex way than current identity and access management systems do.”
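A minimal sketch of identity plus policy at the point of action. The roles, actions, & function names are hypothetical, & a real system would need far richer policy than a role-to-action table.

```python
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentIdentity:
    agent_id: str
    role: str


def new_agent(role):
    """Give every agent its own unique identifier."""
    return AgentIdentity(agent_id=str(uuid.uuid4()), role=role)


# Hypothetical policy table: role -> actions that role may perform.
POLICY = {
    "threat-intel-reader": {"read_feed"},
    "code-scanner": {"read_repo"},
}


def is_allowed(agent, action, policy=POLICY):
    """Deny by default: an action passes only if the agent's role lists it."""
    return action in policy.get(agent.role, set())


if __name__ == "__main__":
    scanner = new_agent("code-scanner")
    print(scanner.agent_id, is_allowed(scanner, "read_repo"), is_allowed(scanner, "write_repo"))
```

The check runs at the moment the agent tries to act, not when it is created, which is what lets one endpoint host many agents under different rules.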

Modern agentic security engineering is rapidly transforming, and we should expect to see significantly hardened systems as a result. It’s a bright future for security and security professionals.

I’m grateful to Jonathan for sharing his insights at Office Hours!