Tomasz Tunguz

I’ve been a venture capitalist since 2008. Before that, I was a PM on the Ads team at Google and worked at Appian.
Rss preview of Blog of Tomasz Tunguz

Private Equity : The New Distribution Channel for AI Startups

2025-11-25 08:00:00

Private equity firms have emerged as the newest distribution channel for AI startups.

From Public to Private : The Reversal in Corporate Ownership

While public companies have decreased from 6,639 in 2000 to 3,550 in 2024, PE-owned companies in the US have grown from 1,950 to 14,300. The rate of growth continues to accelerate.

The crossover happened in 2009, when PE inventory overtook public company counts for the first time. By 2024, PE-backed companies outnumbered public firms by roughly 4:1.
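The arithmetic behind these counts can be checked with a short sketch. It assumes the PE-owned figure of 1,950 refers to 2000, the same start year as the public company count; the post does not state that year explicitly. The `cagr` helper is illustrative, not from the original data sources.

```python
# Counts quoted in the post. ASSUMPTION: the 1,950 PE-owned figure is for 2000,
# matching the start year given for public companies.
public_2000, public_2024 = 6_639, 3_550
pe_2000, pe_2024 = 1_950, 14_300
years = 2024 - 2000


def cagr(start: float, end: float, periods: int) -> float:
    """Compound annual growth rate between two values."""
    return (end / start) ** (1 / periods) - 1


ratio_2024 = pe_2024 / public_2024          # PE-backed companies per public company
public_rate = cagr(public_2000, public_2024, years)
pe_rate = cagr(pe_2000, pe_2024, years)

print(f"PE-backed per public company in 2024: {ratio_2024:.2f}")
print(f"Public company count, annual change: {public_rate:.1%}")
print(f"PE-backed company count, annual change: {pe_rate:.1%}")
```

Under that assumption, the count of public companies shrinks a few percent a year while the PE-owned count compounds at a high single-digit rate, which is how the ratio reaches roughly 4:1.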

The shift is driven by the massive expansion of PE ownership across corporate America.

That’s not to say the sizes of PE-owned companies are the same as publics. In fact, they are smaller.

The point isn’t that startups previously focused on public companies. Rather, the data reveals the immense scale of private equity portfolios.

Metric | Public Companies | PE-Backed Companies
Revenue Growth (CAGR) | 5.4% | 7.0%
Typical Headcount | >3,000 | <500

Data Sources : CRSP [1], Wilshire 5000 [2], PitchBook [3], American Investment Council [4], Citizens Bank [5].

The mid-market profile of these PE-owned companies suits AI startups’ desires for faster sales cycles.

Plus, the profit motive of private equity aligns perfectly with AI startups’ capacity to cut costs & drive efficiency. PE firms acquire companies to improve margins & operational performance before exit.

AI tools that reduce headcount, automate processes, or accelerate workflows deliver exactly what PE operating partners need.

A private equity firm owning 25 companies proves value in one or two before rolling out to the entire portfolio. Control enables rapid deployment. This creates an efficient channel for AI startups to demonstrate value & cross-sell.

PE firms gain operational leverage while AI startups access more than 14,000 motivated buyers. This new go-to-market motion redefines how AI software reaches the market, bypassing the traditional enterprise sales grind in favor of networks that can deploy at scale.


  1. CRSP Count™ tracks quarterly changes in publicly listed domestic operating companies.

  2. Wilshire 5000 Total Market Index historical component counts.

  3. PitchBook 2024 Annual US PE Breakdown provides private equity portfolio company statistics.

  4. American Investment Council quarterly research reports on PE trends, employment & portfolio companies.

  5. Citizens Bank PE survey data on private equity market composition.

The Bacon & the Skillet: When Does the AI Market Congeal?

2025-11-21 08:00:00

The AI market today is bacon in a hot skillet. Everything is sizzling, moving, & changing at an incredible pace. We’re all watching it closely.

Market share is fluid because no one yet knows what AI can do & the second we think we have grasped it, models improve. Nvidia’s chip performance & the launch of Gemini 3, the biggest gain ever in Google model performance, suggest no simmering ahead.

As long as the underlying models hurtle towards PhD level performance, people will continue to test. How much better is Gemini 3 at coding? tool calling? writing?

If the progress is material, then the benefit of switching is worth the activation energy.

[Figure: Activation energy diagram showing the effort required to switch between AI tools]

Today, startups, incumbent software companies, cloud providers & AI labs all are competing. First the model, then infrastructure (memory & retrieval), then tools, then applications. Will the foundational models play at the application layer? Or will the applications differentiate themselves enough to overcome model differences?

Who can take advantage of the next big leap in model performance fastest? Which sales team can reach the target customers first & write the RFP?

This is the Great Game of Risk in Category Creation & aggression wins.

But this era of fluidity won’t last forever. The rate of improvement in AI models will eventually attenuate. When the performance gap between the best model & the second-best model shrinks, the incentive to switch evaporates.

Switching costs will start to matter more than marginal performance gains. The custom tools I’ve built, the muscle memory I’ve developed, the integrations my company has deployed, the enterprise contracts signed, all inertia.
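The switching logic can be written as a toy decision rule : switch when the expected benefit of a better model exceeds the activation energy. This is only a sketch. The fields `performance_gain`, `value_of_work`, & `switching_cost`, and every number below, are hypothetical placeholders rather than measurements.

```python
from dataclasses import dataclass


@dataclass
class SwitchCase:
    performance_gain: float  # hypothetical: fractional improvement of the new model
    value_of_work: float     # hypothetical: annual dollar value of the affected work
    switching_cost: float    # hypothetical: cost to port tools, prompts & contracts


def should_switch(case: SwitchCase) -> bool:
    """Switch only when the expected benefit exceeds the activation energy."""
    return case.performance_gain * case.value_of_work > case.switching_cost


# Sizzling phase: a big model leap & little lock-in.
sizzling = SwitchCase(performance_gain=0.20, value_of_work=100_000, switching_cost=5_000)
# Congealed phase: a small gap between models & heavy inertia.
congealed = SwitchCase(performance_gain=0.02, value_of_work=100_000, switching_cost=15_000)

print(should_switch(sizzling), should_switch(congealed))
```

As the gain shrinks and the accumulated cost grows, the same rule flips from switching to staying put.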

At that point, the fat begins to congeal.

The winners will be those who use the sizzling phase to build fat worth congealing around.

This fun analogy came up during my conversation with Harry, Jason, & Rory.

The Scaling Wall Was A Mirage

2025-11-20 08:00:00

Two revelations this week have shaken the narrative in AI : Nvidia’s earnings & this tweet about Gemini.

[Figure: Oriol Vinyals tweet about Gemini 3 scaling]

The AI industry spent 2025 convinced that pre-training scaling laws had hit a wall. Models weren’t improving just from adding more compute during training.

Then Gemini 3 launched. The model has the same parameter count as Gemini 2.5, one trillion parameters, yet achieved massive performance improvements. It’s the first model to break 1500 Elo on LMArena & beat GPT-5.1 on 19 of 20 benchmarks.

Oriol Vinyals, VP of Research at Google DeepMind, credited improving pre-training & post-training for the gains. He continued that the delta between 2.5 & 3.0 is as big as Google has ever seen with no walls in sight.

This is the strongest evidence since o1 that pre-training scaling still works when algorithmic improvements meet better compute.

Second, Nvidia’s earnings call reinforced the demand.

“We currently have visibility to $0.5 trillion in Blackwell and Rubin revenue from the start of this year through the end of calendar year 2026. By executing our annual product cadence and extending our performance leadership through full stack design, we believe NVIDIA will be the superior choice for the $3 trillion to $4 trillion in annual AI infrastructure build we estimate by the end of the decade.”

“The clouds are sold out and our GPU installed base, both new and previous generations, including Blackwell, Hopper and Ampere is fully utilized. Record Q3 data center revenue of $51 billion increased 66% year-over-year, a significant feat at our scale.”

The infrastructure is accelerating headlong into hundreds of billions next year & Nvidia predicts it will be in the trillions, citing “$3 trillion to $4 trillion in data center by 2030”.

As Gavin Baker points out, Nvidia confirmed Blackwell Ultra delivers 5x faster training times than Hopper.

Gemini 3 proves the scaling laws are intact, so Blackwell’s extra power will translate directly into better model capabilities, not just cost efficiency.

Together, these two data points dismantle the scaling wall thesis.

What 375 AI Builders Actually Ship

2025-11-17 01:00:00

70% of production AI teams use open source models. 72.5% connect agents to databases, not chat interfaces. This is what 375 technical builders actually ship - & it looks nothing like Twitter AI.

[Figure: 350 out of 413 teams use open source models]

70% of teams use open source models in some capacity. 48% describe their strategy as mostly open. 22% commit to only open. Just 11% stay purely proprietary.

[Figure: Agents access deep systems: databases, web search, memory, file systems]

Agents in the field are systems operators, not chat interfaces. We thought agents would mostly call APIs. Instead, 72.5% connect to databases. 61% to web search. 56% to memory systems & file systems. 47% to code interpreters.

The center of gravity is data & execution, not conversation. Sophisticated teams build MCPs to access their own internal systems (58%) & external APIs (54%).

[Figure: 85% use synthetic data for generating evals vs fine-tuning]

Synthetic data powers evaluation more than training. 65% use synthetic data for eval generation versus 24% for fine-tuning. This points to a near-term surge in eval-data marketplaces, scenario libraries, & failure-mode corpora before synthetic training data scales up.

The timing reveals where the stack is heading. Teams need to verify correctness before they can scale production.

[Figure: Automated methods for improving context: prompt optimization, ablations, manual]

88% use automated methods for improving context. Yet it remains the #1 pain point in deploying AI products. This gap between tooling adoption & problem resolution points to a fundamental challenge.

The tools exist. The problem is harder than better retrieval or smarter chunking can solve.

Context remains the true challenge & the biggest opportunity for the next generation of AI infrastructure.

Explore the full interactive dataset here or read Lauren’s complete analysis.

Teaching Local Models to Call Tools Like Claude

2025-11-14 01:00:00

Ten months ago, DeepSeek collapsed AI training costs by 90% using distillation - transferring knowledge from larger models to smaller ones at a fraction of the cost.

Distillation works like a tutor training a student : a large model teaches a smaller one. [1] As we’ve shifted from knowledge retrieval to agentic systems, we wondered if there was a parallel technique for tool calling. [2]

Could a large model teach a smaller one to call the right tools?

The answer is yes, or at least yes in our case. Here’s our current effort :

Every time we used Claude Code, we logged the session - our query, available tools, & which tools Claude chose. These logs became training examples showing the local model what good tool calling looks like.
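A log record of this kind might look like the sketch below. The post does not describe the actual log format, so the `ToolCallExample` fields, the JSONL encoding, & the tool names are all assumptions made for illustration.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class ToolCallExample:
    """One logged session (hypothetical schema, not the author's actual format)."""
    query: str
    available_tools: list[str]
    chosen_tools: list[str]  # ordered chain of tools the teacher model picked


def to_jsonl(examples: list[ToolCallExample]) -> str:
    """Serialize examples as one JSON object per line."""
    return "\n".join(json.dumps(asdict(e)) for e in examples)


def from_jsonl(text: str) -> list[ToolCallExample]:
    """Parse JSONL text back into examples, skipping blank lines."""
    return [ToolCallExample(**json.loads(line)) for line in text.splitlines() if line.strip()]


# Hypothetical tool names, for illustration only.
log = [
    ToolCallExample(
        query="How many signups did we get last week?",
        available_tools=["sql_query", "web_search", "read_file"],
        chosen_tools=["sql_query"],
    )
]
restored = from_jsonl(to_jsonl(log))
print(restored[0].chosen_tools)
```

Keeping the chosen tools as an ordered list preserves the full chain, which matters if the comparison later is on the whole sequence rather than on individual calls.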

To choose the right data, we used two selection algorithms, SemDeDup [3] & CaR [4], to pick the examples that lead to better results.
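The core idea of semantic deduplication can be sketched in a few lines : drop any example whose embedding is too close to one already kept. This is a simplified greedy version, not the SemDeDup implementation, which (as I understand the paper) first clusters the embeddings & compares only within clusters. The toy vectors and the `threshold` value are assumptions.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def dedupe(embeddings: list[list[float]], threshold: float = 0.95) -> list[int]:
    """Return indices to keep, dropping items too similar to an already kept item."""
    kept: list[int] = []
    for i, vec in enumerate(embeddings):
        if all(cosine(vec, embeddings[j]) < threshold for j in kept):
            kept.append(i)
    return kept


# Toy 2-D "embeddings": the second is a near-duplicate of the first.
toy = [[1.0, 0.0], [0.99, 0.05], [0.0, 1.0]]
kept = dedupe(toy)
print(kept)
```

In practice the embeddings would come from a pre-trained model, & the threshold controls how aggressively near-duplicates are removed.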

Claude Code fired up our local model powered by GPT-OSS 20b [5] & peppered it with the queries. Claude then graded GPT-OSS on which tools it called.

Claude’s assessments were fed into a prompt-optimization system built on DSPy [6] & GEPA [7], which used them to improve the prompt. DSPy searches for existing examples that could improve the prompt, while GEPA mutates the prompt & tests the resulting variants.
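The general shape of a mutate, score, & keep-the-best loop can be shown with a toy. This is not the DSPy or GEPA API, & it is far simpler than either : GEPA proposes mutations from natural language feedback, whereas this sketch just tries adding one hint at a time. The `score` function, the hint pool, & the eval cases are hypothetical stand-ins for the grader & the logged data.

```python
def score(prompt_hints: frozenset, eval_cases: list[str]) -> float:
    """Hypothetical grader: a case passes if the hint it needs is in the prompt."""
    return sum(1 for needed in eval_cases if needed in prompt_hints) / len(eval_cases)


def optimize(hint_pool: set[str], eval_cases: list[str], rounds: int = 3):
    """Greedy search: each round, keep the single added hint that scores best."""
    best = frozenset()
    best_score = score(best, eval_cases)
    for _ in range(rounds):
        # Mutate: one candidate per unused hint (sorted for determinism).
        candidates = [best | {h} for h in sorted(hint_pool) if h not in best]
        if not candidates:
            break
        top = max(candidates, key=lambda c: score(c, eval_cases))
        top_score = score(top, eval_cases)
        if top_score <= best_score:
            break  # no mutation helped, so stop early
        best, best_score = top, top_score
    return best, best_score


# Hypothetical hints & eval cases, for illustration only.
hint_pool = {"use sql for counts", "search web for news", "read files before editing", "be polite"}
eval_cases = ["use sql for counts", "use sql for counts", "search web for news", "read files before editing"]
best, best_score = optimize(hint_pool, eval_cases)
print(sorted(best), best_score)
```

The real systems replace the stub grader with model-based assessments, which is where Claude’s grades come in.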

Combined, we improved from a 12% Claude match rate to 93% in three iterations by increasing the data volume to cover different scenarios :

Optimizer | Training Examples | % of Claude
DSPy Phase 1 | 50 | 12%
GEPA Phase 2 | 50 | 84%
GEPA Phase 3 | 15 (curated) | 93%

DSPy improved accuracy from 0% to 12%, and GEPA pushed it much higher, all the way to 93%, after three phases. The local model now matches Claude’s tool call chain in 93% of cases.

Make no mistake : matching Claude 93% of the time doesn’t mean 93% accuracy. When we benchmarked Claude itself, it only produced consistent results about 50% of the time. This is non-determinism at work.
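The two numbers measure different things, which a small sketch makes concrete. Both definitions are assumptions : the post does not say exactly how the match rate or Claude’s consistency was computed. Here, match rate compares whole ordered tool chains, & consistency is the share of repeated runs that agree with the most common chain. The tool names are hypothetical.

```python
from collections import Counter


def match_rate(student_chains: list[list[str]], teacher_chains: list[list[str]]) -> float:
    """Fraction of queries where the student's ordered tool chain equals the teacher's."""
    pairs = list(zip(student_chains, teacher_chains))
    return sum(s == t for s, t in pairs) / len(pairs)


def self_consistency(runs: list[list[str]]) -> float:
    """Fraction of repeated runs on one query that agree with the most common chain."""
    counts = Counter(tuple(r) for r in runs)
    return counts.most_common(1)[0][1] / len(runs)


# Hypothetical tool chains, for illustration only.
teacher = [["sql_query"], ["web_search", "read_file"], ["read_file"], ["sql_query"]]
student = [["sql_query"], ["web_search", "read_file"], ["web_search"], ["sql_query"]]
rate = match_rate(student, teacher)

# The same query sent to the teacher four times.
repeated_runs = [["sql_query"], ["sql_query"], ["web_search"], ["read_file"]]
consistency = self_consistency(repeated_runs)

print(rate, consistency)
```

A high match rate against a teacher that only agrees with itself part of the time is an upper bound on imitation, not a measure of correctness.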

This proof of concept works for a small set of tools written in the code mode fashion. It suggests there is a potential for tool calling distillation.

If you’ve tried something similar, I’d love to hear from you.


  1. A Survey on Knowledge Distillation of Large Language Models - Xu et al. (2024) examine knowledge distillation as a methodology for transferring capabilities from proprietary LLMs like GPT-4 to open-source models like LLaMA & Mistral. The survey covers applications in model compression, efficient deployment, & resource-constrained environments, providing a comprehensive overview of distillation techniques for modern language models.

  2. ODIA: Oriented Distillation for Inline Acceleration of LLM-based Function Calling - Recent research on distilling function calling capabilities from larger models to smaller ones. ODIA leverages online user interaction data to accelerate function calling, reducing response latency by 45% (expected) & 78% (median) while maintaining accuracy. The method successfully handled 60% of traffic with negligible accuracy loss in production deployment.

  3. SemDeDup: Data-efficient learning at web-scale through semantic deduplication - Abbas et al. (2023) present a method that uses embeddings from pre-trained models to identify & remove semantic duplicates from training data. Analyzing LAION, they showed that removing 50% of semantically similar data resulted in minimal performance loss while effectively halving training time, with additional out-of-distribution performance improvements.

  4. CaR (Cluster and Retrieve) - A data selection technique that clusters similar training examples & retrieves the most representative ones to improve model performance. This method reduces redundancy in training data while preserving diversity, leading to more efficient learning.

  5. This model is sandboxed. For safety, it reads production data but doesn’t write.

  6. DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines - Khattab et al. (2024) introduce DSPy, a framework that programmatically creates & refines prompts through optimization strategies that systematically simulate instruction variations & generate few-shot examples. Research across multiple use cases showed DSPy can improve task accuracy substantially, with prompt evaluation tasks rising from 46.2% to 64.0% accuracy through bootstrap learning & teleprompter algorithms.

  7. GEPA: Reflective Prompt Evolution Can Outperform Reinforcement Learning - Agrawal et al. (2025) present GEPA, a reflective prompt optimizer that merges textual reflection with multi-objective evolutionary search. GEPA outperforms GRPO by 10% on average (up to 20%) while using up to 35x fewer rollouts. It surpasses the previous state-of-the-art prompt optimizer MIPROv2 on every benchmark, obtaining aggregate optimization gains of +14% compared to MIPROv2’s +7%. The system iteratively mutates prompts based on natural language feedback from execution traces.

Running Out of AI

2025-11-12 08:00:00

By Monday lunch, I had burned through my Claude Code credits. I’d been warned ; damn the budget, full prompting ahead.

I typed ultrathink to solve a particularly challenging coding problem, knowing the rainbow colors of the word meant I was playing with digital fire.

When that still couldn’t solve the issue, I summoned Opus, the biggest & most expensive model, to solve it.

Now two days on, I’ve needed to figure out alternatives. Do I :

  • Switch to API billing (how much will that cost?)
  • Try another vendor? Gemini’s model is great, but ageing ; at nearly 8 months old, it’s a capable jalopy. Cursor’s free coding model Composer 1 sprints at problems with aplomb but is a bit overwhelmed at times. Codex, the plodding giant, is brilliant at large-scale technical challenges.
  • Create another Max subscription & switch between them? Can I ask AI to write a script to save me the hassle of changing my identity?
  • Stand up GPT-OSS to run locally? A little bit more latency but potent & twice as fast on llama.cpp compared to Ollama.
  • Go back to writing code the old way? The hedonic treadmill moves quickly. I tried to return to the old ways, but it was painful. I’ve already forgotten where the blog server script is. Claude? Do you remember? Claude?

I’m working through the math of which option will cost more. How much is the Max plan subsidized? Will knowing the true API cost of my Claude Code usage increase my willingness to pay?
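The comparison comes down to token volume times per-token price versus a flat fee. The sketch below shows the shape of that math. Every figure in it is a hypothetical placeholder : the token counts, the per-million-token prices, & the subscription price are not real quotes from any vendor.

```python
def api_cost(input_tokens: int, output_tokens: int,
             price_in_per_mtok: float, price_out_per_mtok: float) -> float:
    """Usage-based cost in dollars, with prices quoted per million tokens."""
    return (input_tokens / 1e6 * price_in_per_mtok
            + output_tokens / 1e6 * price_out_per_mtok)


# HYPOTHETICAL monthly usage & prices, for illustration only.
monthly_in, monthly_out = 200_000_000, 10_000_000
cost = api_cost(monthly_in, monthly_out, price_in_per_mtok=3.0, price_out_per_mtok=15.0)

subscription = 200.0                   # hypothetical flat monthly plan price
implied_subsidy = cost - subscription  # positive means the plan is cheaper than the API

print(f"API cost: ${cost:,.0f}  plan: ${subscription:,.0f}  implied subsidy: ${implied_subsidy:,.0f}")
```

Plugging in actual usage from the billing logs would answer both questions at once : what the API route costs & how much of that the plan absorbs.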

Switching between tools incurs costs. The tools, the workflow, the prompts that I’ve optimized for Claude Code must all be ported (at my expense!) to other tools.

As the capabilities of these models begin to plateau, the costs to shift increase. So does my willingness to pay for Claude to answer me.