Blog of Tomasz Tunguz

A Narrative Violation on a Friday Morning

2025-08-22 08:00:00

I set out to write a very different post: one with the thesis that venture capital holding periods were lengthening relative to private equity.

But the data violates the narrative!

[Figure: VC vs. PE holding period time series]

Conor Quigley at PitchBook ran an analysis on Theory's behalf to answer the question: on average, how many years pass from the first funding round to exit for PE & VC?¹

The answer is that it's about the same: 5 years.

There's no doubt that the exit markets were extremely quiet until the beginning of this summer. But there is no yawning gap in holding periods between private equity & venture capital.

There is survivorship bias in the data. We are only measuring companies that have exited, not the backlog or the average age of the backlog.

[Figure: VC vs. PE exit counts by year]

However, looking at the exit counts, we see relatively consistent volumes year over year. This suggests that the data should not be hugely skewed.

Venture capital generates roughly 10 times more exits annually than PE in the software sector. VC exits peaked at 638 companies in 2021, while PE reached just 79 exits that same year. Aside from a big spike in 2021 for both asset classes, the deal volumes have been relatively consistent.

The secondary market in venture capital should continue to grow, but not because VC holding periods are longer than PE. Instead, it should grow because VC generates 10 times more exits annually, creating more liquidity opportunities.

Scale, not latency, drives secondary market potential.


  1. Exit counts are limited to companies with a First VC or PE Deal Date. Only M&A & IPO exits are included in the data. This exclusively covers US-based companies tagged to the Software Industry Group. Data as of August 20th, 2025. ↩︎

Achieving Flow with AI

2025-08-21 08:00:00

[Image: Flow State Event]

On September 4 from 5-9pm PDT in Berkeley, Hamel Husain will lead a conversation featuring Claire Vo, Greg Ceccarelli, and me about how to achieve Mihaly Csikszentmihalyi's flow state with AI.

Hamel Husain is a machine learning engineer with over 20 years of experience. He has worked with companies such as Airbnb & GitHub, including early LLM research at GitHub that OpenAI used for code understanding. He has also led & contributed to numerous popular open-source machine-learning tools & is currently an independent consultant helping companies build AI products.

Claire Vo is the founder of ChatPRD & host of the “How I AI” podcast. As a 3x Chief Product & Technology Officer, she’s run scaled product & engineering teams at companies like LaunchDarkly, Color Health, & Optimizely. She’s known for using AI agents to build features end-to-end & has created a six-figure AI-powered side project while running LaunchDarkly’s product & engineering organizations.

Greg Ceccarelli is the co-founder & CPO of SpecStory. Previously he led product at Pluralsight & ran data science at GitHub, with earlier roles at Google & Dropbox. His experiments are wide-ranging, from shipping mini products like tny.dev to creating high-quality product marketing videos with AI, alongside his day job of building SpecStory.

If you're curious about achieving flow state with AI, please register here.

When One AI Grades Another's Work

2025-08-19 08:00:00

Since launching EvoBlog internally, I've wanted to improve it. One way to do this is to have an LLM judge the best posts rather than relying on a static scoring system.

I appointed Gemini 2.5 to be that judge. This post is the result.

The initial system relied on a fixed scoring algorithm. It counted words, checked readability scores, & applied rigid style guidelines, which worked for basic quality control but missed the nuanced aspects of good writing.

What makes one paragraph flow better than another? How do you measure authentic voice versus formulaic content?

EvoBlog now takes a different approach. Instead of static rules, an LLM evaluator scores each attempt across five dimensions: structure flow, opening hook, conclusion impact, data integration, & voice authenticity.

The theory is that magic happens in the iterative refinement cycle.

After each generation round, the system analyzes what worked & what didn't. Did the opening hook score poorly? The next iteration emphasizes stronger first paragraphs. Was data integration weak? The next round stresses supporting data.
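
A minimal sketch of that judge-&-refine loop, assuming a hypothetical call_llm(model, prompt) wrapper around the provider SDKs; none of these helper names come from EvoBlog itself:

```python
import json

DIMENSIONS = ["structure flow", "opening hook", "conclusion impact",
              "data integration", "voice authenticity"]

def call_llm(model: str, prompt: str) -> str:
    # Hypothetical wrapper: wire this to your provider's client library.
    raise NotImplementedError

def judge(post: str) -> dict:
    # Ask the judge model for a 0-100 score per dimension, JSON only.
    prompt = (f"Score this post from 0 to 100 on each of: {', '.join(DIMENSIONS)}. "
              f"Reply with a JSON object keyed by dimension.\n\n{post}")
    return json.loads(call_llm("gemini-2.5-pro", prompt))

def generate_and_refine(topic: str, iterations: int = 20) -> str:
    best_post, best_score, emphasis = "", 0.0, ""
    for _ in range(iterations):
        post = call_llm("claude-sonnet-4",
                        f"Write a blog post about {topic}. {emphasis}")
        scores = judge(post)
        mean = sum(scores.values()) / len(scores)
        if mean > best_score:
            best_post, best_score = post, mean
        # Fold the weakest dimension into the next round's prompt.
        weakest = min(scores, key=scores.get)
        emphasis = f"Pay particular attention to the {weakest}."
    return best_post
```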

[Figure: best run detail across 20 iterations]

The LLM judge experiment yielded mixed results. The chart shows swings in performance across 20 iterations, with no clear convergence pattern. The best run achieved 81.7% similarity to my writing style, a 3.1 percentage point improvement over the initial 78.6%.

But the final iteration scored 75.4%, actually worse than where it started.

The LLM as judge sounds like a good idea. But the non-deterministic nature of the generation & the grading doesn’t produce great results.

Plus it's expensive. Each 20-iteration run requires about 60 LLM calls, or about $1 per post. So, maybe not that expensive!

But for now, the AI judge isn’t all that effective. The verdict is in: AI judges need more training before they’re ready for court.

Explore vs. Exploit in Agentic Coding

2025-08-18 08:00:00

AI coding assistants like Cursor & Replit have rewritten the rules of software distribution almost overnight.

But how do companies like these manage margins? Power users looking to manage as many agents as possible may find themselves at odds with their coding agent providers.

Let’s create a hypothetical million user AI coding company & play around with some numbers.

[Figure: share of users vs. revenue by pricing tier]

Let's assume this company has four pricing plans: $20 per month, $50 per month, $500 per month, & $1,500 per month. We assume a 1% conversion rate for the first two plans, a 0.5% conversion rate for the $500 per month pricing plan, & 0.1% for the $1,500 plan.¹

The revenue concentration is dramatic. While the $20 & $50 tiers capture 77% of paying users, they generate just 15% of total revenue. The enterprise tiers drive 85% of revenue from only 23% of users. The $1,500 Ultimate tier alone generates nearly 32% of all revenue from just 3.8% of users.
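
A quick back-of-the-envelope check of those figures; the tier names other than "Ultimate" are hypothetical, & the conversion rates are the ones assumed above:

```python
free_users = 1_000_000

# tier: (monthly price, assumed free-to-paid conversion rate)
tiers = {
    "Basic":      (20,   0.010),
    "Pro":        (50,   0.010),
    "Enterprise": (500,  0.005),
    "Ultimate":   (1500, 0.001),
}

users = {name: int(free_users * conv) for name, (_, conv) in tiers.items()}
revenue = {name: users[name] * price for name, (price, _) in tiers.items()}
total_users, total_revenue = sum(users.values()), sum(revenue.values())

for name in tiers:
    print(f"{name:10s} {users[name] / total_users:6.1%} of paying users, "
          f"{revenue[name] / total_revenue:6.1%} of revenue")
# Basic       38.5% of paying users,   4.3% of revenue
# Pro         38.5% of paying users,  10.6% of revenue
# Enterprise  19.2% of paying users,  53.2% of revenue
# Ultimate     3.8% of paying users,  31.9% of revenue
```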

So the majority of the revenue will come from the enterprise tiers, but where will the margin come from?

The reality is there are plenty of pathways to increase margin:

  • Caching helps tremendously: better memory management on stable codebases means higher cache hit rates & dramatically lower query costs. The more stable the codebase, the greater the cache hit rate (see the cost sketch after this list)
  • Microsoft is reporting 90% more tokens per GPU, showing infrastructure efficiency gains are real & accelerating
  • Local coding models for smaller tasks can run on-device, reducing cloud inference costs entirely
  • Bring Your Own Cloud arrangements, where enterprises use their prepurchased cloud credits, shift inference costs off the vendor’s balance sheet entirely & increase margins for those deployments to well north of 90%, depending on the customer success costs
  • Rate limit users to manage outlier usage & maintain predictable unit economics
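
As a rough illustration of the caching lever mentioned above, here's a minimal per-query cost model; the token prices & the ~90% discount on cached input tokens are illustrative assumptions, not any vendor's actual rates:

```python
def cost_per_query(input_tokens: int, output_tokens: int, hit_rate: float,
                   in_price: float = 3.0, out_price: float = 15.0,
                   cached_discount: float = 0.10) -> float:
    """Blended dollar cost per query; prices are dollars per million tokens."""
    cached = input_tokens * hit_rate * in_price * cached_discount
    uncached = input_tokens * (1 - hit_rate) * in_price
    output = output_tokens * out_price
    return (cached + uncached + output) / 1_000_000

# A stable codebase (90% hit rate) vs. a churning one (20%):
print(cost_per_query(100_000, 2_000, hit_rate=0.9))  # ≈ $0.087
print(cost_per_query(100_000, 2_000, hit_rate=0.2))  # ≈ $0.276
```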

Today, the most valuable asset is distribution. Venture capital is willing to subsidize that distribution, & over time that distribution will generate profits.

At the point where the companies shift from penetration to maximization, they will need to decide whether the cost of customer acquisition at the lower part of the market is a continued strategic marketing cost or simply too expensive on a margin basis to bear.

The companies that master this transition will define the next decade of software development. Those that don’t will become cautionary tales of the great AI coding economics reckoning.


  1. It is very likely that free-to-paid conversion rates for these kinds of products are significantly higher than the 2-4% unassisted conversion we found in our go-to-market survey, but let's be conservative for now. ↩︎

The Groupon Era of AI

2025-08-15 08:00:00

Groupon grew from a staff of a few dozen to over 350. Revenue and bookings also grew swiftly, and the company was valued at over $1 billion after just 16 months in business, the fastest company ever to reach this milestone.

Wikipedia

A little-known Midwest company called The Point, which became Groupon, grew into the fastest-growing company in history up to that point.

By developing the collective coupon, the idea that if a certain number of people agreed to buy a product or service, all of them would receive a discount, The Point surged on explosive viral growth.

One problem with the Groupon model: Anyone can replicate it. More than 200 copycat sites have sprung up in the U.S., with another 500 overseas, including 100 in China.

Forbes

Fast growth brought with it tremendous competition. At the time, the U.S. press reported a thousand copycats in China & at least 200 within the U.S.

The AI gold rush is on.

It’s a gold rush with Groupon-like characteristics.

Just as daily deals transformed commerce overnight, AI has fundamentally altered how companies acquire customers. Consumers’ insatiable curiosity about AI’s transformational change drives huge volumes of avid buyers with wallets in hand.

In the Groupon era, the cost to try many of these solutions was quite small, & the initial retention could be quite low, particularly in product-led growth companies. Consumers flitted to whichever platform had the best deal.

Likewise in AI. Who has the best model? Let’s try that. Who has the better workflow engine? Let’s try that. The differences with each new iteration are significant enough to try before settling.

As the performance improvements plateau, however, this rabid desire to try newness will attenuate.

But don’t mistake this dissipation of interest for value destruction.

There is lasting value within distribution. The ability to cross-sell to a massive audience acquired inexpensively is a fundamental & sustaining competitive advantage, in both consumer & software businesses.

Fifteen years after the Groupon era, Groupon is worth about $2 billion. Within China, where the competition was more intense, Meituan is now worth $97 billion, having converted its initial distribution into a massive conglomerate. The others? Time has not been kind enough to remember their names.

Decrying this wave of innovation & its inevitable transformation undervalues its disruptive force.

These singular platform shifts can produce massive businesses. They can also produce many companies that are unable to convert initial distribution into longer-term sustainable businesses.

The key is finding great leaders who can manage the wave, acquire large audiences, & iterate to sustainable businesses.

EvoBlog: Building an Evolutionary AI Content Generation System

2025-08-14 08:00:00

One of the hardest mental models to break is how disposable AI-generated content is.

When asking an LLM to generate one blog post, why not just ask it to generate three, pick the best, use that as a prompt to generate three more, and repeat until you have a polished piece of content?

This is the core idea behind EvoBlog, an evolutionary AI content generation system that leverages multiple large language models (LLMs) to produce high-quality blog posts in a fraction of the time it would take using traditional methods.

The post below was generated using EvoBlog, in which the system explains itself.

– Imagine a world where generating a polished, insightful blog post takes less time than brewing a cup of coffee. This isn’t science fiction. We’re building that future today with EvoBlog.

Our approach leverages an evolutionary, multi-model system for blog post generation, inspired by frameworks like EvoGit, which demonstrates how AI agents can collaborate autonomously through version control to evolve code. EvoBlog applies similar principles to content creation, treating blog post development as an evolutionary process with multiple AI agents competing to produce the best content.

The process begins by prompting multiple large language models (LLMs) in parallel. We currently use Claude Sonnet 4, GPT-4.1, and Gemini 2.5 Pro - the latest generation of frontier models. Each model receives the same core prompt but generates distinct variations of the blog post. This parallel approach offers several key benefits.
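
A minimal sketch of that fan-out, again with a hypothetical call_llm(model, prompt) wrapper standing in for the three provider SDKs:

```python
from concurrent.futures import ThreadPoolExecutor

MODELS = ["claude-sonnet-4", "gpt-4.1", "gemini-2.5-pro"]

def call_llm(model: str, prompt: str) -> str:
    # Hypothetical wrapper: wire this to each provider's client library.
    raise NotImplementedError

def generate_drafts(prompt: str) -> dict[str, str]:
    # Send the same prompt to every model in parallel & collect the drafts.
    with ThreadPoolExecutor(max_workers=len(MODELS)) as pool:
        futures = {m: pool.submit(call_llm, m, prompt) for m in MODELS}
        return {m: f.result() for m, f in futures.items()}
```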

First, it drastically reduces generation time. Instead of waiting for a single model to iterate, we receive multiple drafts simultaneously. We’ve observed sub-3-minute generation times in our tests, compared to traditional sequential approaches that can take 15-20 minutes.

Second, parallel generation fosters diversity. Each LLM has its own strengths and biases. Claude Sonnet 4 excels at structured reasoning and technical analysis. GPT-4.1 brings exceptional coding capabilities and instruction following. Gemini 2.5 Pro offers advanced thinking and long-context understanding. This inherent variety leads to a broader range of perspectives and writing styles in the initial drafts.

Next comes the evaluation phase. We employ a unique approach here, using guidelines similar to those used by AP English teachers. This ensures the quality of the writing is held to a high standard, focusing on clarity, grammar, and argumentation. Our evaluation system scores posts on four dimensions: grammatical correctness (25%), argument strength (35%), style matching (25%), and cliché absence (15%).

The system automatically flags posts scoring B+ or better (87%+) as “ready to ship,” mimicking real editorial standards. This evaluation process draws inspiration from how human editors assess content quality, but operates at machine speed across all generated variations.
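
The rubric & threshold translate directly into a weighted score. A sketch, assuming the evaluator model returns each dimension on a 0-100 scale:

```python
WEIGHTS = {
    "grammatical_correctness": 0.25,
    "argument_strength":       0.35,
    "style_matching":          0.25,
    "cliche_absence":          0.15,
}

def overall_score(scores: dict[str, float]) -> float:
    return sum(scores[dim] * weight for dim, weight in WEIGHTS.items())

def ready_to_ship(scores: dict[str, float], threshold: float = 87.0) -> bool:
    # B+ or better (87%+) clears the editorial bar.
    return overall_score(scores) >= threshold

example = {"grammatical_correctness": 92, "argument_strength": 85,
           "style_matching": 88, "cliche_absence": 90}
print(overall_score(example), ready_to_ship(example))  # 88.25 True
```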

The highest-scoring draft then enters a refinement cycle. The chosen LLM further iterates on its output, incorporating feedback and addressing any weaknesses identified during evaluation. This iterative process is reminiscent of how startups themselves operate - rapid prototyping, feedback loops, and constant improvement are all key to success in both blog post generation and building a company.

A critical innovation is our data verification layer. Unlike traditional AI content generators that often hallucinate statistics, EvoBlog includes explicit instructions against fabricating data points. When models need supporting data, they indicate “[NEEDS DATA: description]” markers that trigger fact-checking workflows. This addresses one of the biggest reliability issues in AI-generated content.
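
Assuming the markers take exactly the form [NEEDS DATA: description], the extraction step is a small regex; a sketch:

```python
import re

MARKER = re.compile(r"\[NEEDS DATA:\s*(.+?)\]")

def extract_data_requests(post: str) -> list[str]:
    # Return every pending data request so it can be routed to fact-checking.
    return MARKER.findall(post)

draft = "Churn fell sharply [NEEDS DATA: median churn rate for PLG companies]."
print(extract_data_requests(draft))
# ['median churn rate for PLG companies']
```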

This multi-model approach introduces interesting cost trade-offs. While leveraging multiple LLMs increases upfront costs (typically $0.10-0.15 per complete generation), the time savings and quality improvements lead to substantial long-term efficiency gains. Consider the opportunity cost of a founder spending hours writing a single blog post versus focusing on product development or fundraising.

The architecture draws from evolutionary computation principles, where multiple “mutations” (model variations) compete in a fitness landscape (evaluation scores), with successful adaptations (high-scoring posts) surviving to the next generation (refinement cycle). This mirrors natural selection but operates in content space rather than biological systems.

Our evolutionary, multi-model approach takes this concept further, optimizing for both speed and quality while maintaining reliability through systematic verification.

Looking forward, this evolutionary framework could extend beyond blog posts to other content types - marketing copy, technical documentation, research synthesis, or even code generation as demonstrated by EvoGit’s autonomous programming agents. The core principles of parallel generation, systematic evaluation, and iterative refinement apply broadly to any creative or analytical task.