2026-05-15 08:00:00
The fastest-growing companies in AI & software are either selling AI directly or reselling inference. At worst, they are the first derivative of inference.
Inference is the largest & fastest-growing market in technology today, surpassing the database market & projected to be three times its size within seven years, at $250 billion.1, 2 By selling inference or indexing a business to it, companies grow at spectacular rates.
Anthropic has booked $9b & $10b in consecutive months.3 Google Cloud is growing 63% at an $80 billion run rate.4 Most businesses selling inference are exploding.
For public software & infrastructure companies that predate AI, there are two standouts so far: Twilio & Datadog.
Both of these companies benefit as first derivatives of inference. They don’t primarily sell inference, but anyone building AI systems needs to understand how those systems perform, & agentic companies building voice products use Twilio.
“The number of spans sent to our LLM Observability product nearly tripled quarter-over-quarter.” — Olivier Pomel, CEO, Datadog Q1 2026 Earnings Call
As a result of AI growing so spectacularly, there are huge power law dynamics.
“We now have over 6,500 customers sending data for one or more of our AI integrations. Though this is only 20% of total customers, they represent about 80% of our ARR.” — Olivier Pomel, CEO, Datadog Q1 2026 Earnings Call 5
This is also true for another element of core infrastructure, voice & SMS via telephone.
“Voice reimagined through the lens of AI is increasingly an entry point to the Twilio platform for AI natives & enterprises alike.” — Khozema Shipchandler, CEO, Twilio Q1 2026 Earnings Call 6
A few customers can drive tremendous gains. This level of concentration is characteristic of the current cycle.7
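The 20/80 concentration Datadog describes is the signature of a power-law customer base. A minimal sketch, using synthetic Zipf-like ARR figures (not Datadog’s actual data), shows how steeply ranked accounts concentrate revenue:

```python
# Illustrative only: synthetic ARR figures showing how a power-law
# customer base concentrates revenue in its top accounts.

def concentration(arr_by_customer, top_fraction):
    """Share of total ARR held by the top `top_fraction` of customers."""
    ranked = sorted(arr_by_customer, reverse=True)
    k = max(1, int(len(ranked) * top_fraction))
    return sum(ranked[:k]) / sum(ranked)

# Zipf-like distribution: customer at rank i contributes ~1/i
# of the largest account.
arr = [1_000_000 / rank for rank in range(1, 101)]

top20_share = concentration(arr, 0.20)
print(f"Top 20% of customers hold {top20_share:.0%} of ARR")
# → Top 20% of customers hold 69% of ARR
```

Even this mild power law puts roughly 70% of revenue in the top fifth of accounts; a steeper curve reproduces the 80% Datadog reports.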
For any pre-AI company, the key question belongs at the board level: how do we either resell inference or benefit from our customers buying huge volumes of it?
That’s the only way out of the Saaspocalypse.
2026-05-14 08:00:00
In yesterday’s post (which an agent pushed in raw outline form via email!), I wrote about the future of AI email. What does that future cost?
If you are using state-of-the-art models, it costs between $22 & $130 per month. Would you pay that? At work, I imagine many would. Let’s take the middle case of $26/month in raw cost.
A software company seeking 75% gross margin would charge about $350 per year for that product excluding hosting & serving costs. So let’s call it a $500 per year list with a 15% discount at scale.
A Google Enterprise plan is $11-18/month. A fully agentic solution would then cost about twice as much.
Smaller models help. They cut cost by 10 to 20x, but we can do better.
Run the models locally, & the cost plummets to zero: the user’s GPU does the work.
It’s this type of crude cost optimization that I think will define the next 12 to 24 months of AI software: first, determining which components can be executed deterministically, like the email filters, which are just rules; second, matching the model to the workload.
With some basic heuristics & techniques, we can cut the overall cost by 100x. Given the tremendous shortage of GPUs, this segmentation of inference is inevitable.
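The two-step segmentation above can be sketched as a simple router. The tiers, task names, & per-task costs below are invented placeholders, not measured prices; the point is the shape of the logic, not the figures:

```python
# A minimal sketch of inference segmentation, assuming three
# hypothetical tiers: deterministic rules, a small local model,
# and a frontier API. Costs are illustrative placeholders.

RULES = {"receipt": "forward_to_expenses", "newsletter": "archive"}

# Hypothetical cost per task, in dollars.
TIER_COST = {"rules": 0.0, "local": 0.0001, "frontier": 0.01}

def route(task_type, complexity):
    """Pick the cheapest tier that can handle the task."""
    if task_type in RULES:
        return "rules"      # deterministic: just apply the rule
    if complexity == "simple":
        return "local"      # small model handles routine work
    return "frontier"       # complex reasoning goes to the big model

tasks = [("receipt", "simple"), ("email_reply", "simple"),
         ("market_research", "complex"), ("newsletter", "simple")]
cost = sum(TIER_COST[route(t, c)] for t, c in tasks)
all_frontier = len(tasks) * TIER_COST["frontier"]
print(f"Routed: ${cost:.4f} vs all-frontier: ${all_frontier:.4f}")
```

The savings scale with how much of the workload the cheap tiers absorb: the more tasks rules & small models catch, the closer the blended cost gets to that 100x reduction.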
2026-05-13 08:00:00
Nobody will open Gmail five times a day in five years.
The average knowledge worker receives 121 emails per day.1 That’s one every four minutes during working hours.
The inbox is a conveyor belt that keeps accelerating. You open Gmail. You read. You decide. You respond. One at a time. But the belt doesn’t wait. It just moves faster.
Today’s triage is generic: “This is from your boss. I need to work on that today. Next. Spam. Archive. Spam. Archive. Newsletter, read & archive.” Tomorrow’s is personal. User-defined skills & rules. Programming in English that encodes your priorities, your relationships, your workflow.
A receipt arrives & forwards itself to the expense platform before you see it. An inbound lead hits the CRM, gets scored, & a draft proposal waits in your outbox. The workflow starts the moment the email lands.
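Under the hood, those English-language rules compile down to something like predicate/action pairs. A sketch, assuming a hypothetical upstream classifier has already labeled each message (the labels, actions, & domains are invented for illustration):

```python
# Sketch of user-defined triage rules. Assumes a hypothetical
# classifier has labeled each message; rule names, actions, and
# example domains are invented for illustration.

from dataclasses import dataclass

@dataclass
class Email:
    sender: str
    label: str   # e.g. assigned by an upstream classifier

# "Programming in English", compiled to predicate/action pairs.
rules = [
    (lambda m: m.label == "receipt",      "forward_to_expense_platform"),
    (lambda m: m.label == "inbound_lead", "push_to_crm_and_draft_proposal"),
    (lambda m: m.sender.endswith("@boss.example.com"), "flag_for_today"),
]

def triage(message):
    """Run the first matching rule; fall through to the inbox."""
    for predicate, action in rules:
        if predicate(message):
            return action
    return "leave_in_inbox"

print(triage(Email("billing@vendor.example.com", "receipt")))
# → forward_to_expense_platform
```

The rule list is the personal part: each user’s priorities become their own ordering of predicates, evaluated before a message ever reaches the inbox.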
Then there’s the archive. Years of context about every relationship, commitment, & decision you’ve made. That history becomes a personal context layer that informs how your AI handles the next message. On-device models process sensitive messages privately.2
The inbox disappears. What remains are the 6 messages that actually matter.
2026-05-12 08:00:00
It’s time for the 2026 Annual Theory Go-to-Market Survey. This is a brief 25-question survey.
Our goal is to understand how startups have evolved their sales, marketing, customer success, and cash management over the last several years by comparing these results to our surveys from 2022 through 2025.
We will publish these results and answer questions about them at upcoming Office Hours.
This year, we’re focused on five key hypotheses. Each is designed to be rigorously testable with the survey data:
Augmented reps outperform both autonomous AI and unaugmented humans. As AI tools mature, companies face a choice: deploy AI alongside SDR teams, replace them with autonomous tools, or forego AI entirely. We expect augmented teams, humans plus AI, will show the best conversion rates and productivity gains.
AI is widening the performance gap between top and bottom quartile GTM teams. The companies investing most heavily in AI may be pulling ahead, widening the efficiency, growth, and conversion gap between quartile-one and quartile-four sellers.
Buyer-side AI adoption is a bigger GTM disruptor than seller-side AI. While most companies focus AI investment on their own GTM teams, an emerging dynamic is buyers using AI themselves: automated RFPs, AI-assisted evaluations, and even AI negotiation. We expect this shift will lengthen sales cycles and create new objections.
AI efficiency gains are being captured as headcount reduction, not revenue growth. When companies report “AI productivity” gains, the primary result may be flatter SDR hiring, not faster pipeline or higher revenue. Flattening headcount, not cost savings, is the real signal.
Founder expectations on AI have reset downward as reality caught up. In 2024, the most optimistic respondents expected 500% efficiency gains from AI and recorded 0%. We expect this perceived-measured gap has narrowed as companies recalibrate what AI can realistically deliver.
With this data, we should be able to draw broader conclusions about the continued shift from growth to efficiency, measure the real impact of AI on GTM teams, and understand the emerging dynamics of AI-to-AI selling.
If you complete the survey, I will share with you the anonymized raw data so you can perform your own analyses. If you have questions, just message me on Twitter or send me an email.
2026-05-11 08:00:00
As demand for AI inference explodes, I’ll be asking a lot more of my little computer.
How much more?
Over the past five weeks, I’ve been using local models to see how much of my daily work I can accomplish without the trillion-parameter models in the cloud. The answer is half.
| Category | Count | % of Total | Example |
|---|---|---|---|
| Other | 521 | 35.3% | Catch-all for unstructured requests |
| Scheduling | 254 | 17.2% | Check availability, propose meeting times |
| Market Research | 192 | 13.0% | Competitor analysis, fundraising data |
| Summarization | 184 | 12.4% | Transcript review, video summaries |
| Email & Inbound | 170 | 11.5% | Draft replies, follow-ups, forwards |
| Engineering | 147 | 9.9% | Debug scripts, API fixes, CLI tasks |
| Admin | 10 | 0.7% | Travel, expenses, reimbursements |
If you classify these 1,478 tasks by category, half can succeed on a local 35B model. Email & Inbound, Scheduling, Summarization, & Admin total 618 tasks (41.8%). Market Research & Engineering split roughly 50/50 between simple tasks (data lookups, script fixes) and complex ones (multi-source synthesis, architectural decisions). That gets us to 50%.
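The arithmetic behind that estimate, recomputed from the table above (applying the 50/50 simple/complex split only to Market Research & Engineering):

```python
# Recomputing the category math from the table above.
counts = {
    "Other": 521, "Scheduling": 254, "Market Research": 192,
    "Summarization": 184, "Email & Inbound": 170,
    "Engineering": 147, "Admin": 10,
}
total = sum(counts.values())                   # 1,478 tasks
locally_viable = ["Email & Inbound", "Scheduling", "Summarization", "Admin"]
base = sum(counts[c] for c in locally_viable)  # 618 tasks
# Market Research & Engineering split roughly 50/50 simple vs complex.
split = (counts["Market Research"] + counts["Engineering"]) / 2
share = (base + split) / total
print(f"{base} tasks ({base / total:.1%}) + half of the split "
      f"categories ≈ {share:.1%} runnable locally")
# → 618 tasks (41.8%) + half of the split categories ≈ 53.3% runnable locally
```

Strictly, the split categories push the figure slightly past 53%, which rounds comfortably to the “half” in the headline.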
There are many reasons to use local models: privacy, cost, asset depreciation.1
But in reality, the only one that really matters is latency.
I ran a head-to-head benchmark this morning. Eight agentic tasks, same prompts, both models warmed. Qwen 3.6 35B-A3B-4bit on my MacBook Pro M5 vs Claude Opus 4.5 via API.
The local model isn’t smarter. Opus 4.5 scores ~20% higher on reasoning benchmarks. Local models lag frontier by 3-4 months, and for large-scale complex tasks, that gap matters. But for routine agent tasks, it rarely does.
Opus wins on structure & polish : bullet points, headers, cleaner code. Qwen wins on brevity, often half the tokens. I read every output side by side, and both completed the tasks correctly. For agent tasks where output feeds into another system, terseness is a feature.
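The benchmark itself needs nothing elaborate. A minimal harness in the spirit of the head-to-head above, where `run_local` & `run_api` are stand-ins for the real calls (an on-device runtime & a cloud API), simulated here with sleeps so the sketch is self-contained:

```python
# Minimal latency-benchmark harness. run_local / run_api are
# stand-ins for real model calls; the sleeps simulate latency
# so the sketch runs anywhere.

import time

def run_local(prompt):
    time.sleep(0.01)   # placeholder for on-device generation
    return "terse local answer"

def run_api(prompt):
    time.sleep(0.02)   # placeholder for network + cloud generation
    return "polished api answer"

def benchmark(fn, prompts):
    """Time fn over all prompts with a monotonic clock."""
    start = time.perf_counter()
    outputs = [fn(p) for p in prompts]
    return time.perf_counter() - start, outputs

prompts = ["summarize this transcript", "draft a follow-up email"]
local_t, local_out = benchmark(run_local, prompts)
api_t, api_out = benchmark(run_api, prompts)
print(f"local {local_t:.3f}s vs api {api_t:.3f}s")
```

In a real run, both models should be warmed first & the same prompts sent to each, as described above; `time.perf_counter` avoids wall-clock drift over short measurements.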
Localmaxxing, pushing more inference to local models, is an inevitable response to tokenmaxxing. As local models improve & close the gap with frontier, more users will shift workloads to their own hardware.
If half the work runs 2x faster on my laptop, I’ll take that trade every time. My little computer is about to earn its keep.
A MacBook Pro depreciates whether you use it or not. Running local inference extracts compute value from a sinking asset before resale. ↩︎
2026-05-08 08:00:00
Enterprises run on AI agents. So do the attackers.
What does it mean to build, secure, and operate AI systems when both sides, defenders and attackers, are automated?
Jonathan Jaffe, CISO at Lemonade, is one of the most forward-thinking security leaders in this age of AI, with more than 25 years in technical security roles dating back to 1997.
Join us for a conversation covering:
The format. 15 minutes online. One topic. Call-in questions live. No slides. No pre-written questions. Just a real conversation.
Register here for Office Hours: Securing the Agentic Enterprise with Jonathan Jaffe.