2024-09-26 22:23:25
In my last post I talked through the big-stack game of poker being played by the LLM players. So how does the game play out? There is a sentiment you hear in the valley that these players are doomed to razor-thin margins, and that we should all be grateful for their hard work and sacrifice. I'll take the other side: long term, this is an enormous risk for a whole class of startups being funded.
It seems inevitable that as the underlying foundation models become more powerful, the LLM players will seek to justify the enormous investment that has gone into training their models by moving "up the stack," evolving from an API or chat interface to async agents.
I can't help but think back to Twitter's early incarnation as a platform, which then gradually competed with its own platform developers. Right now, OpenAI/Anthropic/et al. have an API consumable by almost anyone, but it's not hard to imagine a world in which they begin to compete with some of their API developers. I'd guess the async coding agents are most vulnerable to this in the near term, given the seemingly unbounded economic value of owning this use case and the product-market fit LLMs have already found with coding.
But this will extend beyond coding. The most AI-optimistic amongst us (Leopold Aschenbrenner does an excellent job articulating this view) believe that as the underlying LLMs get more powerful, they will get to a place where they can power "drop-in" async AI workers that act like super-intelligent remote employees with code-creating superpowers. In this view, the AI workers obviate the need for most AI application software.
As an example, imagine a future enterprise buying decision: Why buy a specialized AI application that lets you automate internal IT ticket response, when the foundation model companies offer an AI agent that, if you point it in the right direction with a job spec, will read your knowledge base, build its own integrations to connect to your existing systems of record (e.g., Jira), and then handle all the internal requests automatically?
Some might laugh at this scenario, but I'd suggest that if you are a B2B founder building an AI-native application, you NEED to do the thought experiment of assuming it happens over the next 3-5 years as you consider the strategy for your company. Not just because of the risk of this scenario happening, but because any progress down the path of the scenario will meaningfully increase competition for your company (as I describe in #3 below). So how do you future-proof your B2B AI application layer company?
The best answer I have, in the face of this fast-changing future, is three-fold:
1. A network effect. If you've got one, run like hell to get the flywheel to tip. And email or DM me :). By the way, the last investment I led was because of a cold email from a founder who read one of my posts, so I promise you: it works.
2. Capture proprietary or hard-to-access data, either data you accrue as you grow, or data you have access to through some other means. This forms a moat.
3. Execute like hell and land-grab in an overlooked vertical. The foundation model companies will inevitably focus on the big markets (e.g., coding, as discussed). Outside of those, it's hard to imagine the foundation model companies ever developing a GTM and packaged offering to go after the smaller (but still large!) verticals, which require more care and packaging for a less sophisticated customer. So if you are going after these other verticals, assume it will be more symmetrical warfare with other focused startups. The difference is that as the underlying LLMs continue to improve, it will become a lot easier for other startups to compete. Imagine what an "LLM wrapper" startup can accomplish now vs. two years from now. So you have to assume more startup competition and more homegrown competition. For example, eventually it might take just one employee deciding to train an Anthropic agent to compete with the offering you took years to get right. Being obsessively customer-focused is always critical. If anything, that obsession will lead you to find more workflows to automate faster, which means you'll add more value out of the box than anyone else. That might just be enough to hang your hat on.
2024-08-20 22:35:14
I'm sure you read David Cahn's provocative piece "AI's $600B Question", in which he argues that, given NVIDIA's projected Q4 2024 revenue run rate of $150B, the amount of AI revenue required to pay back the enormous investment being made to train and run large language models is now $600B, and we are at least $500B in the hole on that payback. The numbers are certainly staggering… and are just going to get bigger. Until we reach an efficient frontier of the marginal value of adding more compute, or we hit some other roadblock that causes people to lose faith in the current architecture, this is now a contest of "not blinking first". If you're a big-stack player like META, MSFT, GOOG, or any of the foundation model pure plays, you have no choice but to keep raising your bet; the prize and power of "winning" is too great. If you blink, you are left empty-handed, watching someone else count your chips. It's likely hundreds of billions will be destroyed, and trillions earned. It's too early to know who the winners or losers are. But for all of us in the startup ecosystem, among many things, it's going to create new waves of AI opportunities.
Taking a step back: as LLMs progress, they are able to handle more complicated tasks. If today LLMs can handle tasks that would have taken a human thirty minutes to complete, then as they progress, they'll handle increasingly complicated tasks that would have taken a human far longer. In the next decade, they should be able to handle tasks that would take a human years to do. Therefore, as LLMs become more and more sophisticated, the economic value they are able to unlock becomes greater and greater.
For example, annually, it is estimated that we spend $1T on software engineers globally. When people talk about GitHub Copilot, you hear people throw around numbers like 10-20% productivity improvements (of course, GitHub claims higher). That translates to $100-200B of value annually were it to be fully deployed (of which GitHub would capture some percentage).
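To make that arithmetic concrete, here's a minimal back-of-envelope sketch in Python. The spend figure and productivity range come from the paragraph above; the capture rates at the end are my own illustrative assumptions, not GitHub's numbers.

global_engineer_spend = 1_000_000_000_000  # ~$1T estimated annual global spend on software engineers

# Value created at the quoted 10-20% productivity improvement:
for productivity_gain in (0.10, 0.20):
    value_created = global_engineer_spend * productivity_gain
    print(f"{productivity_gain:.0%} productivity gain -> ${value_created / 1e9:,.0f}B of annual value")

# If the vendor captures some share of that value (hypothetical capture rates):
for capture_rate in (0.10, 0.30):
    low, high = 100e9 * capture_rate, 200e9 * capture_rate
    print(f"{capture_rate:.0%} capture -> ${low / 1e9:,.0f}B-${high / 1e9:,.0f}B of annual revenue potential")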
As LLMs progress and are able to go beyond code completion ("copilot") to code authoring ("autopilot"), there is almost no limit to the value creation, as it would dramatically expand the market – a potential multi-trillion dollar opportunity if someone emerges as the dominant player. And that's just coding. We've all experienced the productivity-improving benefits of LLMs (or been on the receiving end of an automated customer support response). The potential value creation and capture with AI is beyond our existing mental models.
The challenge is that the amount of capital required to train each successively more sophisticated LLM increases by an order of magnitude, and once a model is leapfrogged by another, the pricing power of the older model quickly falls to zero. There are now more GPT-3.5 equivalents for a developer to choose from than it would make sense for them to test. Not surprisingly, when GPT-3.5 launched in November 2022 it was head and shoulders above any competitive model and cost $0.02 per 1,000 tokens. It's $0.0005 now – 2.5% of its original price in just 1.5 years. I can't remember another technology that has commoditized as quickly as LLMs. It's a dynamic that makes it almost impossible to rationalize any ROI at this stage in the game, because any investment in an LLM is almost instantly depreciated by the next version. But you can't really skip a step. You need to go through countless worthless versions to get to the ultimate one (the idealized "AGI").
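To put a number on that depreciation, here's a quick sketch using the two price points above; the 1.5-year window is the post's own estimate, and the smoothed annual decay is my extrapolation.

launch_price = 0.02     # $ per 1K tokens at GPT-3.5's November 2022 launch
current_price = 0.0005  # $ per 1K tokens today
years = 1.5             # the window quoted above

remaining = current_price / launch_price      # 0.025, i.e., 2.5% of the launch price
annual_multiplier = remaining ** (1 / years)  # price multiplier per year, if the decay were smooth
print(f"{remaining:.1%} of launch price; ~{1 - annual_multiplier:.0%} price decline per year")

That works out to roughly a 91% price decline per year: instant depreciation, in numbers.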
So you have a bit of a perfect storm:
1. The economic value you are able to unlock as models become more sophisticated should increase significantly with each upgrade of the model. The economic value of AGI is constrained only by our imaginations.
2. Pricing leverage comes from being a step function ahead of the competition, at least along some dimension. If you fall behind, the value of your model to external customers gets rapidly commoditized (of course, there is still value for your internal use cases).
3. MSFT, GOOG, and META have core businesses that produce fire hydrants of cash, Anthropic has found love with GOOG and AMZN, and OpenAI should continue to be able to raise money from sovereigns that have their own (more physical) fire hydrants of cash.
The net result is that in the short term, until an efficient frontier is reached on the marginal value of continuing to invest in infrastructure with the existing transformer architecture, or we run out of electricity, or a group pulls ahead with an untouchable lead thanks to some smart algorithmic work, investment in this space by these giants should continue to increase dramatically, and costs necessarily precede revenue. The prize is theoretically so large, and the market opportunity of a clear winner so uncapped, that you have to keep increasing your bet.
We all are massive beneficiaries of this battle playing out. The extreme pace of investment in infrastructure / training / etc, combined with the urgency that only comes from intense competition, is giving us all the gift of an insane pace of innovation with models that are able to handle increasingly complicated tasks at bargain basement prices. Applications that might not be possible today, let alone economic (such as most voice and video applications), will be profitable before we know it. Giddy up!
2024-04-19 23:19:23
One of the fascinating things about what's happening in AI is that, rather than a few distinct moments of technological disruption that unlock new opportunities for startups (e.g., when Apple launched the App Store, or integrated a GPS chip into the iPhone), I believe we're going to have a rolling thunder of AI breakthroughs that catalyze startup opportunities.
Yes, it's certainly true that as the foundation models progress from 3 to 4 to 5, etc., we will mark time in retrospect with these milestones and how each step-function improvement unlocked increasingly complicated tasks that could be automated by LLMs. What feels different here is that it's also true that single research papers will unlock new opportunities.
To take two recent examples:
What preceded ElevenLabs? https://arxiv.org/abs/2305.07243
What preceded Krea.ai? https://latent-consistency-models.github.io
The combination of both broad-based (foundation model upgrades) and narrow (research breakthroughs) step-function changes will continue to unlock brand new AI opportunities.
As Ben wrote in his comment on my last post:
One concept I like is that while the raw capacity of something like an LLM is increasing continuously over time, there's a hard threshold at which it crosses from being [not at all useful] to [useful] for a given application. Until we get true human-level AI-generated audio, ElevenLabs is impossible...but the second we do, it's a 10x improvement. Feels like part of the reason it's harder to spot these opportunities in advance.
So if you are a founder worried you've missed the window, don't be. It's a land grab right now, but a single research paper can mean the difference between [not at all useful] and [useful], and therefore a new opportunity unlock. Obvious in hindsight, but the timing is tricky to predict. It's going to be an exciting (and wild) few years.
2024-04-01 23:50:09
A common trope repeated in the media and behind closed doors is that incumbents, not startups, are best positioned to take advantage of AI. In this line of thinking, AI is a sustaining technology that reinforces the positions of existing players, rather than a disruptive force that empowers new entrants. There is certainly some truth to this perspective, particularly for startups focused on increasing the productivity of employees pursuing their existing work: just look to companies like Adobe and Notion and the lightning speed with which they integrated AI features. In the classic race of a startup needing to figure out distribution before an incumbent figures out innovation, for too many use cases it has felt like incumbent innovation is just an OpenAI API call away. But like all tropes, it is too simplistic. We're now seeing a class of B2B AI companies unlock tremendous revenue momentum by leveraging a timeless recipe from the consumer world: providing a 10x better experience at a fraction of the cost.
Amazon, Netflix, eBay, Uber, Airbnb - when each launched, they seemingly targeted small markets, and yet had insatiable demand for their offerings. The secret, obvious in retrospect, was that they provided a 10x better experience than the incumbent substitute, and delivered it cheaper by using technology to gain a structural cost advantage over the incumbents.
In each case, almost everyone underestimated the potential opportunity of these companies. The "miss by a mile" mistake is that when you use technology to create a step-function improvement in an experience and are able to provide that service at a markedly lower cost, the demand and potential use cases for that product explode.
Which brings me back to AI.
What if, instead of having to find an artist on a freelancer website, go through alllll the effort of selecting someone who has the style you like, has great reviews, actually responds to you, and then iterating with that person to get to the design you want over days if not weeks, you could quite simply describe what you want with words, and in seconds, an image is generated at a small fraction of the cost? That's MidJourney.
Or what if, rather than having to find, hire, and manage human translators, you could instantly have your documents or application translated as quickly as words are written, in whatever languages you want, at a price impossibly cheaper than hiring a human? That's DeepL (a Benchmark company).
HeyGen opens up a market that few would have bothered with given the cost and friction: hiring an actor for your company. The hours and hours of effort of hiring an actor, staging the production and post-production of a shoot… compressed into five minutes, with limitless iteration on scripts and other control mechanisms going forward.
Companies like ElevenLabs provide a 10x better experience than hiring voice actors (instant audio, none of the overhead of hiring, none of the cost of recording). And because the product is built with AI, it provides that experience significantly cheaper than hiring humans, thus dramatically expanding the market opportunity for voice acting.
What's critical in each of these cases is that these are not simply productivity improvements for existing employees in existing workflows. Instead, they take markets that were constrained by all the effort, friction, and cost of hiring and working with people, and unlock them. Exactly the type of market that gets overlooked by an incumbent.
If the last couple of waves of startups felt like 10x improvements, AI provides what feels like a 100x better experience than the incumbent substitute (humans!) by compressing what is almost always the significant effort of hiring and managing another person into a near-instant experience that will only get better over time. Doing this at a small fraction of the cost of hiring and managing that human opens up limitless use cases and therefore dramatically expands the market. If people underestimated the size of Uber's market initially, we're all underestimating the size of many AI startups' opportunity.
The fun question to ask is: which categories are next? As I wrote about before, it's easy to go down this list of service sector value added to the Gross Domestic Product (GDP) of the United States of America in 2021, by industry, and start to imagine where the startups might emerge within each of these high-level categories… And as always, if you are building one of these companies, I'd love to meet you. Sarah at Benchmark dot com.
As I write this, I am once again aware of the third rail I am touching – the fear that AI will replace humans over time. With every new technology, there is this fear. And time and time again, we see that while the new technology does indeed take on the work that humans used to do, it also creates new work for humans that lets them take advantage of what is most special about being human - our originality and creativity. As much enthusiasm as I have for AI, the reality is that we are still very early with this technology, and there is still near-infinite work that only humans can do. AI gives humans even more space and bandwidth for those pursuits.
2024-01-19 22:34:08
[I feel ridiculous for the delay in this post. I wrote 90% of it months ago, but with the Oct 7th attacks and everything that resulted, the 10% lingered. I don’t write that as a political statement, more an emotional one. It takes mental space for me to write, and last quarter, I just didn’t have a lot of it. But back at it.]
I am grateful for all the conversations I've had with founders tackling "sell work, not software" opportunities.
Some follow-up reflections from those conversations. As always, if you are working on something in this space, I'd love to hear from you (sarah at benchmark dot com).
One of the core hypotheses behind "selling work, not software" is that when you sell a 95% productivity improvement (vs. the squishier productivity improvements software is sold on), you are able to charge *substantially* more for your service than you otherwise could by selling software on a per-seat basis. This can mean that the same end markets could be 10-50x larger than your software-model intuition would tell you. As I mentioned in my previous post, I'd guess a good test of the viability of a market opportunity to sell AI-built "work" is, crudely, whether there already exists a focused, outsourced group internationally to support it.
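Here's a hypothetical comparison of the two pricing models. Every number below is invented for illustration, not market data; the point is only how the 10-50x intuition falls out of the arithmetic.

# Per-seat software: you price the tool.
seats = 20
seat_price = 1_200                 # $/seat/year
software_acv = seats * seat_price  # $24,000

# Selling work: you price against the labor you replace.
tasks_per_year = 100_000
human_cost_per_task = 15      # $ fully-loaded cost if a person did the task
savings_rate = 0.95           # the ~95% productivity improvement above
price_share = 0.50            # charge half of the savings you create
work_acv = tasks_per_year * human_cost_per_task * savings_rate * price_share  # $712,500

print(f"software ACV: ${software_acv:,}; work ACV: ${work_acv:,.0f}; ratio: {work_acv / software_acv:.0f}x")

With these made-up inputs, the same account is worth ~30x more when you sell the work instead of the software.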
I've seen a bunch of companies that talk about selling "virtual employees". My gut is this is a limiting mental model. First, the reality is that LLM technology, while seemingly magic, is still early. It still performs best when given a limited scope vs the more general variety of tasks an "employee" typically handles. Second, how do you measure and compare the best employees? It's squishy. I prefer to think about what the skills or “jobs to be done" are, and what is the success metrics for each job. As you scale, you add more "jobs to be done" modules, which gives you more advantages with scale. This may eventually bundle back up into a reconstituted "employee", but by the winner, not the entrants.
The reality is that LLMs are still early, and especially in cases where precision is important (e.g., legal use cases), a human in the loop is required to extend the surface area of what's possible. I think you should embrace this. But when embracing this looks like having a customer's employees in the loop, it pulls you back into a world of selling software. On the other hand, if it is your own employees or contractors who are in the loop, you not only maintain more clarity on your business model of "selling work", you also can ensure that those humans don't just perform QA, but also act as power users who give explicit feedback into your system to help it get better faster. Your job is to make progress towards automating them away.
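Here's a minimal sketch of what that loop might look like; the interfaces and threshold are invented for illustration, a pattern rather than anyone's actual implementation.

def handle_task(task, model, reviewers, feedback_log, confidence_threshold=0.9):
    """Route low-confidence model output to your own reviewers, and log their corrections."""
    draft, confidence = model.run(task)            # hypothetical model interface
    if confidence >= confidence_threshold:
        return draft                               # fully automated path
    corrected = reviewers.review(task, draft)      # your own staff or contractors, not the customer's
    feedback_log.append((task, draft, corrected))  # the training signal that automates them away
    return corrected

The design choice that matters is the feedback_log: every human correction becomes data that raises the share of tasks the model can handle on its own.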
I believe there will be both. The companies that align with the VC model are going to be ones that have the potential to escape competition -- to be dominant in their market. I've written about this here. My hypothesis is that the winner in a market is going to be able to keep on adding more "jobs to be done" automations, so eventually they are able to have the largest breadth of automations vs any competitor, giving them the economies of scale classic to software and the potential to escape competition. But as I said recently to one founder, to do this, you need to run like the wind. This is a skills land grab and there is going to be a lot of competition in the most obvious categories.
I'd guess that the founding teams going after "work" are going to look different than cliche "college drop-out" profile. They are going to be teams that have lived and breathed a problem in a vertical, and see the wedge in the form of the first task to go after.