RSS preview of Elad Gil

AI Market Clarity

2025-07-23 00:47:51

AI Markets Have Crystalized

AI markets have evolved significantly over the last 4 years. When GPT-3 came out and scaling laws were openly discussed in the AI literature, it seemed clear that you could extrapolate the rate of progress from GPT-2 to GPT-3 onwards through GPT-4, 5 etc and realize a revolution was going to happen.

4 years ago, I started looking for Generative AI companies to back or help start given this curve. I ended up leading or participating in early rounds in companies like Harvey, Perplexity, Character.AI, BrainTrust, and others. At the time it was clear to just “back all the best people working on all the biggest problems” because very few people were actually starting generative AI companies. OpenAI seemed to be the only clear foundation model company (with Anthropic pre-launch and promising, Llama non-existent, and Google clearly someone who could (and would eventually) aggressively innovate but was stymied at the time by its own internal processes).

As more people outside of the core AI community woke up to this opportunity, or researchers and engineers from the main labs left to start new companies in AI, the world of AI became murkier. I used to say that the more I learn about AI, the less I knew about AI – because it was unclear in many early markets who the likely winners would be and the underlying models and tech were changing so fast. For example in 2022, it was clear code / AI driven software engineering was going to be important, but it was unclear who the winners would be (for example Cursor was not launched until 2023, Codium launched Windsurf in GA about 9 months ago, and Cognition launched Devin in limited release a bit over a year ago).

We have now entered an era where the first set of AI markets have solidified and a likely set of winners have emerged. This does not mean others won’t show up over time to compete in these markets or that current leaders won’t get acquired or eventually die (just as Stripe launched over a decade after PayPal and 4 or so years after Braintree, and Facebook launched a few years after Friendster and Myspace). We will also see a new set of markets crystalize in the coming few years and these markets today seem quite uncertain.

Markets with more clarity

1. Foundation Models- LLMs

There are many types of foundation models including large language models (LLMs), as well as models for voice, images, video, music, chemistry, biology, materials, physics and other areas. Foundation models are often driven by scale (of data, compute, certain types of post training and feedback, etc). Scale means capital, so to win in the LLM market you need high availability of capital now entering the many billions.

In the LLM market, a core set of companies have clearly emerged as the ongoing players of the future. They are often partnered with hyperscalers (Amazon with Anthropic, Google GCP with Gemini, Microsoft Azure with OpenAI and its own efforts) as these companies have an economic incentive (cloud spend on AI via AI adoption) to fund these companies that is independent of whether these companies are good investments or not (they often are). Revenue ramps for foundation model companies are rumored to be in the $0 to many $billions range in just 3 or so years, while cloud spend on “AI” has been reported to have reached a few billion per quarter for some of the main clouds.

The core players in the LLM world are now Anthropic, Google, Meta (via Llama), Microsoft, Mistral, OpenAI, X.AI. Three or four of these companies are the clear winners on various benchmarks and most broadly adopted by developers and enterprise, and are also driving most of the spend in the industry. There are newer entrants like SSI and Thinking Machine Labs driven by brilliant AI researchers, which may either come up with innovative approaches or raise ongoing money to compete, or end up in a few years as acquisitions for companies wishing to enter the market or double down on talent.

In parallel, Chinese companies have launched new open source efforts such as Deepseek, Alibaba Qwen, and more recently Kimi, that perform well on benchmarks. There is a lot to say on Chinese open source LLMs, that might be best covered in a separate post.

In the future, it is unlikely that many new core LLM companies will get started due to capital moats, barring some new breakthrough that somehow doesn’t spread quickly.

Other foundation model markets are still lacking the clear winner or winners although promising companies exist in multiple segments.

2. Code

Code is one of the earliest and clearest large-scale applications of generative AI and LLMs. For example, Github copilot was launched in October 2021 and was already well adopted and useful despite more limited functionality and fidelity of its day. Code has all sorts of properties that make it especially amendable to generative AI approaches. Revenue ramps in code are rumored to be in the $0 to $50 up to $500M in the first 2 years of some of the players live product lifetime – an insanely fast pace.

Like foundation model companies, the core likely winners in code are evident, with a handful of companies seeming to be interesting over time. Large incumbents may still enter the coding market, and products to date do not seem especially sticky but moats tend to come with time. However, this handful of companies will play an important role in the next few years of code barring the unexpected and includes Anthropic’s Claude Code, Cognition / Windsurf, Cursor, Google / Windsurf, Microsoft/Github, OpenAI, potentially startups like Magic or Poolside, and then vibe coding companies like Lovable, Replit, and others. Intriguingly, both Figma and Canva have launched vibe coding tools and one can imagine lots more coming.

A number of questions continue to exist for code (and other markets) including agentic vs IDE-based work flows and their eventual overlap (they will undoubtedly converge), as well as the degree to which the foundation model companies will integrate or subsume more of the coding companies functionality directly given both large economic value, as well as code being a bootstrap path to AGI / SI. These core questions will drive who the final winners in code will be.

3. Legal

The core players in core legal markets have solidified with Harvey (both law firms and enterprise) and CaseText as the current leaders. Other startups working in either overlapping (Legora) or new areas (Crosby) are starting to emerge. EvenUp moved in the post AI era into personal injury and related areas, while Eve and Supio are focused on plaintiff work flows. We are still very early in full workflow automation but startups like Harvey and EvenUp have started to take core legal workflows and build systems to complete the work start from finish. With the centrality of legal, companies like Harvey may be able to naturally integrate into other professional services workflows over time

Given the breadth of legal across vertical areas (patents, contracts, etc.) and type (law firms, enterprise, SMB, consumer) there may be new legal areas yet to explore.

4. Medical Scribing

Physician tools and scribing is another area with a clear set of main players that markets have consolidated against include Abridge, Ambience, Commure / Athelas, and Nuance (Microsoft). Some international players have also emerged in this area and may either end up as stand alone, or consolidated into the main players.

The key next step for these players is to expand products into other areas of the healthcare stack.

5. Customer service / experience

The customer experience market in the US appears to have consolidated in the short term into a few core startup players – Decagon and Sierra, while incumbents like Intercom, Zendesk and others work to add and cross-sell generative AI capabilities. Other startups of interest include Forethought, Maven, Parahelp, Wonderful, and others. Like many of the markets above, customer experience has the characteristic that generative AI is displacing or augmenting humans strongly with agentic work, versus another seat-based workflow tool.

We are starting to see the shift from selling seats, to selling units of cognition. This is an underdiscussed aspect of AI companies in general.

Reasoning model advancements and agentic infrastructure products will only accelerate this shift.

6. Search and IR re-invention

Players focused here include Google, OpenAI (chatGPT), Perplexity, and Meta. Perplexity is notably the main startup in this market, while most of the other players are incumbents. This may end up being true for other consumer and prosumer markets as well, although there is a lot of room for new consumer use cases.

Perplexity and the other players are moving quickly into the agentic future (see section below), both through tools like Deep Research, but also more recently via entry into the browser market. For example, Perplexity’s Comet browser already incorporates agentic actions for shopping on the web and other actions.

Future markets that will be important

3 years ago it was clear foundation models / LLMs, code, healthcare, customer service and other areas would be important AI markets, but it was much less clear who would win or be important. Today the market leaders that are likely to play a short term role are clear (although as mentioned often new startups or incumbents launching products can still join later in the game).

The next set of markets that seem highly interesting and tractable to generative AI include, but are not limited to the below. I and my team are highly interested in investing behind companies in these areas:

Accounting. People are both building software here and in some cases doing rollups.
Compliance. Lots of different forms of compliance. Pharma compliance is one example with early players like Blue Note Health.
Financial tools. Tools to help financial analysts or others. There are a number of great teams and early companies working in this area.
Sales tooling and agents. Lots to do here – from AI agents doing SDR work to tools to augment enterprise sales.
Security. An enormous number of potential applications here, particularly in areas of high service utilization or large team complexity. There will separately emerge security companies focused on AI-end points, agents, or foundation model usage that could lead to data leaks or other exploits.
Other markets TBD. Lots to do!
What have I missed? Let me know on X.com or directly.

There are a set of exciting companies in each of these areas, and which of them will pull ahead or win will likely crystalize in the coming months or quarters. In some cases models may not yet be good enough to address these markets well, in others a broader workflow tool or better GTM approach needs to be developed. In some cases it might just be a matter of time – it takes time to fully understand a customer, build against their needs, and see product-market pull and fit.

I (and my team) am actively looking to invest in all these areas – so please ping if working on it. :)

Model vs GTM vs Team?

For the “new” market segments above, one core question is what has been preventing market crystalization?

Part of it for some markets is the models need to advance in reasoning or fidelity. For example, legal workflows did not work on GPT 3.5 but with GPT4 really took off and Harvey benefited from this plus custom model work. Similarly, coding tools like Cursor benefited from Claude 3.5 (launched June 2024) and were just not as useful until the models used crossed some tipping point of fidelity. Building early against customers pre-model fidelity allows you to capture market share once the models get better.

This leads to the concept of the GPT Ladder - at a certain level of GPT (or Claude, or Gemini, or Grok, or Llama) release, specific new markets will open up. For example, GPT-5 (or Claude X, or Gemini Y) should support an entirely new market that simply was not tractable technologically in the absence of the model.

Beyond models, other things that may prevent adoption of generative AI in a market may include:

Go-to-market, buying behavior & competition.
- Startups may be getting go-to-market wrong (selling to the wrong customers in the wrong way)
- There may be too much incumbent lock in (or ability to counter startups quickly with worse product or cross sell against an existing UI or workflow owned by the incumbent).
- Buyers in the market may just be slow to move and adopt and will get there over time.
Teams. Perhaps the right team has not shown up yet, or the founders are new to the market and will take a bit of time to learn and iterate what customers need.
Time to build. It takes time to build and ship useful products. Sometimes activity in a market is just too nascent and in 6-12 months a big enough product footprint will exist for winner(s) to emerge.

Agents everywhere

One big ongoing shift is moving from pure tooling “AI chat” to agentic workflows. Agents are AI software that do actions on your behalf. It is the difference between looking up information about travel to Spain on Google, and having Google spawn an AI agent that goes and books the travel for you and takes actions on your behalf. Coding tools like Devin and customer service tools like Decagon/Sierra seem the earliest B2B adopters of agentic workflows, while informational tools like chatGPT, Gemini, and Perplexity are adding agents to do deep research for their users.

As reasoning models and agents proliferate, new infrastructure to support agentic deployment & workflows is accelerating. A number of startups are working on agentic frameworks or infrastructure, while consulting or large-scale deployment firms are also adding agents to their roster of tools to deploy in enterprises.

We are starting to see the shift from selling seats, to selling units of cognition or human equivalent labor.

AI Roll Ups

I have been talking about, and investing in, generative AI driven roll ups for 3 years or so. From the earliest days of generative AI it was clear that this new form of scaled transformer-based AI was very good at human knowledge work – which is much of the white collar services economy. In AI driven roll ups, buying a company versus just selling them software can lead to outsized faster adoption and economics than just selling software.

Often, adoption of AI is not a technology problem – it is an organization, process, and people issue. Can you rework an entire organization or its way of doing things around the AI tooling? This is often a much harder problem than just the AI tooling itself and requires owning the actual company to be able to rework organizational processes on a short enough time frame to matter (at least for a startup – incumbents often take a long time).

I have funded 2 AI driven buyouts to date and am excited to back more- reach out if you are working in this area!

Market ending moves

In another post, I talk about “market ending moves” – what big strategic play can you do (M&A? Enormous capital scale and investment? Other?) can you do to just win first place in a market.

As markets consolidate, the strategic moves to just win a market become clear. This will likely lead to various forms of M&A, partnership, channel lock in, or other strategies as the handful of players per market consolidate down into one or a small handful.

We should see a lot of consolidation and M&A coming quite soon due to the ability to win a market by combining the two main startup leaders (often hard to negotiate but worth doing) or incumbent/startup pairs (distribution + tech = winning).

Very exciting times are ahead.

Summary - AI markets have crystalized

AI markets are now the clearest they have been in a few years. The leaders in a given segment are clear for many of the earlier generative AI markets like Code and Legal, while new markets are ripe for disruption. Exciting times ahead.

Fertility

Given how important fertility is, it is shocking how few companies are working in the area. Two main thrusts of fertility that are interesting:

Allowing any adult to have children with anyone else. There is strong scientific results out of Japan in mice, in which many cell types can be converted into stem cells, and then differentiated into either sperm or egg cells. The human version of this means (roughly) anyone could have kids with anyone else. Gay couples could have kids with each other (one person would have their cells expanded into sperm, the others eggs), as long as they had a surrogate (see for example this paper from Hayashi’s group in Japan about mice produced via this technique using two fathers). Women of any age could have children. It is a huge unlock societally. Some reviews. Popular version. See also.
Expanding usable egg populations in women. Girls are born with 1 to 2 million oocytes (egg cells) and by puberty end up with ~300,000. Only a small number of these ever mature into the eggs that can be harvested for IVF or used for natural pregnancies. Methods to expand and mature egg cells seem dramatically under developed. See for example.

Anti-aging / longevity

There is clear data that aging is a genetically manipulatable and drugable phenotype. Given the size and important of the prize (e.g. adding 100+ years of productive health life to each person) it is a surprisingly empty field with at most a half dozen or so legitimate companies working on it (including eg BioAge and NewLimit - I am an investor in both).

Neurosensory aging. As you age you lose aspects of hearing and site. For example, the muscle holding the lens of the eye weakens in part leading to blurrier vision and the need for reading glasses for people in their 40s. This seems drugable. Ditto loss of hearing in aging adults. This example of reversing aging in mouse age dependent glaucoma models is intruiging (much simpler approaches then this seem like to work).

Cosmetic aging. Cosmetic use of Botox is roughly at $1.6 billion per year. People are literally injecting a bacterial toxin into their skin. Imagine anti-aging drugs that actually rejuvenate aspects of aging? Examples:

Skin aging and wrinkles. There is enormous demand for things that will reverse or stop skin aging, as seen via botox, face lifts, face fillers, and basic cosmetics.
Balding. Hair loss is age dependent. Some treatments like Minoxidal or Propecia can halt or partially reverse it. Why aren’t there drugs that restore hair completely?
Grey hair. As you age you lose melanin cell production in/around hair follicles. This leads to greying hair. A lot of the biology has been worked out. Why are there not more active efforts to reverse this?

Dental

Tooth regrowth. Many species like sharks can regrow teeth indefinitely, while humans naturally produce a second set of teeth during childhood. Why cant we just regrow a tooth with a bad cavity versus do extensive dental work? Genes like USAG-1 may allow for tooth regrowth in certain animal models, and other approaches and factors exist.

Biomarkers

Often the best way to run a clinical trial or discover drugs rapidly is to have an easily interrogatable biomarker that is a proxy for a biological effect. For example, we measure lipid levels as a proxy for certain types of heart health (and people are trying to develop new biomarkers for cardiac health). For most diseases, we do not have any form of biomarker from blood, saliva, or anywhere else that would be easy to track and use to expedite drug discovery. Given all the data we could theoretically generate per person, and all the ML/AI algorithms and approaches, it is a bit shocking that biomarkers are in such a primitive state.

Novel biomarkers. Develop novel biomarkers for disease states as a way to expedite drug development and study disease course. This may not be great as a stand alone company, unless some of the biomarkers are good replacements for less comfortable procedures. For example, Exact Biosciences is a $10 billion market cap company, as its product lets you do a biomarker test of your poop for colon cancer risk versus the less comfortable colonoscopy. One could imagine many more of these sorts of tests.

Smell / volatile molecules. Dogs can smell if people have certain types of cancer. This means some biomarker(s) of cancer is emitted into the air and detectable by a dog nose. Why don’t we have similar sensors built? One example in this direction to screen lung cancer.

Other health areas.

There are a wide variety of other things that may impact human health that go ignored - for example air pollution levels and cognitive function - that may have broader societal value. Those may be subject of a future post.

NOTES

[1] There are one off companies in each area. In some cases good ones, in other cases the translational biotech market seems to not have any legitimate players.

[2] References included are not the canonical or key papers in many cases - rather just wanted to show evidence these things are possible. Full scientific citing would take a lot of time and I am a bit overloaded.

[3] I am generally not investing in much biotech for the last few years. However, if there was an incredibly compelling team taking a smart approach in this area I would be interested.

Examples of problems in Delaware causing companies to leave

Part of the concern in Delaware is, as a WSJ opinion piece recently put it, “the likelihood of expensive, meritless or value-destroying litigation leads public companies in Delaware to avoid deals they would otherwise make.” The other part is a few more extreme Delaware activist judges who apply “creative writing” to interpreting the law. The most famous example of this is a court invalidating Elon Musk's board and then subsequently a shareholder-approved compensation package. This ruling was viewed as political activism by many versus following any form of legal framework. Tesla subsequently reincorporated to Texas with 84% non-controller shareholder support.

However, there are other recent examples beyond Tesla. These include:

Trade Desk - Delaware court required special committee approval for controller transactions that were fully disclosed at IPO and had operated without challenge for years
Activision - Delaware court invalidated merger approval process on technical grounds despite 98% shareholder approval and no evidence of harm
Moelis & Company - Delaware court invalidated long-standing stockholder agreements between the company and its founder that were disclosed in IPO; required legislative fix
TripAdvisor - Delaware court ruled that merely reincorporating to Nevada required special committee approval and minority shareholder vote because reducing litigation exposure was deemed a "non-ratable benefit" to the controller

Three articles on why companies are leaving Delaware.

State by State Comparisons from Ben Potter & Team at Latham, Watkins LLC.

Most analysis of which state to incorporate in to collapses down to a choice between Delaware, Nevada and Texas. You can find a full analysis from Ben Potter & Team at Latham here (they have kindly open sourced a quick analysis they did for me and my team):

Latham State Of Incorporation Comparison

1.26MB ∙ PDF file

Download

In general, Delaware has been historically seen to be stronger on:

Historical development & predictability of corporate law (although judicial activism is impacting this)
Experience of judiciary (specialized corporate law judges)
Office of the Secretary of State (ease of set up and doing business)

While Nevada and Texas have been better on:

Litigation risk
Business judgment rules
Takeover defenses
Director and Officer indemnification & exculpation
Expenses of doing business

Between NV and TX, there seem to be slight tradeoffs between each state enumerated below. One plus for NV is that fiduciary duties are established in statute rather than case law (Delaware and Texas being in the case law camp), so theoretically Texas could be subject to the same risk of politicization / creative writing over time that is leading companies to exit Delaware. The hope is that Nevada’s statute-based approach will offer more predictability, which was historically the draw for Delaware. However there are obvious benefits to Texas as well, leading companies like Tesla and SpaceX to decide to move there.

Who is moving where?

Companies that have left, or announced, plans to leave Delaware for:

Nevada: Dropbox, TripAdvisor, Neuralink, Pershing Square, and others
Texas: SpaceX, Tesla, and others

Tradeoffs DE, NV, TX

Tradeoffs to consider in terms of where to incorporate include case law, judges, fiduciary duties and other interests, as well as expense and cost. See also a write up from TradeDesk.

Full analysis from Ben Potter & Team at Latham here:

Latham State Of Incorporation Comparison

1.26MB ∙ PDF file

Download

On the margin it seems a few more companies are choosing Nevada over Texas, although obviously SpaceX and Tesla are giant companies who decided to move to Texas. Anecdotally it seems like more corporate entities have moved to Nevada, while more market cap has moved to Texas.

Legal, governmental, and statutory considerations

Expenses

The other tradeoff faced by companies is the cost of doing business in each state. Expenses seem possible lower in Nevada or Texas versus Delaware when comparing the following:

How to leave Delaware

If you are just incorporating a company now, it is worth considering Nevada or Texas up front versus dealing with it later.

If you are a private company, moving to Nevada or Texas is straightforward. It will probably take a few months start to finish, but with most of the lift done by lawyers and accountants working in the background and not require much of your time as CEO. This process is known as domestication or conversion, depending on the state.

You will have to do a shareholder vote and/or board approval, initiate a conversion/domestication filing in both DE and NV/TX, obtain formation/articles of incorporation and pay a filing fee. You lawyers will then draft new bylaws compliant with new target state laws, revise or update any stock certificates, etc. Finally you may get a new EIN, file final returns in DE and initiate in the new state, and update contracts, employee benefits etc. There may be some industries with special licensing, permitting, or professional accreditation needs, but these tend to be manageable. It is basically a bunch of paperwork, but not too onerous.

If you are a public company seeking to leave Delaware, the most important thing to focus on is proper process. Process is impossible to fix after the fact, and Delaware prefers companies not leave (it is a major source of state revenue). Adhering to good process up front allows you to avoid Delaware court action or unnecessary litigation. This was an issue for TripAdvisor as it tried to leave the state for Nevada.

For a public company, the first step is to create a special committee of independent directors to start and hold the conversation on where to domicile the company. The committee needs to take a broad view of the potential move and re-incorportation and whether it benefits all shareholders (and not just the controller and/or directors of the company - which is what happened in the TripAdvisor case). Things that benefit all shareholders may include lowering costs of doing business, decreased litigation risk, or related. Everything needs to be cleanly documented and kept objective.

If you have 50% of more voting control of your company, you can initiate the move via board approval & written consent. If not, you will likely need to go down the full shareholder vote route. This opens up the potential for more complexity not only in terms of e.g. proxy filings, but also increased potential rejection by minority shareholders. This will not block the move entirely but will increase litigation risk quite a bit.

TradeDesk has a good write up on some aspects of this.

Politically exposed and founder-controlled companies may be the first to move

Given recent activism in Delaware courts, there are two types of companies most likely to leave the state and convert to Nevada or Texas entities. This includes founder-controlled companies that want more leeway and protection in how they operate, and politically exposed companies. Political exposure may include who the owner is (Elon Musk for example), what is the nature of the business, and its perception as politically relevant to be made an example out of. Undoubtedly this will be a minority of companies (right now >60% of the Fortune 500 are in Delaware). Just as a subset of companies has been trickling their headquarters out of California due to its high tax rates, extra regulatory burdens, and poor governance (see Palantir, Oracle, HP, Tesla, and others), one could expect an increasing flow of companies moving their incorporation out of Delaware over time in parallel.

The hope, of course, is that Delaware is able to correct some of the issues it has had in the recent past in order to maintain its position as the default for company incorporation. In the meantime, more companies are actively considering a move.

OTHER POSTS

My book: High Growth Handbook. Amazon. Online.

Markets:

Firesides & Podcasts

Startup life

A brief guide to startup pivots

4 Ways Startups Fail

Founder Investors and Scout Programs

Co-Founders

Raising Money

Discussion w Arthur Mensch, CEO of Mistral AI

2024-03-22 20:19:30

Topics covered:

Mistral
Open and closed source AI
Future tech (small models, context windows, etc)
EU AI & startup scene
Enterprise AI needs
Building fast moving teams

Video link:

Transcript:

DYLAN FIELD

Hi everybody. Welcome. Thank you so much for being here. I am so glad that we're able to host this at Figma. My name is Dylan Field. I'm the CEO and co founder of Figma. And a big welcome to everybody here, but also everyone who's joining us via live stream as well. And I'm really excited for tonight. I think this is going to be a pretty credible conversation and I'm proud to be able to introduce the two folks who'll be having it.

So first, Elad Gil. Elad is not only a dear friend and mentor of mine, but also to many in Silicon Valley and the startup community globally, and also Arthur Mench. Arthur is a former academic who has turned CEO and co founder of Mistral. And mistral for the one or two people in the room that do not know, is breaking incredible ground in open source models and I would dare say changing quite a lot about the future of AI.

And with that, I'll pass it off for their fireside. Welcome.

ELAD GIL

Oh, thanks. Thanks so much to Figma for hosting us, and thanks everybody for making it today. And of course to Arthur. Arthur made a heroic effort to join us where he literally had to jump out into traffic, grab a bike and bike over here. So thank you so much for coming.

ARTHUR MENSCH

Discovering the US, I guess.

ELAD GIL

So from a background perspective, you got your PhD in machine learning, you were a staff research scientist at DeepMind, and then you started Mistrall, and you started it, I believe, with both some folks from Google, such as yourself, and then some folks from meta and the llama project there. You folks have taken an open core approach, which I think is super interesting and we can talk about in a little bit. But I was just curious to start off, what was the impetus for starting? Mistral All, how did you decide to do it? What were the motivations and the initial formation of the company?

ARTHUR MENSCH

Yeah, so I think this has always been on the mind of me, Guillome and Timothe. So I was at DeepMind, they were at meta, and I guess we were waiting for the hour, and the hour came with GPT to some extent, so that we realized we had an opportunity to create a company pretty quickly with a good team that we could hire from day one and go and try to do speedrun a bit because we weren't starting the first. So that's how we got started.

ELAD GIL

I guess the people who are probably watching the live stream, I think the people in the audience are probably well versed with what Mistral does. Can you explain a little bit about the set of products you have the platform, all the various components now.

ARTHUR MENSCH

Yeah, for sure. So Mistral is actually a company building foundational models. We are the leading in open source models. So we have started the company by creating text to text generation models, which are really the foundational block for creating today's generative VA applications. I know we're at Figma, so we're not focusing on images yet, but this is obviously coming at some point. And so, yeah, the differentiation we have is that we took this open core approach to release Mistral 7B mixed hole 87 B in December and build a platform on top of these open source models with the addition of commercial models that we introduced in December and then in February. So we're building an open source models and we're building a portable platform for enterprises with focusing on the developers and building tools for developers.

ELAD GIL

How long did it take from when you founded the company to when you.

ARTHUR MENSCH

Launched 7B took four months, approximately.

ELAD GIL

Yeah. That's amazing. So I think one of the things that's really noticeable is the immense speed in terms of how rapidly Mistral actually launched its very first product and then the rapid adoption of that right as 7B came out suddenly, I think people realized that you could have these small performant models that were very fast. Inference time and time to first token were very cheap, which made a big difference if you were doing things at high throughput. How did you build something so rapidly? Or how did you focus a team on such a singular goal so quickly?

ARTHUR MENSCH

Well, I guess we thought about what was missing in the field and we realized that small models were actually quite compelling for people. We saw a community building on top of llama at the time, on top of llama 7B. But llama 7B wasn't good enough. And so we realized that we could make it much better. We could make a 7B model much better. And so that's the sweet spot we targeted for our introduction to the world. And basically we had to build the entire stack from scratch. So getting the data, building the training code, getting the compute, which was a bit of a challenge because in these four months we were ramping up. So we started at zero GPUs and we actually trained on like 500 GPUs, 7B, I guess we went fast because the team was very motivated and so not a lot of holidays during these four months. And generally speaking, AI teams that succeed and go are typically like four to five people. And AI teams that invent things have always been this size. So we are trying to have an organization where we have squads of five people working on data, working on pretraining, and so far this has worked out quite well.

ELAD GIL

Is there anything you can share in terms of what's coming next in your roadmap?

ARTHUR MENSCH

Yeah, so we have new open source models, both generalist and focused on specific verticals. So this is coming soon. We are introducing some new fine tuning features to the platform and we have introduced a chat based assistant called the Shah that is currently just using the model. So it's pretty raw. It's a bit like chat GBT V zero, and we're actively building on building data connectors and ways to enrich it to make it a compelling solution for enterprises.

ELAD GIL

What kind of verticals do you plan to focus on, or can you share that yet?

ARTHUR MENSCH

Well, I guess we started with financial services because that's where most of the maturity was. Basically we have two go to markets. So enterprises starting with financial services because they are mature enough, and the digital native. So talking to developers like building AI companies or introducing AI to formerly non AI companies, and so that's the two, I guess, go to market pools that we're talking to. The first one through some partnerships with cloud because as it turns out, they're a bit controlling the market in that respect. And then through our platform, we're talking directly to developers.

ELAD GIL

I guess on the cloud side, one of the relationships you recently announced was with Microsoft and Azure. Is there anything you can say there about that relationship or that access that it's providing you to the enterprise?

ARTHUR MENSCH

Yes, this opened up new customers. A lot of enterprises can't really use third party SaaS providers easily because you need to go through procurement, risk assessment, et cetera. But if you go as a third party provider through the cloud, you actually get an accelerator. And so when we believed Mistral Large on Azure, we got like 1000 customers pretty right away. The truth is you need to adapt to the fact that enterprises are using cloud and they don't want to introduce new platforms easily. And so you need to go through that, at least at the beginning.

ELAD GIL

And then one of the things that a lot of the industries focus on right now is scaling up models and ever larger, ever more performant versions. How do you think about the scale that you all are shooting for in the next six months or year? Or is the plan to have very large models over time? Or how do you think about the mix of things that you want to offer?

ARTHUR MENSCH

Yeah, so we first focused on efficiency to be able to train models more efficiently than what was currently done. And then once we had achieved this efficiency, we started to scale so that's why we did another fundraising, and that's why we started to increase the amount of compute we had. And so we can expect new models that will be more powerful because we are pouring more computes in it, models that might be a bit larger, because when you grow the compute, you need to increase the capacity of models. But something that remains very important for us is to be super efficient at inference and to have models that are very compressed. And so that's the kind of model that will continue shipping, especially to the open source world.

ELAD GIL

One of the things that was pointed out to me that I'd love to get your views on is that as you reach certain capabilities within a model, you can start to accelerate the pace at which you can build the next model, because you can use, say, a GPT four level model to do rlaif, or to generate synthetic data, or to do other things that really accelerate what you're doing. Data labeling, all sorts of things, in some cases superhuman performance. How do you think about using models to bootstrap each other up? And does that actually accelerate the timeline for each subsequent release?

ARTHUR MENSCH

Yeah, I guess. Generally speaking, two years ago, RLHF was very important. Today it's actually less important because the models have become better, and they're actually sometimes good enough to self supervise themselves. And what we have noticed is that we scale as we scale. This is definitely improving. So that means that the costly part of going through human annotations is actually reducing. And so this is also lowering the barrier of entrance.

ELAD GIL

I guess another sort of adjacent area is reasoning. And a lot of people feel that as you scale up models, they'll naturally acquire reasoning. And then there's other approaches and entire companies that have recently been founded around just focusing on the reasoning aspect of some of these models. How do you think about that? Are you going to be training sub models for reasoning, or do you think it's just going to come out of scaling the existing models? Is it a mix of the two?

ARTHUR MENSCH

Well, I guess reasoning comes from. At this point, the only validated way of improving reasoning is to train models on larger data and make them bigger. There's obviously some possibilities that you have by building an auto loop, adding new function, calling, adding data as well for the model to reason about grounded aspects instead of trying to imagine stuff. So I guess we don't pretend to have like a secret recipe for reasoning, but we've made models that are pretty good at reasoning by focusing on the data. We're pretty good at using mathematics in our data. And so that's a good way of improving reasoning. There's many ways in which to improve it. Code has helped as well, and so there's no magic recipe, but just focusing on the little things makes it work.

ELAD GIL

Yeah, I guess one of the reasons I ask is, I feel like if you look at sort of the world of AI, there's a few different approaches that have been done in the past. One is sort of the transformer based models and scaling them. The other is a little bit more in the lines of like, Alphago and poker and some of the gaming related approaches where you're doing self play as a way to bootstrap new strategies or new capabilities. And those are in some sense forms of reasoning. And I know that there are certain areas where that may be very natural to do in the context of model training. Code would be an example. There's a few others where you can sort of test things against real rubric. And so I don't know if you folks are considering things like that, or if that's important or not in your mind.

ARTHUR MENSCH

So Guillaume and Timote were doing theorem proving with LLMs back in the day at meta. So that's very linked to, well, using LLM as the reasoning brick and then building an auto loop that involves sampling, that involve Monte Carlo research, all these kind of things. I think the one thing that was standing in the way for this is the fact that models have very high latency, and if you want to sample heavily, you need to make them smaller. And so it's very much tied to efficiency. So as we grow efficiency, as hardware increases in capacity as well, you become more able to explore more and to sample more. And that's a good way effectively to increase reasoning through the autoloup development.

ELAD GIL

And then I guess the other thing a lot more people are talking about or thinking about is memory and some ability to maintain a longer view of state in different ways across actions or chaining things for agents. Do you expect to go down any sort of agentic routes anytime soon? Or is the focus much more on sort of core APIs that are enabling in all sorts of ways?

ARTHUR MENSCH

So that's what we started to enable with function calling, which is a good way to handle, to start creating agent that store states. So memory, when we talk about memory, like of conversation, the way you make it happen is that you basically introduce some crude functions on your middleware part that you give to the model, and so it can actually use that to update its memory and its representation. And so function calling is the one multipurpose tool that you can use to create complex agent. It's hard to make it work, it's hard to evaluate as well. So I think this is going to be one of the biggest challenge. How do you make agent that work, evaluate them and make them work better for feedback? And this is one of the challenge that we'd like to tackle on the product side.

ELAD GIL

And then I guess the one other thing that a lot of people have been talking about recently is just context window. And for example, I know that there's some recent results around, for example, biology models, where if you increase the context window, you can end up with better protein folding and things like that. So the context and the context really matters. I think Gemini launched a million, up to a few million sort of context window, and then magic, I think, has had 5 million for a while. How important do you think that is? Do you think that displaces other things like rag or fine tuning? Are all these things going to work coincident with each other?

ARTHUR MENSCH

So it doesn't displace fine tuning because fine tuning has a very different purpose of pouring your preferences and basically demonstrating the task. On the other hand, it simplifies rag approaches because you can pour more knowledge into the context. And so what we hear from users is that it's like a drug. So once you start to use models with a large context, you don't want to go back. And so that's effectively something we want to try to improve and extend. There's a few techniques for making it happen. On the infrastructure side, it's actually quite a challenge because you need to handle very large attention matrices, but there are ways around it.

ELAD GIL

I see what you're saying. So basically, like on the RAM, on the GPU, you basically ran out of space for something as you're building a bigger and bigger context window. Or is it something else?

ARTHUR MENSCH

Yeah, there's a variety of techniques you need to rethink for sharding and communication to handle the big matrices. And then you do pay a cost because it basically becomes slower because of the quality cost.

ELAD GIL

When do you think we hit a moment where these models are better than humans at most white collar tasks? Do you think that's two years away, five years away, ten years away?

ARTHUR MENSCH

I guess it depends on the task. There's already a few tasks on which the model are actually better. And so I expect this to unfold pretty quickly, actually. So hard to say a date, but I would say in three years this is going to look very different, especially if we find a way to deploy agent and to evaluate them and make them robust and reliable.

ELAD GIL

What about displacing the CEO of Figma? No, I'm just kidding. Just kidding. Dylan, please keep us. So I guess there's a lot of different foundation models that people are starting to work on, right? There's obviously a lot of attention on the LLMs, and there have been diffusion models for image gen, although it seems like people are moving more and more towards image or transformer based approaches for image and video and other things. Are there big holes in terms of where you think there are gaps where people aren't building foundation models, but they should be?

ARTHUR MENSCH

I would say we've seen some things happening on the robotic side, but I think it's still at the very early stage on the audio. This is covered on video. This is starting to be covered, essentially, like models that can take actions and become very good at taking actions. I don't think this is very well covered. There's some progress to be made there, but, yeah, overall, I expect all of this to converge towards similar architectures and at the end of the day, like a joint training as we move forward in time.

ELAD GIL

So do you think eventually everything is a transformer based model?

ARTHUR MENSCH

Well, transformer are a very good way of representing associations between tokens or between information, so it really does not really matter, but it seems to be enough, it seems to be a sufficient representation to capture most of the thing we want to capture, and we know how to train them well so we can transfer information between what we learn from text on images, et cetera. And so that's why I think this is going to be quite hard to displace.

ELAD GIL

Do you think that'll also apply to the hard sciences? If you're trying to do, like, physics simulation, material sciences, pure math.

ARTHUR MENSCH

I don't expect just next token prediction to solve that. And so you do need to move to the outer loop, and you need to figure out also a way to make models interact with simulators, potentially, because at some point, you need the model to learn the physics, and so you need to bootstrap that with the simulator. But I'm not an expert, to be honest.

ELAD GIL

And then all these models, of course, need a lot of GPU, and people have very publicly talked about how there's a GPU crunch right now, and there's shortages of different sorts. When do you think that goes away, or do you think that goes away?

ARTHUR MENSCH

So I think that probably eases as the H and rit comes, and we'll start to see some competition on the hardware space, which is going to improve cost, I think. I expect that also as we move to foundational models that are multimodal, et cetera, we can actually train on more flops. And so I don't think we haven't hit the wall there in scaling. And so I expect this is probably going to continue on the training part and on the inference part as we move on to production and we have models running agent on the background. So really removing this bottleneck that we had at the beginning, which was the speed at which we could read information, I expect that inference capacity will spread pretty significantly.

ELAD GIL

Do you think that will be done through traditional GPU based approaches, or do you think we'll start having more and more custom asics, either for specific transformer models where you burn the weights on the silicon, or more generally for transformers in general, where you can just load a set of weights or something.

ARTHUR MENSCH

So the good thing about the fact that everybody is using transformer is that you can specialize hardware to this architecture and you can make a lot of gains there. There's a few unfortunate bottleneck on Nvidia chips, for instance, the memory bandwidth is a problem. And so by moving on to more custom chips, you can reduce significantly the cost of inference. It's not really ready yet, so we're not betting on it right now, but I really expect that this is going to improve cost pretty significantly.

ELAD GIL

So mistral really started off as a developer centric product, right? You launched to something that was very open source. Now you're starting to serve a variety of enterprises. Is there any commonality in terms of the types of use cases that people are coming with or the areas that enterprises are most quickly adopting these sorts of technologies or approaches?

ARTHUR MENSCH

Yeah. So enterprises adopts the technology for mostly three use cases. So the first one is developer productivity. And usually they kind of struggle with the off the shelf approach because it's not feed to their way of making, of developing. They also use knowledge management tools, and usually they've built their own assistant connected to their database. And the last thing is customer service. So the most mature company have made a large progress toward reducing their human engagement with customers and just making it much more efficient. And so these are really the free use cases we see with enterprises. And with AI companies, it's much more diverse because they are a bit more creative. But yeah, overall, enterprises have these free use cases. It's also the reason why we are starting to think of moving a bit on the value chain and offer things that are a bit more turnkey, because sometimes they need a little bit of help.

ELAD GIL

Yeah, that makes sense. I'm guessing many people here saw the tweet from the CEO of Klarna where he's talking about customer success and how they added a series of tools based on top of OpenAI that basically reduced the number of people they needed by 700. In terms of customer support. They launched it in a month and they had 2.3 million responses in that single month. So it seems like there's this really big wave coming that I think is almost under discussed in terms of impact of productivity, impact of jobs and things like that.

ARTHUR MENSCH

Yeah, so we saw even more diverse use cases. One of them was having a platform that engaged with temporary workers to try and find a job for them. So through texting, and so the customer in question went from 150 people activating this well, engaging directly with customers to seven, and they were actually able to scale the platform much more and to enable temporary workers to work more easily. And generally speaking, this approach of automating more of the customer service is a way to improve the customer service. And so that's, I think, what is exciting about this technology.

ELAD GIL

What do you think is missing right now, or what is preventing enterprise adoption from accelerating further?

ARTHUR MENSCH

So our bet is that they still struggle a bit to evaluate and to figure out how to verify that the model can actually be put in production. What's missing are a bunch of tools to do continuous integration, also tools to automatically improve whatever use case the LLM is used for. And so I think this is what is missing for developer adoption within enterprises. Now for user adoption within enterprises, I think we're still pretty far away from creating assistant that follows instruction, well that can be customized easily by users. And so yeah, on the user side, I think this is what is missing.

ELAD GIL

One thing that I think you've been very thoughtful about is how to approach AI regulation. And I know that you've been involved with some of the conversations in terms of EU regulation and other regulation of AI. Could you explain your viewpoint in terms of what's important to focus on today versus in the future and how to think about it more generally?

ARTHUR MENSCH

Yeah, so we had to speak up because at the time, in October, there was a big movement against open source AI. And so we had to explain that this was actually the right way to today, make the technology secure and well evaluated. And overall we've been continuously saying that we're merging very different conversations about existential risk, which is ill defined and that has little scientific evidence for this is merged with a discussion about, I guess, national security and AIA and LLMs being used to generate bioweapons. But again, this is something that is lacking evidence. And then there's a bunch of very important problems that we should be focusing on, which is how do you actually deploy models and control what they are saying? How do you handle biases? How do you set the editorial tone of a model in a way that you can evaluate and control? And I think this is the most important part. How do you build safe products that you can control well and that you can evaluate well? And this is the one thing we should be focusing on. That's what we've been saying for a couple of months because we were a bit forced to speak up.

ELAD GIL

Yeah, it seems like one of the areas that people are kind of worried about in the short term on AI is things like deepfakes or people spoofing voices or other things like that, either for financial attacks, for political purposes, et cetera. Do you all have plans to go down the voice and sort of multimodality side?

ARTHUR MENSCH

So generating things that are not text is effectively a bit more of a trap on the safety side, and that we've avoided it effectively. Imitating voices and deepfakes are very concerning. And this is not something that we pretend to be able to sort text. It's much easier because there's never this kind of problem. So you can generate text is generating text is never an enabler of very harmful behavior. Misinformation has been mentioned, but usually misinformation is bottlenecked by diffusion and not by creation. So by focusing on text, we kind of circumvent these issues, which are very real.

ELAD GIL

I think one of the things that's very striking about Mistral is, and I should say in Europe in general right now, is there's a very robust startup scene. And if I look at the two biggest pockets of AI right now in terms of startup formation, it's basically here in Silicon Valley, and then it's like the Paris London corridor, and you have eleven labs and you have Mastrall and you have all these great companies forming. What do you think is driving that?

ARTHUR MENSCH

I think there's a couple of historical reasons. In London there was, and there still is DeepMind, which was a very strong attractor of talents across the world. And in Paris in 2018, both DeepMind and Google opened offices, research offices, and it went and augmented the existing research scene that was already pretty strong, because as it turns out, France and also a couple of other countries in the European Union have very good education pipeline. And so junior machine learning engineers and junior machine learning scientists are quite good. And so that's one of the reason why today we have a pretty strong ecosystem of companies on both the foundational layer, but also on the application layer.

ELAD GIL

Yeah, the French seem a lot smarter than the British. So. No, I'm just kidding.

ARTHUR MENSCH

I'm not the one saying that.

ELAD GIL

The other thing that I think is kind of striking is you start to see a lot of different AI based companies focused on regional differences. So, for example, when you launched, you included a variety of different european languages. I know there's models being built right now for Japan, for India, for a variety of different geos. And one could argue that either you have large global platform companies that serve everywhere, except for maybe China, because China is likely to be firewalled in some ways, just like it has been for the Internet more generally. Or you could imagine a world where you have regional champions emerge. And in particular, you could almost view it like Boeing versus Airbus, where the governments of specific regions decide that they really want to fund or become customers to local players. What do you view as sort of the future world, and how does that evolve in terms of global versus regional platforms?

ARTHUR MENSCH

So we've taken a global approach to distribution. I guess there was another path that we could have taken, which was to focus on just the european market, pretending that there was any form of defensibility there. We don't think this is the case. Technology remains very fluid and so can circulate across countries. On the other hand, the technology we're building is effectively very linked to language, and language is, well, English is only one language across many. And as it turns out, LLMs are much better at English than other languages. So by also focusing more on different languages, we managed to make models that are very good at european languages in particular, versus the american models. And so there's a big market for that. And similarly, there's a big market in Asia for models that can speak asian languages. And there's a variety of scientific problems to be sorted and solved to address these markets, but those are huge, and those haven't been the focus of us companies. So it's effectively an opportunity for us as a european company to focus a bit more on the world globally.

ELAD GIL

Okay, great. I think we can open up to a few questions from the audience, and if people want to just ask, I can always just repeat it in the back there, please. Yeah, right there. If you want to speak loudly, I can repeat what you say. The question is, do you plan to release closed source versions of your model or always be open source?

ARTHUR MENSCH

So we have commercial models already. So to an extent we haven't been open sourcing everything. We are a very young company, but our purpose is to release the best open source models. And then we are basically coming up with an enterprise surrounding and some premium features that we can sell to sustain the business. And so our strategy today, and that might evolve with time, is to have both very strong open source models, but also models that are much stronger actually at that point in time as closed source APIs. The one thing that we focus on also for our commercial models is to make deployment of these models very portable and very flexible. So we have customers to whom we ship the weights and allow them to modify the model, do client side fine tuning the same way they would do it with open source models. And so in that sense, we have some coherence across the commercial family and the open source family.

[AUDIENCE QUESTION ON MAIN USES CASES]

ARTHUR MENSCH

Knowledge management, developer productivity. So coding basically.

[AUDIENCE QUESTION – PLANS TO DO CODING SPECIFIC MODELS?]

ARTHUR MENSCH

Yeah, we have plans. Not doing any announcement today, but we do have plans.

[AUDIENCE QUESTION – NEW ARCHITECTURES AND RESEARCH]

ARTHUR MENSCH

We've been mostly into production at that point because the team was pretty lean, but we're not dedicating a couple of full time employees like finding new architectures, finding well, making research. And I think this is super important to remain relevant. So as we scale, we will be able to afford more exploration. It's also very linked to the compute capacity you have. So if you want to make some discoveries and make some progress, you need to have enough compute. And we're a bit compute bound because of the shortage on H 100, but this is going to improve favorably. So we expect to be doing more research and more exploratory research, I guess because we've been doing research from the.

ELAD GIL

Starter, I guess related to that, it seems like in general, your team has a very strong bias for action, and you move very quickly. How do you select for that in people that you hire? Are there specific things you look for, interview questions you ask?

ARTHUR MENSCH

So we look for AI scientists that can do everything from going down the infrastructure stack to making, extract, transform and load pipelines to thinking about mathematics. So we've been trying to find full stack AI engineers, and they tend to have a strong bias for action. Really, the focus we had is to find low ego people willing to get their hands dirty with jobs that are considered boring by some AI scientists because it's a bit boring. But this has been actually quite productive. And because we focused on the right things and the back. I guess the team is now quite big, so there's a bunch of challenges associated to that. I was surprised by the amount of inbound that we had and the amount of representation that I had to do, especially as we got drawn into political stuff, which we would rather have avoided, but we kind of didn't have a choice. So this was definitely a surprise for me, generally speaking. I was also surprised by the speed we managed to have, because it actually exceeded our expectations. But, yeah, I had pretty little idea of what the job of a funder would be when we started. It's quite fun, but it's effectively surprising. I was imagining myself as still coding after a year, and it's actually no longer the case, unfortunately. But, yeah, that's the price of trying to scale up pretty quickly.

ELAD GIL

You get to do HR coding now, which is even better.

ARTHUR MENSCH

Yeah.

ARTHUR MENSCH

So the reason why we started the company is to have a production arm that creates fission value, to have a research arm. And to be honest, there isn't much demonstration of existence of such organs, because you do have a few research labs that are tied to cloud companies that have a very big top line and using it to sustain research. We think that with AI and with the value that the technology brings, there is a way for doing it. But I guess this still remains to be shown. And that's the experiment we are making with mistral.

ELAD GIL

Probably. One last question. I know Arthur has a hard stop, maybe way in the back there.

[AUDIENCE QUESTION – HOW MUCH PERFORMANCE CAN A SMALL MODEL REALLY HAVE]

ARTHUR MENSCH

Yes, I think you can squeeze it to that point. The question is whether you can have a 7B model that beats Mistral Large. This starts to be a bit tricky, but there might be way. I also expect the hardware to improve, like the local hardware to improve. And so that will also give a little more space and a little more memory. And yeah, I see more potential there, because effectively you're a bit constrained by scaling loads. That tells you that at some point you do saturate the capacity of models of a certain size.

ELAD GIL

What is the main constraint? Or what do you think is the thing that it asymptotes against for scaling loads?

ARTHUR MENSCH

You can make 7B models very strong if you focus on a specific task. But if you want to pour all of the knowledge of the world onto 7GB, well, it's actually quite ambitious. So one thing is, for instance, multilingual models at this size are not a great idea. So you do need to focus on a specific part of the human knowledge. You want to compress I guess one.

ELAD GIL

Last question for me and then we can wrap up is a friend of mine pointed this out to me, which basically, if you think about what you do when you're training a model is you spin up a giant data center or supercomputer and then you run it for n weeks or months or however long you decide to train for, and then the output is a file.

ARTHUR MENSCH

You're basically zipping the world knowledge. It's not much more than that, actually.

ELAD GIL

Yeah. How do you think about either forms of continuous training or retraining over time or sort of longer training runs that get tacked on? I know some people are basically training longer or longer and then dropping a model and then they keep training and then they drop a model. And so I don't know how you think about where the world heads.

ARTHUR MENSCH

Yeah, this is an efficient way of training, so that's definitely interesting for us.

ELAD GIL

Okay, great. Well, please join me in thanking Arthur.

Other Firesides & Podcasts

My book: High Growth Handbook. Amazon. Online.

Markets:

Startup life

A brief guide to startup pivots

4 Ways Startups Fail

Founder Investors and Scout Programs

Co-Founders

Raising Money

Things I Don't Know About AI

2024-02-21 21:37:01

In most markets, the more time passes the clearer things become. In generative AI (“AI”), it has been the opposite. The more time passes, the less I think I actually understand.

For each level of the AI stack, I have open questions. I list these out below to stimulate dialog and feedback.

LLM Questions

There are in some sense two types of LLMs - frontier models - at the cutting edge of performance (think GPT-4 vs other models until recently), and everything else. In 2021 I wrote that I thought the frontier models market would collapse over time into an oligopoly market due to the scale of capital needed. In parallel, non-frontier models would more commodity / pricing driven and have a stronger opensource presence (note this was pre-Llama and pre-Mistral launches).

Things seem to be evolving towards the above:

Frontier LLMs are likely to be an oligopoly market. Current contenders include closed source models like OpenAI, Google, Anthropic, and perhaps Grok/X.ai, and Llama (Meta) and Mistral on the open source side. This list may of course change in the coming year or two. Frontier models keep getting more and more expensive to train, while commodity models drop in price each year as performance goes up (for example, it is probably ~5X cheaper to train GPT-3.5 equivalent now than 2 years ago)

As model scale has gotten larger, funding increasingly has been primarily coming from the cloud providers / big tech. For example, Microsoft invested $10B+ in OpenAI, while Anthropic raised $7B between Amazon and Google. NVIDIA is also a big investor in foundation model companies of many types. The venture funding for these companies in contrast is a tiny drop in the ocean in comparison. As frontier model training booms in cost, the emerging funders are largely concentrated amongst big tech companies (typically with strong incentives to fund the area for their own revenue - ie cloud providers or NVIDIA), or nation states wanting to back local champions (see eg UAE and Falcon). This is impacting the market and driving selection of potential winners early.

It is important to note that the scale of investments being made by these cloud providers is dwarfed by actual cloud revenue. For example, Azure from Microsoft generates $25B in revenue a quarter. The ~$10B OpenAI investment by Microsoft is roughly 6 weeks of Azure revenue. AI is having a big impact on Azure revenue revently. Indeed Azure grew 6 percentage points in Q2 2024 from AI - which would put it at an annualized increase of $5-6B (or 50% of its investment in OpenAI! Per year!). Obviously revenue is not net income but this is striking nonetheless, and suggests the big clouds have an economic reason to fund more large scale models over time.

In parallel, Meta has done outstanding work with Llama models and recently announced $20B compute budget, in part to fund massive model training. I posited 18 months ago that an open source sponsor for AI models should emerge, but assumed it would be Amazon or NVIDIA with a lower chance of it being Meta. (Zuckerberg & Yann Lecunn have been visionary here).

Questions on LLMs:

Are cloud providers king-making a handful of players at the frontier and locking in the oligopoly market via the sheer scale of compute/capital they provide? When do cloud providers stop funding new LLM foundation companies versus continuing to fund existing? Cloud providers are easily the biggest funders of foundation models, not venture capitalists. Given they are constrained in M&A due to FTC actions, and the revenue that comes from cloud usage, it is rational for them to do so. This may lead / has led to some distortion of market dynamics. How does this impact the long term economics and market structure for LLMs? Does this mean we will see the end of new frontier LLM companies soon due to a lack of enough capital and talent for new entrants? Or do they keep funding large models hoping some will convert on their clouds to revenue?
Does OSS models flip some of the economics in AI from foundation models to clouds? Does Meta continue to fund OS models? If so, does eg Llama-N catch up to the very frontier? A fully open source model performing at the very frontier of AI has the potential to flip a subportion the economic share of AI infra from LLMs towards cloud and inference providers and decreases revenue away from the other LLM foundation model companies. Again, this is likely an oligopoly market with no singular winner (barring AGI), but has implications on how to think about the relative importance of cloud and infrastructure companies in this market (and of course both can be very important!).
- One of the most brilliant things in the Llama2 terms of use is the open commercial use of the license if you have fewer then 700 million users[1]. This obviously prevents some large competitors from using their models. But it also means if you are a big cloud provider you need to pay a license to Meta for Llama, which Microsoft has already done. This creates an interesting long term way for Meta to control (& monetize) Llama despite being open source.
How do we think about speed and price vs performance for models? One could imagine extremely slow incredibly performant models may be quite valuable if compared to normal human speed to do things. The latest largest Gemini models seem to be heading in this direction with large 1 million+ token context windows a la Magic, which announced a 5 million token window in June 2023. Large context windows and depth of understanding can really change how we think about AI uses and engineering. On the other side of the spectrum, Mistral has shown the value of small, fast and cheap to inference performant models. The 2x2 below suggests a potential segmentation of where models will matter most.

How do architectures for foundation models evolve? Do agentic models with different architectures subsume some of the future potential of LLMs? When do other forms of memory and reasoning come into play?

Do governments back (or direct their purchasing to) regional AI champions? Will national governments differentially spend on local models a la Boeing vs Airbus in aerospace? Do governments want to support models that reflect their local values, languages, etc? Besides cloud providers and global big tech (think also e.g. Alibaba, Rakuten etc) the other big sources of potential capital are countries. There are now great model companies in Europe (e.g. Mistral), Japan, India, UAE, China and other countries. If so, there may be a few multi-billion AI foundation model regional companies created just off of government revenue.
What happens in China? One could anticipate Chinese LLMs to be backed by Tencent, Alibaba, Xiaomi, ByteDance and others investing in big ways into local LLMs companies. China’s government has long used regulatory and literal firewalls to prevent competition from non-Chinese companies and to build local, government supported and censored champions. One interesting thing to note is the trend of Chinese OSS models. Qwen from Alibaba for example has moved higher on the broader LMSYS leaderboards.
What happens with X.ai? Seems like a wild card.
How good does Google get? Google has the compute, scale, talent to make amazing things and is organized and moving fast. Google was always the worlds first AI-first company. Seems like a wild card.

Infra companies

There are a few types of infrastructure companies with very different uses. For example, Braintrust provides eval, prompt playgrounds, logging and proxies to help companies move from “vibe based” analysis of AI to data driven. Scale.ai and others play a key role in data labeling, fine tuning, and other areas. A number of these have open but less existential questions (for example how much of RLHF turns into RLAIF).

The biggest uncertainties and questions in AI infra have to do with the AI Cloud Stack and how it evolves. It seems like there are very different needs between startups and enterprises for AI cloud services. For startups, the new cloud providers and tooling (think Anyscale, Baseten, Modal, Replicate, Together, etc) seem to be taking a useful path resulting in fast adoption and revenue growth.

For enterprises, who tend to have specialized needs, there are some open questions. For example:

Does the current AI cloud companies need to build an on-premise/BYOC/VPN version of their offerings for larger enterprises? It seems like enterprises will optimize for (a) using their existing cloud marketplace credits which they already have budget for, to buy services (b) will be hesitant to round trip out from where their webapp / data is hosted (ie AWS, Azure, GCP) due to latency & performance and (c) will care about security, compliance (FedRAMP, HIPAA etc). The short term startup market for AI cloud may differ from long term enterprise needs.
How much of AI cloud adoption is due to constrained GPU / GPU arb? In the absence of GPU on the main cloud providers companies are scrambling to find sufficient GPU for their needs, accelerating adoption of new startups with their own GPU clouds. One potential strategy NVIDIA could be doing is preferentially allocating GPU to these new providers to decrease bargaining power of hyperscalers and to fragment the market, as well as to accelerate the industry via startups. When does the GPU bottleneck end and how does that impact new AI cloud providers? It seems like an end to GPU shortages on the main clouds would be negative for companies whose only business is GPU cloud, while those with more tools and services should have an easier transition if this were to happen.
How do new AI ASICS like Groq impact AI clouds?
What else gets consolidated into AI clouds? Do they cross sell embeddings & RAG? Continuous updates? Fine tuning? Other services? How does that impact data labelers or others with overlapping offerings? What gets consolidated directly into model providers vs via the clouds?
Which companies in the AI cloud will pursue which business model?
- It is important to note there are really 2 market segments in the AI cloud world (a) startups (b) mid-market and enterprise. It seems likely that “GPU only” business model default works with the startup segment(who have fewer cloud needs), but for large enterprises adoption may be more driven by GPU cloud constraints on major platforms. Do companies providing developer tooling, API endpoints, and/or specialized hardware, or other aspects morph into two other analogous models - (a) “Snowflake/Databricks for AI” model or (b) “Cloudflare for AI”? If so, which ones adopt which model?
How big do the new AI clouds become? As large as Heroku, Digital Ocean, Snowflake, or AWS? What is the size of outcome and utilization scale for this class of company?
How does the AI stack evolve with very long context window models? How do we think about the interplay of context window & prompt engineering, fine tuning, RAG, and inference costs?
How does FTC (and other regulator) prevention of M&A impact this market? There are at least a dozen credible companies building AI cloud related products and services - too many for all of them to be stand alone. How does one think about exits under an administration that is aggressively against tech M&A? Should the AI clouds themselves consolidate amongst themselves to consolidate share and services offered?

Apps questions

ChatGPT was the starting gun for many AI founders. Prior to ChatGPT (and right before that Midjourney and Stable Diffusion) most people in tech were not paying close attention to the Transformer/Diffusion model revolution and dislocation we are now experiencing.

This means that people closest to the model and technology - ie AI researchers and infra engineers - were the first people to leave to start new companies based on this technology. The people farther away from the core model world - many product engineers, designers, and PMs, did not become aware of how important AI is until now.

ChatGPT launched ~15 months ago. If it takes 9-12 months to decide to quit your job, a few months to do it, and a few months to brainstorm an initial idea with a cofounder, we should start to see a wave of app builders showing up now / shortly.

B2B apps. What will be the important companies and markets in the emerging wave of B2B apps? Where will incumbents gain value versus startups? I have a long post on this coming shortly.
Consumer. Arguably a number of the earliest AI products are consumer or “prosumer” - ie used in both personal and business use cases. Apps like ChatGPT, Midjourney, Perplexity and Pika are examples of this. That said, why are there so few consumer builders in the AI ecosystem? Is it purely the time delay mentioned above? It seems like the 2007-2012 social product cohort has aged out. New blood is needed to build the next great wave of AI consumer.
Agents. Lots and lots of things can happen with agents. What will be strong focused product areas versus startups looking for a use case?

This is one of the most exciting and fast-changing moments in technology in my lifetime. It will be fun to see what everyone builds. Looking forward to thoughts on the questions above.

Thanks to Amjad Masad and Vipul Prakash for comments on a draft of this post.

NOTES

[1] Yes I occasionally read terms of use for fun.

MY BOOK
You can order the High Growth Handbook here. Or read it online for free.

OTHER POSTS

Firesides & Podcasts

Markets:

Startup life

A brief guide to startup pivots

4 Ways Startups Fail

Founder Investors and Scout Programs

Co-Founders

Raising Money

Elad Gil Modify

Rss preview of Blog of Elad Gil

AI Markets Have Crystalized

Markets with more clarity

1. Foundation Models- LLMs

2. Code

3. Legal

4. Medical Scribing

5. Customer service / experience

6. Search and IR re-invention

Future markets that will be important

Model vs GTM vs Team?

Agents everywhere

AI Roll Ups

Market ending moves

Summary - AI markets have crystalized

OTHER POSTS

Fertility

Anti-aging / longevity

Dental

Biomarkers

Other health areas.

NOTES

OTHER POSTS

OTHER POSTS

Examples of problems in Delaware causing companies to leave

State by State Comparisons from Ben Potter & Team at Latham, Watkins LLC.

Who is moving where?

Tradeoffs DE, NV, TX

Legal, governmental, and statutory considerations

Expenses

How to leave Delaware

Politically exposed and founder-controlled companies may be the first to move

OTHER POSTS

LLM Questions

Questions on LLMs:

Infra companies

Apps questions

Elad GilModify

Rss preview of Blog of Elad Gil

AI Markets Have Crystalized

Markets with more clarity

1. Foundation Models- LLMs

2. Code

3. Legal

4. Medical Scribing

5. Customer service / experience

6. Search and IR re-invention

Future markets that will be important

Model vs GTM vs Team?

Agents everywhere

AI Roll Ups

Market ending moves

Summary - AI markets have crystalized

OTHER POSTS

Fertility

Anti-aging / longevity

Dental

Biomarkers

Other health areas.

NOTES

OTHER POSTS

OTHER POSTS

Examples of problems in Delaware causing companies to leave

State by State Comparisons from Ben Potter & Team at Latham, Watkins LLC.

Who is moving where?

Tradeoffs DE, NV, TX

Legal, governmental, and statutory considerations

Expenses

How to leave Delaware

Politically exposed and founder-controlled companies may be the first to move

OTHER POSTS

LLM Questions

Questions on LLMs:

Infra companies

Apps questions

The author's social media

Elad Gil Modify