
Manas J. Saloi

A product leader who has held key product management roles at Gojek, Directi, Craftsvilla, CouponDunia and Kore, responsible for product development and growth.

RSS preview of the blog of Manas J. Saloi

OpenEvidence vs Harvey

2025-07-20 08:00:00

While Harvey’s specialized legal models still outperform general-purpose LLMs on legal tasks, the margin has dramatically narrowed over the past year.

Seven foundation models now surpass Harvey’s originally benchmarked system on BigLaw Bench, a remarkable shift that has led the company to pivot its strategy.

Rather than doubling down on proprietary model development, Harvey has repositioned itself around agentic workflows: complex, multi-step AI processes that chain together different models and tools to handle sophisticated legal tasks like drafting S-1 filings or conducting due diligence.

OpenEvidence, a pioneer in medical AI, has taken the opposite path.

As foundation models commoditised basic medical chat, OpenEvidence built smaller models trained exclusively on peer-reviewed medical literature, an approach that earned them the distinction of being the first AI to score above 90% on the USMLE.

While Harvey now routes queries to whichever foundational model performs best, OpenEvidence continues to rely on its purpose-built medical models that “outperform much larger models on in-domain tasks.”

Two vertical AI companies.

Two radically different responses to the same challenge: the rapid improvement of general-purpose AI.

One pivoted to orchestration and workflows. The other doubled down on domain expertise.

I think OpenEvidence won through its access to locked-up data that most frontier foundation models can’t touch.

It licensed the entire New England Journal of Medicine archive and the JAMA network material, still fenced off behind paywalls and robots files, then tuned an open source model on that corpus and wired in retrieval so every answer is backed by a trusted source.

A general model trained on a random web crawl rarely sees those journals in full, so it guesses at dosage tables, can’t quote the latest NEJM trial with confidence, and may miss a brand-new randomized clinical trial.

And clinicians notice. In an era where medical knowledge now doubles every 73 days, the only safe shortcut is a model that reads the journals for doctors, daily, and shows its work.
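
To make "shows its work" concrete, here is a minimal sketch of retrieval with citations. It is not OpenEvidence's actual system: the `Passage` type, the word-overlap scoring, and the placeholder corpus are all assumptions for illustration. The point is only that every returned line carries its source, and that the tool declines to answer when nothing in the corpus supports the query.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Passage:
    source: str  # citation label (placeholder values below, not real articles)
    text: str


def tokenize(text: str) -> set[str]:
    """Lowercase, split on whitespace, and strip simple punctuation."""
    return {word.strip(".,;:?!()").lower() for word in text.split()} - {""}


def retrieve(query: str, corpus: list[Passage], k: int = 2) -> list[Passage]:
    """Return up to k passages that share at least one word with the query."""
    query_terms = tokenize(query)
    scored = [(len(query_terms & tokenize(p.text)), p) for p in corpus]
    matches = [item for item in scored if item[0] > 0]
    ranked = sorted(matches, key=lambda item: item[0], reverse=True)
    return [passage for _, passage in ranked[:k]]


def answer_with_citations(query: str, corpus: list[Passage]) -> str:
    """Quote supporting passages with their source, or decline to answer."""
    hits = retrieve(query, corpus)
    if not hits:
        return "No supporting source found."
    return "\n".join(f"{p.text} [{p.source}]" for p in hits)


# Hypothetical two-document corpus, purely for illustration.
CORPUS = [
    Passage("Placeholder Journal A, 2025", "Trial X compared drug A with placebo in adults."),
    Passage("Placeholder Journal B, 2024", "Guideline Y covers dosing of drug B in children."),
]

print(answer_with_citations("drug A trial", CORPUS))
```

A production system would use semantic search rather than word overlap, but the contract is the same: no source, no answer.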

This moat is hard to copy. Publishers discovered what their PDFs are worth and now sell exclusivity, not bulk feeds. Any challenger must either outbid OpenEvidence or scrape, which fails compliance reviews in hospital IT.

Harvey tried the same trick for law but:

  • US case law data already sits in the public domain. Seven million opinions went online via the Caselaw Access Project, free for anyone with a scraper.
  • GPT-4, Gemini, Claude, and all other foundational models already trained on that long before Harvey showed up, so Harvey had no unique competitive advantage when it comes to legal data.

Public case opinions are less exclusive than medical journals, but Harvey still leverages paywalled treatises and LexisNexis data through their partnership, so it does have some proprietary sources. However, these sources haven’t been enough to build an in-house model that outperforms foundational models.

Harvey now just layers workflows and retrieval on top of whichever foundation model is SOTA. Its ceiling is the underlying model’s ceiling.
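
A rough sketch of why the ceiling is inherited, assuming a simple router that picks the best-scoring model per task. The model names and scores below are made up, and this is not Harvey's actual routing logic; it also ignores whatever lift workflows and retrieval add on top.

```python
# Hypothetical benchmark scores per task type; names and numbers are invented.
SCORES = {
    "drafting": {"model_a": 0.81, "model_b": 0.77},
    "due_diligence": {"model_a": 0.70, "model_b": 0.84},
}


def route(task_type: str, scores: dict[str, dict[str, float]] = SCORES) -> tuple[str, float]:
    """Pick the model with the highest score for this task type."""
    candidates = scores[task_type]
    best = max(candidates, key=candidates.get)
    return best, candidates[best]


print(route("drafting"))       # ('model_a', 0.81)
print(route("due_diligence"))  # ('model_b', 0.84)
```

The router can never return a score above the best entry in its table, which is the sense in which the product's ceiling is the underlying model's ceiling.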

Medicine and law also differ in how knowledge ages. Medical evidence becomes outdated quickly; a new randomized clinical trial can change practice overnight. Legal precedent moves slowly through amendments and scattered rulings, so yesterday’s case often remains relevant tomorrow.

Constant re-ingestion buys less goodwill in law compared to cardiology or oncology. That makes continual licensed ingestion far more valuable in medicine than in law.

Vertical AI doesn’t win just because it’s vertical. It wins when it controls data the horizontal giants can’t touch, and when the field punishes outdated information. Medicine checks both boxes, law only one.

In short, OpenEvidence bought a moat by licensing proprietary, fast-updating data.

I expect the next OpenEvidence-like breakout to come from a field where journals are locked up and the half-life of truth is measured in days, not months.

The OpenEvidence playbook

2025-07-19 08:00:00

OpenEvidence acquired over 350,000 doctors within two years by solving a fundamental issue: medical knowledge is expanding faster than any human can keep up with. Two new medical papers appear every minute, doubling the corpus of medical research every 73 days.
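
Taken at face value, the two figures above can be turned into yearly numbers. A quick sketch (the function names are illustrative, and a 365-day year is assumed):

```python
MINUTES_PER_YEAR = 60 * 24 * 365  # assumes a 365-day year


def papers_per_year(papers_per_minute: float) -> float:
    """Convert a per-minute publication rate into a yearly count."""
    return papers_per_minute * MINUTES_PER_YEAR


def annual_growth_factor(doubling_days: float) -> float:
    """How many times larger a quantity is after one year of steady doubling."""
    return 2 ** (365 / doubling_days)


print(papers_per_year(2))
print(annual_growth_factor(73))
```

`papers_per_year(2)` gives 1,051,200 papers a year, and `annual_growth_factor(73)` gives 32, since 365 / 73 is exactly 5 doublings.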

It’s humanly impossible for doctors to read everything relevant.

Instead of training a model on the entire public internet, OpenEvidence focused on smaller, specialised models trained exclusively on a trusted corpus: over 35 million peer-reviewed medical publications, with exclusive access to top-tier journals like The New England Journal of Medicine.

Unlike a generalist chatbot like ChatGPT, it presents itself as a decision support tool that gives doctors evidence backed answers. It can instantly search, summarise, and extract the most critical insights doctors need right at the point of care.

But this overload problem isn’t unique to medicine.

Professionals in multiple industries are drowning under the same firehose of information. For investors, lawyers, cybersecurity experts, and researchers in cutting edge fields, the risks of missing key information can be equally severe.

Here’s what makes an industry ripe for an “OpenEvidence-for-X” product:

  • The amount of information professionals must handle is overwhelming and growing exponentially.
  • The cost of missing something crucial is high enough that people will happily pay to avoid it.
  • The content they rely on has clear, authoritative sources like medical journals for doctors or SEC filings for financial analysts.
  • Professionals need fast, actionable answers embedded directly into their existing workflows, rather than standalone tools.

A few industries that are ready for their own OpenEvidence:

  • Legal professionals constantly sift through millions of court rulings and regulations. Just missing one new precedent could tank a case. Companies like Harvey.ai and Lexis+ AI have emerged to tackle this.
  • Finance analysts face a relentless barrage: over two million Reuters stories a year, thousands of earnings calls, SEC filings dropping constantly. Time is money in trading.
  • Cybersecurity teams get hit with over 100 new software vulnerabilities per day. Delaying just one security patch might lead to catastrophic breaches.
  • Researchers face a tsunami of scientific research, especially in rapidly evolving fields like AI, climate tech, and materials science. Every week, thousands of new papers flood databases like arXiv.
  • Patents and regulatory documents are similarly daunting. Over 3 million patents were filed worldwide last year alone. A company missing a critical patent could face costly infringement lawsuits.
  • Regulatory professionals overwhelmed by thousands of pages of new rules each month would greatly benefit from a tool that translates complex legislation into actionable briefs and compliance checklists, saving countless hours.

These industries share the following design patterns:

  • They rely on authoritative sources: court documents from PACER, SEC filings from EDGAR, vulnerabilities from NVD databases.
  • They carry a regulatory or professional duty to “stay current.”
  • They require pinpoint granular semantic indexing, providing exactly the right information down to the paragraph.
  • Every recommendation must link back explicitly to original sources to maintain trust and compliance.
  • Solutions can integrate seamlessly into users’ daily tools.

AI startups that nail these details see rapid adoption, and users willingly pay for reliability and precision. The market is already validating this need. BloombergGPT is embedding deep semantic searches directly into financial workflows. Harvey.ai is embedding itself in legal processes. GitHub Dependabot is automating software security upgrades.

Each solves real problems with specialised AI, similar to what OpenEvidence did in medicine.

Any industry overwhelmed by growing complexity, facing its own information overload crisis, where the consequences of a single missed detail can be severe, is primed for disruption by its own version of OpenEvidence.

The best vertical AI software will win when it controls data that the horizontal foundation model giants can’t touch, and when the field punishes outdated information.

If you are scouting for vertical AI opportunities, you need to secure proprietary data through exclusive licenses, prioritize retrieval quality, and focus on domain-specific UX by integrating seamlessly into your users’ daily workflow like OpenEvidence did.

Mamoon Hamid: From Box to Glean

2025-07-10 08:00:00

Back in 2005, when Mamoon Hamid joined USVP as a fresh-faced engineer turned investor, the tech world was still figuring out what came after the dot com crash.

When he first met Aaron Levie in 2006, Box was a simple file sharing service competing in a crowded market. But Mamoon saw beyond the product to the underlying shift.

Files trapped on individual computers were moving to the cloud. Collaboration was becoming the killer feature. The browser wasn’t just an access point, it was enabling an entirely new way of working.

The pattern was clear: move the system of record to the cloud, expose it through the browser, and collaboration comes “for free.” That framing helped Mamoon ride the entire Web 2.0 wave.

Mamoon Hamid ended up backing Box, Yammer, Slack and Figma long before they were household names.

He believed a browser-delivered, cloud-hosted experience would beat desktop software and on-prem servers for most business workflows. The browser wasn’t just a better interface. It was a fundamental shift in how software could be delivered and consumed. No more installation CDs. No more version conflicts. Just open a browser and start working.

[To hear him discuss this in more detail, listen to his appearance on Jack Altman’s podcast.]

Mamoon’s framework to spot fundamental platform shifts was simple. He looked for three key indicators:

  1. New interface paradigm - The browser democratized access to software.
  2. Shift in data location - From local to cloud, from siloed to shared.
  3. Change in user behavior - From individual productivity to collaborative workflows.

Just as the browser replaced desktop applications, natural language is replacing graphical interfaces. Why navigate through fifteen dropdown menus when you can just type “show me deals closing this quarter that need attention”?

Data isn’t just moving location, it’s transforming entirely. In the cloud era, data was passive. It sat there waiting for you to query it. Now it’s active. Your CRM doesn’t wait for you to check on leads. It identifies opportunities, drafts outreach emails, and schedules follow-ups before you even ask.

And user behavior? We’re shifting from reactive to proactive tools. Instead of waiting for you to ask, the software does the groundwork and hands you a draft to review and send.

The browser era was about a centralised system of record accessed through a point and click GUI. The AI era is about an autonomous system of action, driven by natural language and intent. The insight this time is that the marginal cost of cognition is collapsing. Storing data is table stakes, turning it into action is where value accrues.

The browser thesis wasn’t just about data accessibility, it was about collaboration becoming the atomic unit of work. The killer insight was that when data moved online, work itself transformed from individual tasks to shared workflows.

So the AI era question shouldn’t just be about cheap intelligence. It should capture how work fundamentally reorganizes when intelligence is embedded everywhere.

“What happens when the atomic unit of software shifts from features to entire workflows that complete themselves without user input?”

“What new value emerges when software can both decide and act in real time, across every context at near zero marginal cost?”

What Mamoon understood about Web 2.0 and what applies even more strongly to AI is that platform shifts compound.

Box moved file storage to the cloud, paving the way for Figma to let designers collaborate online and for Slack to bring team conversations into the same space.

With AI, this compounding is happening even faster. Every workflow automated creates demand for adjacent automations. Every successful AI product educates users and reduces resistance to the next one.

This framework for identifying platform shifts has positioned Mamoon and Kleiner perfectly for the AI wave.

Kleiner’s investments in Glean (enterprise search powered by AI), Harvey (AI for legal work), and Ambience Healthcare (AI medical scribe) all follow the same pattern: they’re not adding AI features to existing interfaces, they are rebuilding entire categories with AI at the core.

[Edited using Claude. Any errors in interpretation are my own.]

Hero at cost play

2025-06-21 08:00:00

I was exploring the stablecoins ecosystem the other day and someone mentioned Aspora. They started as Vance, a remittance service for NRIs. If you check out their site, they promise near Google exchange rates for NRIs sending money home. Fees are minimal, sometimes even zero. Compared to banks or even Wise, their pricing looks too good to be true.

Aspora’s remittance product seems great for users but terrible as a standalone business. They’ve proudly announced hitting $2 billion in annualised transfer volume and saving customers €15 million in cumulative fees. That sounds impressive until you realise that someone else, Aspora’s investors, is subsidising that generosity.

Then why are we talking about Aspora? There are already enough VC-subsidised, loss-making startups.

Because if you think deeply, Aspora isn’t actually trying to build a profitable remittance business. What they’re doing is a classic hero feature play as an initial wedge strategy. The goal isn’t remittances, it’s to become the banking super app for the entire Indian NRI diaspora.

Remittance is just a wedge to acquire customers; Aspora will have multiple ways to monetise them later.

We’ve seen this playbook before. Parker Conrad handed out free HR and payroll software to US startups through his company, Zenefits. There was no catch for the customer. Zenefits made its money by getting licensed as a health insurance broker, earning about $450 per employee per year in commissions whenever a company bought benefits through its platform.

Zenefits was using payroll data to cross-sell lucrative health insurance policies. It was never a payroll company, it was an insurance company.
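
The commission math is simple enough to sketch. The $450 figure is the approximate number quoted above; the 50-person headcount is a hypothetical example, not a Zenefits statistic.

```python
COMMISSION_PER_EMPLOYEE_PER_YEAR = 450  # USD, approximate figure quoted above


def annual_commission(employees_on_benefits: int) -> int:
    """Yearly broker commission from one customer company, in USD."""
    return employees_on_benefits * COMMISSION_PER_EMPLOYEE_PER_YEAR


# Hypothetical 50-person startup buying benefits through the platform.
print(annual_commission(50))
```

For that hypothetical startup, `annual_commission(50)` comes to $22,500 a year, earned on a customer that paid nothing for the payroll software.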

The parallels are impossible to ignore.

Both companies solve a painful, everyday headache, either moving money home or managing employees, for little to no cost.

Both collect critical data that competitors have to beg for later, like regulatory KYC (Know Your Customer) or payroll records. And both make their real profit on a regulated, sticky back-end product. For Zenefits it was insurance, and for Aspora it will be a full suite of banking products.

This is what I call the ‘hero-at-cost (or even loss)’ playbook. What looks like the initial hero feature is a product sold at cost or as a loss leader, while the actual product that monetizes the eventually sticky user base comes later.

The wedge product (which is the hero product initially, and not a thin MVP) is priced so aggressively that it’s a no-brainer for customers to adopt. Once you have the users and their data locked in, you cross-sell high-margin services. The model works because of a few key levers. The initial problem has to be huge, like remittance pain for 30 million NRIs or payroll pain for every startup. The core, high-margin product needs to have high switching costs, like a bank account tied to your KYC or a health plan tied to an annual renewal. And most importantly, the free wedge has to quietly collect the compliance and usage data you need for the upsell.

Meanwhile, Aspora’s cheap money transfers pull customers in, build trust, and crucially collect their KYC and financial data. Once Aspora locks in a large enough base, they’ll introduce high margin banking products like loans, insurance, or investment management tailored to NRIs.

NRIs transferring money home face an everyday headache. Banks charge absurd markups. Hidden fees pop up constantly. Aspora kills that pain. Users become fiercely loyal advocates, spreading the word through WhatsApp groups and family chats. Over 55% of Aspora’s new users come via referrals (essentially zero marginal customer acquisition cost).

Of course, this playbook has its risks. Especially if you are building in a space that regulators monitor aggressively. Zenefits found that out the hard way when state insurance commissioners forced them to abandon their free software strategy, calling it an illegal inducement. Aspora will likely face similar scrutiny. Indian regulators, in particular, have shown they can move swiftly when consumer data is involved.

You can spot these wedges in the wild if you know what to look for.

  1. Find a high-frequency, overpriced task that can be driven to a near-zero marginal cost.
  2. Look for a high-margin back-end product that relies on the same data or license.
  3. The trigger from the free wedge to the paid core should feel almost automatic, like payroll to insurance.
  4. The data or compliance rails should get stronger with every new user, making it harder for competitors to lure away your customers.

The hero-at-cost strategy works when the customer’s pain is acute, the back-end margin is fat and defensible, and the rails, whether data or licenses, connect the two. Aspora’s Google-rate remittance play is not charity. It’s the on-ramp to a cross-border bank play, starting with NRIs and then immigrants from more nations.

Hidden Assets: Disney’s film vault and vLex’s global law library

2025-06-20 08:00:00

“Disney’s a perfectly marvellous company, but it’s also very high-priced. Part of what it does is make ordinary movies, which is not a business that attracts me at all. However, part of what Disney has is better than a great gold mine. I mean, those videocassettes… Disney is an amazing example of autocatalysis. They had all those movies in the can. They owned the copyright. And just as Coke could prosper when refrigeration came, when the videocassette was invented, Disney didn’t have to invent anything or do anything except take the thing out of the can and stick it on the cassette. And every parent and grandparent wanted his descendants to sit around and watch that stuff at home on videocassette. So Disney got this enormous tailwind….” - Excerpt from Poor Charlie’s Almanack.

While reading Poor Charlie’s Almanack I came across this idea of under-appreciated assets suddenly becoming engines of growth. A great example was Disney. The studio had already spent the money to create Snow White, Bambi, and every other classic. When home video arrived, Disney simply moved those films from theatre reels to VHS tapes. Parents could now buy the stories for rewatching. The cost to repurpose the content was tiny compared with the fresh wave of revenue that followed.

Munger called this kind of self-reinforcing payoff autocatalysis. In chemistry, a reaction speeds itself up once it starts. In business, a stored asset can suddenly compound when a new distribution channel, technology, or behavior emerges. Disney’s copyrights met the VCR and value snowballed.

The legal world has now got its own “films in the can” story.

Since 2000, Barcelona-based vLex has been collecting and digitising case law, statutes, journals, and court filings from more than one hundred countries. It kept licensing new collections, cleaning the text, and adding structured metadata. For years this looked like a niche publishing play.

The game changed with the rise of large language models. These models crave high-quality, full-text data that is legally cleared for use. Public web crawls rarely include full court opinions or premium law reviews. vLex, on the other hand, held a billion-document archive that was well indexed and citation ready. What once seemed like a quiet reference library turned into prime fuel for legal AI.

Private-equity backers stepped in, vLex merged with U.S. research company Fastcase, and in 2025 practice-management giant Clio agreed to buy the combined company for about a billion dollars. Clio’s user base of hundreds of thousands of lawyers is the distribution channel, while vLex’s data vault is the content. Together they plan to deliver research, drafting, and analytics tools powered by that proprietary corpus.

The parallel to Disney is striking. Disney’s core asset was its deep library of animated films; vLex’s was its vast archive of legal documents. The catalyst for Disney was the VCR; for vLex, it was generative AI.

The cost to repurpose the content was minimal in both cases.

So how do you spot one of these sleeping assets? There are a few tests: First, is it durable? Will the asset still matter as technology changes? Great stories last for generations, and authoritative law will always matter in court. Second, is it transferable? Can you move the asset into a new format at a low cost? Films moved from reels to tapes, and structured case law moves easily from a search database into an AI workflow. Finally, do you control the rights? If you don’t own the content, you can’t monetise it when a new format comes along.

A Playbook for Unlocking Hidden Assets:

  1. Inventory your vault. List every dataset, archive, and expert annotation your organisation controls.
  2. Clean and enrich now. Metadata and quality checks feel like overhead until a catalyst arrives.
  3. Acquire adjacent libraries while prices are low. vLex snapped up Justis Publishing before legal AI drove valuations higher.
  4. Secure distribution. Pair the asset with a channel that puts it in front of users daily.
  5. Wrap with intelligent tools. Content alone is not value; you need tools to convert it into user value.

Munger said you know you are onto something special when “you accomplish A and suddenly get A plus B plus C.” Disney’s film vault hit fast-forward with the VCR. vLex spent two decades building the world’s law vault, and AI turned that investment into a billion-dollar outcome.

I am trying to figure out which overlooked archive in another industry is poised for a similar AI-powered economic breakthrough.

Throne ad

2025-06-19 08:00:00

I really like the idea of ThroneScience.

One of my favourite VC frameworks is: take what only the wealthy can get today and make it available to everyone. But what if we could offer people something that was once fit only for royalty?

I wanted to play around with this idea.

Throne brands itself as the Whoop for your poop. Imagine Apple designing a toilet based health tracker.

The founder had shared their slide deck yesterday, so I tried out a few ad ideas. The numbers are from their deck.

[Images: Throne ad 1, Throne ad 2]

I tried 2 formats with slightly different copy and alignment. These exercises are part of my effort to improve my skills in design and marketing.