MoreRSS

site iconLenny RachitskyModify

The #1 business newsletter on Substack.
Please copy the RSS to your reader, or quickly subscribe to:

Inoreader Feedly Follow Feedbin Local Reader

Rss preview of Blog of Lenny Rachitsky

An unbelievable offer: Now get one free year of Cursor, v0, Replit, Lovable, and Bolt pro plans with an annual subscription to Lenny’s Newsletter

2025-04-16 00:47:25

The greatest bundle ever just got so much bigger, and unbelievably better. Now when you subscribe to Lenny’s Newsletter, you’ll get one free year of 10 incredible products, including five of the hottest AI tools in the world right now:

  1. Bolt

  2. Cursor

  3. Lovable

  4. Replit

  5. v0

This is in addition to the existing beloved products we’ve already got in the bundle:

  1. Linear

  2. Notion

  3. Perplexity Pro

  4. Superhuman

  5. Granola

These companies have never offered anything like this before.

You’re getting a $15,000+ value for the price of a yearly newsletter subscription. Using even just one of these products offsets the cost of the newsletter.

This bundle is in addition to the existing benefits of being a paid subscriber: full access to every new newsletter post, along with five years’ worth of previous posts, and invites to a thriving members-only Slack community and local community meetups.

A yearly subscription is an absolute no-brainer. Click here to grab this deal 👇

Subscribe now

Important deal details:

  1. You must have an annual subscription to Lenny’s Newsletter to be eligible for this bundle. Monthly subscribers do not have access to the deal.

  2. Both existing subscribers and new subscribers are eligible.

  3. You must be a new paying customer of the products to take advantage of the free year. If you’ve already paid for one of the products in the bundle before, you won’t be able to get a free additional year of that specific product.

  4. You can though have an existing (free) account with any product, and apply the yearly offer to your existing account. If you don’t already have an account, just create one.

  5. If you request an early refund or chargeback for your subscription, your bundle codes will deactivate (whether they’re active or unused).

Subscribe now

How to redeem the deal:

  1. Once you become a paid yearly subscriber, click here to redeem your codes.

  2. If you’re on a monthly plan, upgrade to yearly, then go here to redeem your codes.

  3. If you’re already a paid annual subscriber, click here to redeem your codes.

Specifics of what you get:

  1. Cursor: One year of the Pro plan ($240 value)

  2. Bolt: One year of the Pro plan ($240 value)

  3. Lovable: One year of the Starter plan ($240 value)

  4. Replit: One year of the Core plan ($240 value)

  5. v0: One year of the Premium plan ($240 value)

  6. Granola: One year of the Business plan for you and your team—up to 100 seats ($10,000+ value)

  7. Notion: One year of the Plus plan (plus unlimited AI) for you and your team—up to 10 seats ($2,000+ value)

  8. Linear: One year of the Business plan—two seats ($336 value)

  9. Superhuman: One year of the Starter plan ($300 value)

  10. Perplexity: One year of the Pro plan ($240 value)

If you have any trouble or questions, email [email protected].

Subscribe now

Sincerely,

Lenny

Everyone’s an engineer now: Inside v0’s mission to create a hundred million builders | Guillermo Rauch (founder & CEO of Vercel, creators of v0 and Next.js)

2025-04-13 19:03:09

Listen now:
YouTube // Apple // Spotify

Brought to you by:

WorkOS—Modern identity platform for B2B SaaS, free up to 1 million MAUs

Vanta—Automate compliance. Simplify security

LinkedIn Ads—Reach professionals and drive results for your business

Guillermo Rauch is the founder and CEO of Vercel, creators of v0 (one of the most popular AI app building tools), and the mind behind foundational JavaScript frameworks like Next.js and Socket.io. An open source pioneer and legendary engineer, Guillermo has built tools that power some of the internet’s most innovative products, including Midjourney, Grok, and Notion. His mission is to democratize product creation, expanding the pool of potential builders from 5 million developers to over 100 million people worldwide. In this episode, you’ll learn:

  1. How AI will radically speed up product development—and the three critical skills PMs and engineers should master now to stay ahead

  2. Why the future of building apps is shifting toward prompts instead of code, and how that affects traditional product teams

  3. Specific ways to improve your design “taste,” plus practical tips to consistently create beautiful, user-loved products

  4. How Guillermo built a powerful app in under two hours for $20 (while flying and using plane Wi-Fi) that would normally take weeks and thousands of dollars in engineering time

  5. The exact strategies Vercel uses internally to leverage AI tools like v0 and Cursor, enabling their team of 600 to ship faster and better than ever before

  6. Guillermo’s actionable advice on increasing your product quality through rapid iteration, real-world user feedback, and creating intentional “exposure hours” for your team

Some takeaways:

  1. There are three essential skills you need to master right now to thrive in the AI-driven product landscape: clearly defining product intent, coaching AI effectively through iteration, and quickly resolving challenges when AI gets stuck.

  2. Despite the fears about AI replacing software engineers, Guillermo emphasizes the continuing value of deeply understanding how software systems work—encouraging engineers and PMs to build technical fluency and generalist knowledge to leverage AI rather than compete against it.

  3. You don’t have to be a designer or experienced engineer to build beautiful products—AI tools like v0 embed best practices and design excellence, allowing anyone to rapidly produce high-quality, production-ready digital experiences.

  4. “Translation tasks” in programming (like converting designs into code) are being automated, while conceptual understanding and eloquence in describing what you want remain crucial.

  5. Develop better product taste by increasing your “exposure hours”—time spent watching users interact with your products and competitors’ products.

  6. When using AI tools like v0, provide references and inspirations rather than being overly prescriptive—the AI might implement solutions better than you would have.

  7. Remember that feature development is “like adopting a puppy”—say nine no’s for every yes, as each feature requires ongoing maintenance.

  8. Don’t hesitate to tell AI tools directly what you don’t like—simple prompts like “make it more jazzy” or “make it pop” can be effective.

  9. Break down large projects into smaller components when working with AI to avoid overwhelming context windows.

Where to find Guillermo Rauch:

• X: https://x.com/rauchg

• LinkedIn: https://www.linkedin.com/in/rauchg/

• Website: https://rauchg.com/

In this episode, we cover:

(00:00) Introduction to Guillermo Rauch

(04:43) v0's mission

(07:03) The impact and growth of v0

(15:54) The future of product development with AI

(19:05) Empowering engineers and product builders

(24:01) Skills for the future: coding, math, and eloquence

(35:05) v0 in action: real-world applications

(36:40) Tips for using v0 effectively

(45:46) Core skills for building AI apps

(49:44) Live demo

(59:45) Understanding how AI thinks

(01:04:35) AI integration and future prospects

(01:07:22) Building taste

(01:13:43) Limitations of v0

(01:16:54) Improving the design of your product

(01:20:09) The secret to product quality

(01:22:35) Vercel’s AI-driven development

(01:25:43) Guillermo's vision for the future

Referenced:

• v0: https://v0.dev/

• Vercel: https://vercel.com/

• GitHub: https://github.com/

• Cursor: https://www.cursor.com/

• Next.js Framework: https://nextjs.org/

• Claude: https://claude.ai/new

• Grok: https://x.ai/

• Midjourney: https://www.midjourney.com

• SocketIO: https://socket.io/

• Notion’s lost years, its near collapse during Covid, staying small to move fast, the joy and suffering of building horizontal, more | Ivan Zhao (CEO and co-founder): https://www.lennysnewsletter.com/p/inside-notion-ivan-zhao

• Notion: https://www.notion.com/

• Automattic: https://automattic.com/

• Inside Bolt: From near-death to ~$40m ARR in 5 months—one of the fastest-growing products in history | Eric Simons (founder & CEO of StackBlitz): https://www.lennysnewsletter.com/p/inside-bolt-eric-simons

• v0 Community: https://v0.dev/chat/community

• Figma: https://www.figma.com/

• Git Commit: https://www.atlassian.com/git/tutorials/saving-changes/git-commit

• What are Artifacts and how do I use them?: https://support.anthropic.com/en/articles/9487310-what-are-artifacts-and-how-do-i-use-them

• Design Engineering at Vercel: https://vercel.com/blog/design-engineering-at-vercel

• CSS: https://en.wikipedia.org/wiki/CSS

• Tailwind: https://tailwindcss.com/

• Wordcel / Shape Rotator / Mathcel: https://knowyourmeme.com/memes/wordcel-shape-rotator-mathcel

• Steve Jobs’s Ultimate Lesson for Companies: https://hbr.org/2011/08/steve-jobss-ultimate-lesson-fo

• Bloom Hackathon: https://bloom.build/

• Expenses Should Do Themselves | Saquon Barkley x Ramp (Super Bowl Ad): https://www.youtube.com/watch?v=p1Tgsy7D0Jg

• Velocity over everything: How Ramp became the fastest-growing SaaS startup of all time | Geoff Charles (VP of Product): https://www.lennysnewsletter.com/p/velocity-over-everything-how-ramp

• JavaScript: https://www.javascript.com/

• React: https://react.dev/

• Mapbox: https://www.mapbox.com/

• Leaflet: https://leafletjs.com/

• Escape hatches: https://react.dev/learn/escape-hatches

• Supreme: https://supreme.com/

• Shadcn: https://ui.shadcn.com/

• Charles Schwab: https://www.schwab.com/

Fortune: https://fortune.com/

• Semafor: https://www.semafor.com/

• AI SDK: https://sdk.vercel.ai/

• DeepSeek: https://www.deepseek.com/

• Stripe: https://stripe.com/

• Vercel templates: https://vercel.com/templates

• GC AI: https://getgc.ai/

• OpenEvidence: https://www.openevidence.com/

• Paris Fashion Week: https://www.fhcm.paris/en/paris-fashion-week

• Guillermo’s post on X about making great products: https://x.com/rauchg/status/1887314115066274254

• Everybody Can Cook billboard: https://www.linkedin.com/posts/evilrabbit_activity-7242975574242037760-uRW9/

Ratatouille: https://www.imdb.com/title/tt0382932/

Production and marketing by https://penname.co/. For inquiries about sponsoring the podcast, email [email protected].

Lenny may be an investor in the companies discussed.

🧠 Community Wisdom: Learning to be more concise, reviewing design works, creating a strategy pre-PMF, moving from PM to design, building trust with customer support, and more

2025-04-13 00:01:14

👋 Hello and welcome to this week’s edition of ✨ Community Wisdom ✨ a subscriber-only email, delivered every Saturday, highlighting the most helpful conversations in our members-only Slack community.

Read more

OpenAI’s CPO on how AI changes must-have skills, moats, coding, startup playbooks, more | Kevin Weil (CPO at OpenAI, ex-Instagram, Twitter)

2025-04-10 19:03:32

Listen now:
YouTube // Apple // Spotify

Brought to you by:

Eppo—Run reliable, impactful experiments

Persona—A global leader in digital identity verification

OneSchema—Import CSV data 10x faster

Kevin Weil is the chief product officer at OpenAI, where he oversees the development of ChatGPT, enterprise products, and the OpenAI API. Prior to OpenAI, Kevin was head of product at Twitter, Instagram, and Planet, and was instrumental in the development of the Libra (later Novi) cryptocurrency project at Facebook.

In this episode, you’ll learn:

  1. How OpenAI structures its product teams and maintains agility while developing cutting-edge AI

  2. The power of model ensembles—using multiple specialized models together like a company of humans with different skills

  3. Why writing effective evals (AI evaluation tests) is becoming a critical skill for product managers

  4. The surprisingly enduring value of chat as an interface for AI, despite predictions of its obsolescence

  5. How “vibe coding” is changing how companies operate

  6. What OpenAI looks for when hiring product managers (hint: high agency and comfort with ambiguity)

  7. “Model maximalism” and why today’s AI is the worst you’ll ever use again

  8. Practical prompting techniques that improve AI interactions, including example-based prompting

Some takeaways:

  1. OpenAI’s philosophy of model maximalism—the idea that AI models will improve so quickly that it’s better to build for capabilities that are just emerging rather than extensively scaffolding around current limitations. This approach acknowledges that today’s AI models are “the worst you’ll ever use for the rest of your life,” with capabilities increasing exponentially while costs decrease by orders of magnitude. The pace of improvement is staggering—what once took 6-9 months between model iterations has accelerated to 3-4 months with the O-series models, each representing a substantial leap in capability.

  2. OpenAI structures its AI systems as ensembles of specialized models—similar to how human organizations work. Rather than relying on a single general-purpose model, they deploy multiple specialized models (some fine-tuned for specific tasks, others chosen for speed or cost efficiency) working together to solve complex problems. This mirrors how companies function as collections of specialists with different skills and costs. OpenAI applies this approach internally to handle customer support for 400+ million users with just 30-40 staff members.

  3. Writing effective evals is becoming a core skill for product managers and teams building AI products. These structured tests measure model performance on specific tasks, helping teams understand where models excel (99.95% accuracy) versus where they struggle (60% accuracy). This knowledge fundamentally shapes product design decisions. The quality of evals effectively caps the potential of AI products, as models can only be optimized for what you can measure well.

  4. OpenAI embraces an “iterative deployment” approach, preferring to ship products early and refine them in public rather than perfecting them internally. This collaborative evolution with users acknowledges that everyone is learning about model capabilities together. This philosophy extends to how OpenAI approaches roadmapping—they set directional alignment but expect plans to change as technology evolves, focusing on the planning process rather than rigidly following the plan itself.

  5. While many dismiss chat as a primitive interface that will be superseded, Kevin argues that it may be the ideal interaction model for AI. Chat’s unstructured, flexible nature maximizes communication bandwidth in a way that more structured interfaces cannot. It mirrors how humans naturally communicate and can adapt to any intelligence level—from basic to superintelligent systems. This flexibility makes it a “catchall for every possible thing you’d ever want to express to a model.”

  6. As AI becomes ubiquitous, Kevin predicts that fine-tuned models will proliferate across industries. Consequently, product teams will increasingly include “quasi-researcher, machine-learning engineer types” to customize models for specific use cases. This integration is already happening at foundation model companies but will spread throughout the industry as organizations recognize that generic models can’t match the performance of those fine-tuned for specific domains.

  7. OpenAI maintains velocity through a strongly bottom-up approach to product development, empowering teams to move quickly without extensive consensus-seeking. While they do quarterly roadmapping, they readily discard plans as they learn new information. This philosophy values planning over plans, and accepts that mistakes will happen when moving quickly. Their approach emphasizes ownership, autonomy, and the ability to learn and pivot rapidly.

  8. Kevin describes vibe coding—a collaborative coding approach where developers work alongside AI models like Cursor or Windsurf, accepting most suggestions while providing guidance. Rather than meticulously writing every line, developers maintain a high-level direction while letting models handle implementation details. Kevin believes product teams should increasingly use this approach for prototyping and demos instead of static designs.

  9. A counterintuitive insight is that designing AI experiences often works well when modeled after human behavior. When creating UIs for reasoning models that need time to “think,” OpenAI looked to how humans behave when pondering a difficult question—not going silent, not babbling every thought, but providing occasional updates to maintain engagement. This human-centered approach to AI design creates more intuitive and satisfying user experiences.

  10. Kevin says personalized AI tutoring is potentially “the most important thing AI could do” and is surprised that there isn’t yet a solution serving billions of children. With studies consistently showing dramatic learning improvements from personalized tutoring, and chat interfaces now sophisticated enough to provide it at scale for free, this represents an enormous opportunity to transform education globally, particularly for underserved populations.

Where to find Kevin Weil:

• X: https://x.com/kevinweil

• LinkedIn: https://www.linkedin.com/in/kevinweil/

In this episode, we cover:

(00:00) Kevin’s background

(05:16) OpenAI’s new image model

(08:13) The role of chief product officer at OpenAI

(11:42) His recruitment story and joining OpenAI

(15:59) Working at OpenAI

(18:44) The importance of evals in AI

(24:40) Opportunities in the space

(26:34) Shipping quickly and consistently

(29:47) Product reviews and iterative deployment

(32:53) Winning consumer awareness

(36:03) Designing thoughtful experiences

(40:56) Chat as an interface for AI

(45:21) Collaboration between researchers and product teams

(48:05) Hiring product managers at OpenAI

(53:06) How OpenAI uses AI: vibe coding, AI prototyping, and more

(01:04:34) Raising kids in an increasingly intelligent AI world

(01:08:07) Why Kevin feels optimistic about our AI future

(01:14:20) The AI model you're using today is the worst AI model you'll ever use

(01:17:58) Reflections on the Libra project

(01:21:51) Lightning round and final thoughts

Referenced:

• OpenAI: https://openai.com/

• The AI-Generated Studio Ghibli Trend, Explained: https://www.forbes.com/sites/danidiplacido/2025/03/27/the-ai-generated-studio-ghibli-trend-explained/

• Introducing 4o Image Generation: https://openai.com/index/introducing-4o-image-generation/

• Waymo: https://waymo.com/

• X: https://x.com

• Facebook: https://www.facebook.com/

• Instagram: https://www.instagram.com/

• Planet: https://www.planet.com/

• Sam Altman on X: https://x.com/sama

• A conversation with OpenAI’s CPO Kevin Weil, Anthropic’s CPO Mike Krieger, and Sarah Guo: https://www.youtube.com/watch?v=IxkvVZua28k

• OpenAI evals: https://github.com/openai/evals

• Deep Research: https://openai.com/index/introducing-deep-research/

• Ev Williams on X: https://x.com/ev

• OpenAI API: https://platform.openai.com/docs/overview

• Dwight Eisenhower quote: https://www.brainyquote.com/quotes/dwight_d_eisenhower_164720

• Inside Bolt: From near-death to ~$40m ARR in 5 months—one of the fastest-growing products in history | Eric Simons (founder & CEO of StackBlitz): https://www.lennysnewsletter.com/p/inside-bolt-eric-simons

• StackBlitz: https://stackblitz.com/

• Claude 3.5 Sonnet: https://www.anthropic.com/news/claude-3-5-sonnet

• Anthropic: https://www.anthropic.com/

• Four-minute mile: https://en.wikipedia.org/wiki/Four-minute_mile

• Chad: https://chatgpt.com/g/g-3F100ZiIe-chad-open-a-i

• Dario Amodei on LinkedIn: https://www.linkedin.com/in/dario-amodei-3934934/

• Figma: https://www.figma.com/

• Julia Villagra on LinkedIn: https://www.linkedin.com/in/juliavillagra/

• Andrej Karpathy on X: https://x.com/karpathy

• Silicon Valley CEO says ‘vibe coding’ lets 10 engineers do the work of 100—here’s how to use it: https://fortune.com/2025/03/26/silicon-valley-ceo-says-vibe-coding-lets-10-engineers-do-the-work-of-100-heres-how-to-use-it/

• Cursor: https://www.cursor.com/

• Windsurf: https://codeium.com/windsurf

• GitHub Copilot: https://github.com/features/copilot

• Patrick Srail on X: https://x.com/patricksrail

• Khan Academy: https://www.khanacademy.org/

• CK-12 Education: https://www.ck12.org/

• Sora: https://openai.com/sora/

• Sam Altman’s post on X about creative writing: https://x.com/sama/status/1899535387435086115

• Diem (formerly known as Libra): https://en.wikipedia.org/wiki/Diem_(digital_currency)

• Novi: https://about.fb.com/news/2020/05/welcome-to-novi/

• David Marcus on LinkedIn: https://www.linkedin.com/in/dmarcus/

• Peter Zeihan on X: https://x.com/PeterZeihan

The Wheel of Time on Prime Video: https://www.amazon.com/Wheel-Time-Season-1/dp/B09F59CZ7R

Top Gun: Maverick on Prime Video: https://www.amazon.com/Top-Gun-Maverick-Joseph-Kosinski/dp/B0DM2LYL8G

• Thinking like a gardener not a builder, organizing teams like slime mold, the adjacent possible, and other unconventional product advice | Alex Komoroske (Stripe, Google): https://www.lennysnewsletter.com/p/unconventional-product-advice-alex-komoroske

• MySQL: https://www.mysql.com/

Recommended books:

Co-Intelligence: Living and Working with AI: https://www.amazon.com/Co-Intelligence-Living-Working-Ethan-Mollick/dp/059371671X

The Accidental Superpower: Ten Years On: https://www.amazon.com/Accidental-Superpower-Ten-Years/dp/1538767341

Cable Cowboy: https://www.amazon.com/Cable-Cowboy-Malone-Modern-Business/dp/047170637X

Production and marketing by https://penname.co/. For inquiries about sponsoring the podcast, email [email protected].

Lenny may be an investor in the companies discussed.

Beyond vibe checks: A PM’s complete guide to evals

2025-04-08 20:32:04

👋 Welcome to a 🔒 subscriber-only edition 🔒 of my weekly newsletter. Each week I tackle reader questions about building product, driving growth, and accelerating your career. For more: Lennybot | Podcast | Courses | Hiring | Swag

Subscribe now

Annual subscribers now get a free year of Perplexity Pro, Notion, Superhuman, Linear, and Granola. Subscribe now.


I’m going to keep this intro short because this post is so damn good, and so damn timely.

Writing evals is quickly becoming a core skill for anyone building AI products (which will soon be everyone). Yet there’s very little specific advice on how to get good at it. Below you’ll find everything you need to understand wtf evals are, why they are so important, and how to master this emerging skill.

Aman Khan runs a popular course on evals developed with Andrew Ng, is Director of Product at Arize AI (a leading AI company), and has been a product leader at Spotify, Cruise, Zipline, and Apple. He was also a past podcast guest and is launching his first Maven course on AI product management this spring. If you’re looking to get more hands-on, definitely check out Aman’s upcoming free 30-minute lightning lesson on April 18th: Mastering Evals as an AI Product Manager. You can find Aman on X, LinkedIn, and Substack.

Now, on to the post. . .


After years of building AI products, I’ve noticed something surprising: every PM building with generative AI obsesses over crafting better prompts and using the latest LLM, yet almost no one masters the hidden lever behind every exceptional AI product: evaluations. Evals are the only way you can break down each step in the system and measure specifically what impact an individual change might have on a product, giving you the data and confidence to take the right next step. Prompts may make headlines, but evals quietly decide whether your product thrives or dies. In fact, I’d argue that the ability to write great evals isn’t just important—it’s rapidly becoming the defining skill for AI PMs in 2025 and beyond.

If you’re not actively building this muscle, you’re likely missing your biggest opportunity for impact-building AI products.

Let me show you why.

Why evals matter

Let’s imagine you’re building a trip-planning AI agent for a travel-booking website. The idea: your users type in natural language requests like “I want a relaxing weekend getaway near San Francisco for under $1,000,” and the agent goes off to research the best flights, hotels, and local experiences tailored to their preferences.

To build this agent, you’d typically start by selecting an LLM (e.g. GPT-4o, Claude, or Gemini) and then design prompts (specific instructions) that guide the LLM to interpret user requests and respond appropriately. Your first impulse might be to feed user questions into the LLM directly to get out responses one by one, as with a simple chatbot, before adding capabilities to turn it into a true “agent.” When you extend your LLM-plus-prompt by giving it access to external tools—like flight APIs, hotel databases, or mapping services—you allow it to execute tasks, retrieve information, and respond dynamically to user requests. At that point, your simple LLM-plus-prompt evolves into an AI agent, capable of handling complex, multi-step interactions with your users. For internal testing, you might experiment with common scenarios and manually verify that the outputs make sense.

Everything seems great—until you launch. Suddenly, frustrated customers flood support because the agent booked them flights to San Diego instead of San Francisco. Yikes. How did this happen? And more importantly, how could you have caught and prevented this error earlier?

This is where evals come in.

What exactly are evals?

Evals are how you measure the quality and effectiveness of your AI system. They act like regression tests or benchmarks, clearly defining what “good” actually looks like for your AI product beyond the kind of simple latency or pass/fail checks you’d usually use for software.

Evaluating AI systems is less like traditional software testing and more like giving someone a driving test:

  • Awareness: Can it correctly interpret signals and react appropriately to changing conditions?

  • Decision-making: Does it reliably make the correct choices, even in unpredictable situations?

  • Safety: Can it consistently follow directions and arrive safely at the intended destination, without going off the rails?

Just as you’d never let someone drive without passing their test, you shouldn’t let an AI product launch without passing thoughtful, intentional evals.

Evals are analogous to unit testing in some ways, with important differences. Traditional software unit testing is like checking if a train stays on its tracks: straightforward, deterministic, clear pass/fail scenarios. Evals for LLM-based systems, on the other hand, can feel more like driving a car through a busy city. The environment is variable, and the system is non-deterministic. Unlike in traditional software testing, when you give the same prompt to an LLM multiple times, you might see slightly different responses—just like how drivers can behave differently in city traffic. With evals, you’re often dealing with more qualitative or open-ended metrics—like the relevance or coherence of the output—that might not fit neatly into a strict pass/fail testing model.

An example eval prompt to detect frustrated users

Getting started

Different eval approaches

  1. Human evals: These are human feedback loops you can design into your product (i.e. showing a thumbs-up/thumbs-down or a comment box next to an LLM response, for your user to provide feedback). You can also have human labelers (i.e. subject-matter experts) provide their labels and feedback, and use this for aligning the application with human preferences via prompt optimization or fine-tuning a model (aka reinforcement learning from human feedback, or RLHF).

    • Pro: Directly tied to the end user.

    • Cons: Very sparse (most people don’t hit that thumbs-up/thumbs-down), not a strong signal (what does a thumbs-up or -down mean?), and costly (if you want to hire human labelers).

  2. Code-based evals: Utilizing checks on API calls or code generation (i.e. was the generated code “valid” and can it run?).

    • Pros: Cheap and fast to write this eval.

    • Cons: Not a strong signal; great for code-based LLM generation but not for more nuanced responses or evaluations.

  3. LLM-based evals: This technique utilizes an external LLM system (i.e. a “judge” LLM), with a prompt like the one above, to grade the output of the agent system. LLM-based evals allow you to generate classification labels in an automated way that resembles human-labeled data—without needing to have users or subject-matter experts label all of your data.

    • Pro: Scalable (it’s like a human label but much cheaper) and natural language, so the PM can write prompts. You can also get the LLM to generate an explanation.

    • Con: Need to create LLM-as-a-judge (with some small amount of data to start).

Importantly, LLM-based evals are natural language prompts themselves. That means that just as building intuition for your AI agent or LLM-based system requires prompting, evaluating that same system also requires you to describe what you want to catch.

Let’s take the example from earlier: a trip-planning agent. In that system, there are a lot of things that can go wrong, and you can choose the right eval approach for each step in the system.

Standard eval criteria

As a user, you want evals that are (1) specific, (2) battle-tested, and (3) test for specific areas of success. A few examples of common areas evals might look at are:

  1. Hallucination: Is the agent accurately using the provided context, or is it making things up?

    1. Useful for: When you are providing documents (e.g. PDFs) for the agent to perform reasoning on top of

  1. Toxicity/tone: Is the agent outputting harmful or undesirable language?

    1. Useful for: End-user applications, to determine if users may be trying to exploit the system or the LLM is responding inappropriately

  1. Overall correctness: How well is the system performing at its primary goal?

    1. Useful for: End-to-end effectiveness; for example, question-answering accuracy—how often is the agent actually correct at answering a question provided by a user?

Other common areas for eval would be:

Phoenix (open source) maintains a repository of off-the-shelf evaluators here.* Ragas (open source) also maintains a repository of RAG-specific evaluators here.

*Full disclosure: I’m a contributor to Phoenix, which is open source (there are other tools out there too for evals, like Ragas). I’d recommend people get started with something free/open source, which won’t hold their data hostage, to run evals! Many of the tools in the space are closed source. You never have to talk to Arize/our team to use Phoenix for evals.

The eval formula

Each great LLM eval contains four distinct parts:

  • Part 1: Setting the role. You need to provide the judge-LLM a role (e.g. “you are examining written text”) so that the system is primed for the task.

  • Part 2: Providing the context. This is the data you will actually be sending to the LLM to grade. This will come from your application (i.e. the message chain, or the message generated from the agent LLM).

  • Part 3: Providing the goal. Clearly articulating what you want your judge-LLM to measure isn’t just a step in the process; it’s the difference between a mediocre AI and one that consistently delights users. Building these writing skills requires practice and attention. You need to clearly define what success and failure look like to the judge-LLM, translating nuanced user expectations into precise criteria your LLM judge can follow. What do you want the judge-LLM to measure? How would you articulate what a “good” or “bad” outcome is?

  • Part 4: Defining the terminology and label. Toxicity, for example, can mean different things in different contexts. You want to be specific here so the judge-LLM is “grounded” in the terminology you care about.

Here’s a concrete example. Below is an example eval for toxicity/tone for your trip planner agent.

The workflow for writing effective evals

Read more

Become a better communicator: Specific frameworks to improve your clarity, influence, and impact | Wes Kao (coach, entrepreneur, advisor)

2025-04-06 19:03:13

Listen now:
YouTube // Apple // Spotify

Brought to you by:

WorkOS—Modern identity platform for B2B SaaS, free up to 1 million MAUs

Vanta—Automate compliance. Simplify security

Coda—The all-in-one collaborative workspace

Wes Kao is an entrepreneur, coach, and advisor. She co-founded the live learning platform Maven, backed by First Round and a16z. Before Maven, Wes co-created the altMBA with best-selling author Seth Godin. Today, Wes teaches a popular course on executive communication and influence. Through her course and one-on-one coaching, she’s helped thousands of operators, founders, and product leaders master the art of influence through clear, compelling communication. Known for her surgical writing style and no-BS frameworks, Wes returns to the pod to deliver a tactical master class on becoming a sharper, more persuasive communicator—at work, in meetings, and across your career.

What you’ll learn:

  1. The #1 communication mistake leaders make—and Wes’s proven fix to instantly gain buy-in

  2. Wes’s MOO (Most Obvious Objection) framework to consistently anticipate and overcome pushback in meetings

  3. How to master concise communication—including Wes’s tactical approach for brevity without losing meaning

  4. The art of executive presence: actionable strategies for conveying confidence and clarity, even under pressure

  5. The “sales, then logistics” framework—and why your ideas keep getting ignored without it

  6. The power of “signposting”—and why executives skim your docs without it

  7. Exactly how to give feedback that works—Wes’s “strategy, not self-expression” principle to drive behavior change without friction

  8. Practical ways to instantly improve your writing, emails, and Slack messages—simple techniques Wes teaches executives

  9. Managing up like a pro: Wes’s clear, practical advice on earning trust, building credibility, and aligning with senior leaders

  10. Career accelerators: specific habits and tactics from Wes for growing your influence, advancing your career, and standing out

  11. Real-world communication examples—Wes breaks down real scenarios she’s solved, providing step-by-step solutions you can copy today

Some takeaways:

  1. Communication is the highest-leverage career skill: If you’re not getting the reaction you want, focus on improving how you communicate rather than blaming others for not understanding.

  2. The “sales, then logistics” framework: Always sell people on why something matters before diving into how to do it. Even executives who seem rushed need 30 to 60 seconds of context for why this matters now.

  3. Being concise is about density of insight, not brevity: “Being concise is not about absolute word count. It’s about economy of words and density of the insight.” The bottleneck to being concise is often unclear thinking.

  4. Use “signposting” to guide your audience: Words like “for example,” “because,” “as a next step,” and “first, second, third” help readers navigate your ideas without excessive formatting.

  5. The MOO (Most Obvious Objection) technique: Before sharing an idea, spend just a few seconds anticipating the most obvious objections. This simple practice dramatically improves your communication effectiveness.

  6. Speak with accurate confidence: Don’t overstate hypotheses as facts or understate strong recommendations. Match your conviction level to the evidence available.

  7. Give feedback using “strategy, not self-expression”: Focus on motivating behavior change rather than venting your frustrations. “Trim 90% of what you initially want to say and keep only the 10% that will make the person want to change.”

  8. Managing up is about sharing your point of view: Don’t just ask your manager what to do. Present your recommendation with supporting evidence, which reduces their cognitive load and demonstrates your strategic thinking.

  9. The CEDAF delegation framework:

    1. Comprehension: Ensure they understand what needs to be done

    2. Excitement: Make the task meaningful and motivating

    3. De-risk: Anticipate and address potential issues

    4. Align: Confirm mutual understanding

    5. Feedback: Create the shortest possible feedback loop

  10. Create a “swipe file”: Collect examples of effective communication that you can reference later. Even the act of noting these examples trains you to recognize effective patterns.

  11. Small communication improvements compound: “These might seem minor, but (a) it compounds, and (b) all the ‘big things,’ everyone else is already doing. So there’s not a lot of alpha in that.”

  12. Invest time up front: Spending a few extra minutes crafting clear communications saves hours of back-and-forth clarification later. “A little bit more up-front investment reaps a lot of benefits down the line.”

Where to find Wes Kao:

• LinkedIn: https://www.linkedin.com/in/weskao/

• Website: https://www.weskao.com/

• Maven course: https://maven.com/wes-kao/executive-communication-influence

In this episode, we cover:

(00:00) Introduction to Wes Kao

(05:34) Working with Wes

(06:58) The importance of communication

(10:44) Sales before logistics

(18:20) Being concise

(24:31) Books to help you become a better writer

(27:30) Signposting and formatting

(32:05) How to develop and practice your communication skills

(40:41) Slack communication

(42:23) Confidence in communication

(50:17) The MOO framework

(54:00) Staying calm in high-stakes conversations

(57:36) Which tactic to start with

(58:53) Effective tactics for managing up

(01:04:53) Giving constructive feedback: strategy, not self-expression

(01:09:39) Delegating effectively while maintaining high standards

(01:16:36) The swipe file: collecting inspiration for better communication

(01:19:59) Leveraging AI for better communication

(01:22:01) Lightning round

Referenced:

• Persuasive communication and managing up | Wes Kao (Maven, Seth Godin, Section4): https://www.lennysnewsletter.com/p/persuasive-communication-wes-kao

• Making Meta | Andrew ‘Boz’ Bosworth (CTO): https://www.lennysnewsletter.com/p/making-meta-andrew-boz-bosworth-cto

• Communication is the job: https://boz.com/articles/communication-is-the-job

• Maven: https://maven.com/

• Sales, not logistics: https://newsletter.weskao.com/p/sales-not-logistics

• How to be more concise: https://newsletter.weskao.com/p/how-to-be-concise

• Signposting: How to reduce cognitive load for your reader: https://newsletter.weskao.com/p/sign-posting-how-to-reduce-cognitive

• Airbnb’s Vlad Loktev on embracing chaos, inquiry over advocacy, poking the bear, and “impact, impact, impact” (Partner at Index Ventures, Airbnb GM/VP Product): https://www.lennysnewsletter.com/p/impact-impact-impact-vlad-loktev

• Tone and words: Use accurate language: https://newsletter.weskao.com/p/tone-and-words-use-accurate-language

• Quote by Joan Didion: https://www.goodreads.com/quotes/264509-i-don-t-know-what-i-think-until-i-write-it

• Strategy, not self-expression: How to decide what to say when giving feedback: https://newsletter.weskao.com/p/strategy-not-self-expression

• Tobi Lütke’s leadership playbook: Playing infinite games, operating from first principles, and maximizing human potential (founder and CEO of Shopify): https://www.lennysnewsletter.com/p/tobi-lutkes-leadership-playbook

• The CEDAF framework: Delegating gets easier when you get better at explaining your ideas: https://newsletter.weskao.com/p/delegating-and-explaining

• Swipe file: https://en.wikipedia.org/wiki/Swipe_file

• Apple Notes: https://apps.apple.com/us/app/notes/id1110145109

• Claude: https://claude.ai/new

• ChatGPT: https://chatgpt.com/

• Arianna Huffington’s phone bed charging station (Oak): https://www.amazon.com/Arianna-Huffingtons-Phone-Charging-Station/dp/B079C5DBF4?th=1

• The Harlan Coben Collection on Netflix: https://www.netflix.com/browse/genre/81180221

• Oral-B Pro 1000 rechargeable electric toothbrush: https://www.amazon.com/dp/B003UKM9CO/

• The Best Electric Toothbrush: https://www.nytimes.com/wirecutter/reviews/best-electric-toothbrush/

Glengarry Glen Ross on Prime Video: https://www.amazon.com/Glengarry-Glen-Ross-James-Foley/dp/B002NN5F7A

• 1,000,000: https://www.lennysnewsletter.com/p/1000000

Recommended books:

On Writing Well: The Classic Guide to Writing Nonfiction: https://www.amazon.com/Writing-Well-Classic-Guide-Nonfiction/dp/0060891548/

Stein on Writing: A Master Editor of Some of the Most Successful Writers of Our Century Shares His Craft Techniques and Strategies: https://www.amazon.com/Stein-Writing-Successful-Techniques-Strategies/dp/0312254210/

On Writing: A Memoir of the Craft: https://www.amazon.com/Writing-Memoir-Craft-Stephen-King/dp/1982159375

Several Short Sentences About Writing: https://www.amazon.com/Several-Short-Sentences-About-Writing/dp/0307279413/

High Output Management: https://www.amazon.com/High-Output-Management-Andrew-Grove/dp/0679762884

Your Brain at Work: Strategies for Overcoming Distraction, Regaining Focus, and Working Smarter All Day Long: https://www.amazon.com/Your-Brain-Work-Revised-Updated/dp/0063003155/

Production and marketing by https://penname.co/. For inquiries about sponsoring the podcast, email [email protected].

Lenny may be an investor in the companies discussed.