MoreRSS

site iconLenny RachitskyModify

The #1 business newsletter on Substack.
Please copy the RSS to your reader, or quickly subscribe to:

Inoreader Feedly Follow Feedbin Local Reader

Rss preview of Blog of Lenny Rachitsky

Sonnet 5 review: I ran 64 generations to find out if it's worth it

2026-07-01 07:22:23

I’ve been testing every major frontier model release since the start of the year, and when Anthropic dropped Sonnet 5, I wanted more than a vibe check. I got tired of one-off tests I couldn’t repeat or compare over time, so I built something better: the How I AI Bench, a repeatable eval harness I constructed live using Claude Code while recording this episode. I ran Sonnet 5 blind against four other frontier models (Sonnet 4.6, Opus 4.8, GPT-5.5, and Gemini 3 Pro) across PRD quality, prototype generation, agentic task completion, and agent personality. The results were not what I expected.

Listen or watch on YouTube, Spotify, or Apple Podcasts

What you’ll learn:

  1. What Anthropic claims Sonnet 5 improves over Sonnet 4.6, and where the benchmark data actually backs that up

  2. How I built the How I AI Bench in under 45 minutes using Claude Code, starting from my own stored session history

  3. Why I combined human vibe scoring (70%) with LLM as judge scoring (30%) instead of trusting either alone

  4. How to set up a local HTML scoring page so you can rate AI outputs on gut feel and export those scores as JSON

  5. Which model I recommend for PRDs, which for complex prototypes, and which for chatting with an agent daily


Brought to you by:

Runway—The creative AI platform for images, video and more

Hyperagent—Deploy fleets of agents that handle real work

In this episode, we cover:

(00:00) Sonnet 5 is out

(01:55) What Anthropic claims

(04:02) Why I’m done with one-off vibe checks

(05:05) Building the How I AI Bench live with Claude Code

(07:42) The scoring system

(10:43) Agent voice eval

(11:57) Quick recap

(13:58) Results: The How I AI index leaderboard

(21:21) What I’m improving for the next run

(22:16) Generating a Claire-weighted index

(23:53) Model-by-task recommendations

Tools referenced:

• Claude Sonnet 5: https://www.anthropic.com/news/claude-sonnet-5

• Claude Opus 4.8: https://www.anthropic.com/news/claude-opus-4-8

• GPT-5.5 (OpenAI): https://openai.com/index/introducing-gpt-5-5/

• Gemini 3 Pro (Google DeepMind): https://deepmind.google/models/gemini/pro/

• Cursor: https://www.cursor.com/

Other references:

• SWE-bench Pro (agentic coding benchmark referenced): https://www.swebench.com/

Where to find Claire Vo:

ChatPRD: https://www.chatprd.ai/

Website: https://clairevo.com/

LinkedIn: https://www.linkedin.com/in/clairevo/

X: https://x.com/clairevo

Production and marketing by https://penname.co/. For inquiries about sponsoring the podcast, email [email protected].

How top PMs increase their leverage with AI

2026-06-30 21:31:39

👋 Hey there, I’m Lenny. Each week, I answer reader questions about building product, driving growth, and accelerating your career. For more: Lenny’s Podcast | Lennybot | How I AI | My favorite AI/PM courses, public speaking course, and interview prep copilot

Subscribe now

P.S. Get a full free year of Google AI, Cursor, Lovable, Notion, Manus, Replit, Gamma, n8n, Canva, ElevenLabs, Factory, Wispr Flow, Fin, Supabase, Bolt, Linear, PostHog, Framer, Railway, Granola, Warp, Gumloop, Magic Patterns, Mobbin, Stripe Atlas, and ChatPRD, by becoming an Insider subscriber. Yes, this is for real.


For years, the PM job drifted toward coordinating and aligning people. That version of the role is fading. Today, the best PMs at the best companies are prototyping with real code, querying data conversationally with MCP, confidently running coding AI agents, and finding countless ways to increase their leverage with AI.

To help you navigate this shift, I’m excited to announce that I’ve co-developed a course with my frequent collaborator and new Head of Education, Colin Matthews. It’s called Become an AI-Native Builder, and Colin will teach you how to best use the latest AI tools—including Codex, Claude Code, Cursor, and many of the Product Pass products—in your day-to-day work as a PM. You’ll use skills and MCPs to support discovery, create prototypes using your real codebase, ship changes to production using GitHub, and set up evals to automate and improve the quality of your work. This is the course I’ve always wished existed. And it’s not just for PMs. It’s great for designers, ops, researchers, sales, and basically anyone who’s non-technical and wants to get their hands dirty.

The first cohort kicks off on July 13th. If you’re a Lenny’s Newsletter annual subscriber you’ll get $600 off (and Insiders get $1,000 off). Grab your discount code here and sign up today.

In addition to this course, Colin is hosting a handful of free workshops with leaders from OpenAI, Cursor, Linear, Replit, and Lovable. These will be live and hands-on, and you’ll learn and practice new skills alongside other people in the trenches. These workshops are exclusive to paid Lenny’s Newsletter subscribers, and you can sign up for them here.

Colin is one of the most talented instructors of AI that I’ve come across. We’ve done four guest posts together (one is my third-most-popular post of all time) and, like me, he’s low on hype and high on pragmatic advice. He’s taught AI and other technical skills to tens of thousands of PMs at leading companies like OpenAI, Google, Stripe, Figma, Microsoft, and more. He’s a longtime product leader, a founder, and has shipped more than 10 SaaS products solo.

To mark the launch of this course, Colin wrote a guest post that will help you understand what’s possible with AI right now and where you stand on the “ladders of leverage.”

Let’s get into it.


I’ve trained over 30,000 PMs on how to integrate AI into their workflows. Early this year, I noticed a shift. Whereas before I heard from executives that they expected teams to be using AI for just basic prototyping and general productivity, now they’re increasingly looking for their employees to use AI to complete entire tasks. After building bespoke training programs to level up product and design ICs across industries like healthcare, legal, and streaming, I saw two main gaps: knowing what level of AI to reach for and having the technical skills to leverage AI tools to their fullest.

I wanted to share a forward-looking framework on how the most AI-native PMs are operating in mid-2026, and how you can create much more leverage for yourself with AI.

Think of this framework as three ladders, each for a different type of leverage that AI can give you.

Personal leverage helps you check items off your own to-do list at work. Product leverage accelerates your ability to ship the right things more quickly. And systems leverage helps build repeatable steps to consistently outsource work to AI and get high-quality results.

As you ascend each ladder rung, you get an order of magnitude more leverage. On the first rung, you use AI for assistance in your own work. On the second rung, you pass tasks to AI and review the output. At the top of the ladder, AI completes multi-step tasks and checks its own results. You will always apply some level of review at the end, but increasing leverage frees you up for other work.

Not every task, workflow, or company demands that you move up to the highest rung to get the most out of AI. The right rung on each ladder is about the best use of AI for the work in front of you.

I’ll walk through all three ladders, with examples you can start using today. Let’s get into it.

The personal leverage ladder

This is the most common way we all use AI at work: drafting docs, researching, or creating small artifacts. Most PMs are already pretty capable here, but it’s worth detailing the rungs so that you can see where you’re at and where you might go.

  • Rung 1: You use AI to write text. You’re using AI to help you with PRDs, Jira tickets, emails, etc. You then copy and paste answers into other tools. Most people are at this rung, or even lower. Don’t feel bad if you’re hanging out here. There’s so much opportunity!

  • Rung 2: You use AI to create artifacts. Think slides, basic Excel models, or small prototypes. Instead of generating text for you to copy, AI generates the actual artifact.

  • Rung 3: You get AI to complete a full to-do item for you. You’ve connected your LLM to external products like Amplitude, Google Drive, Notion, and Canva so it can pull and push information as needed. You run a prompt or skill to complete a task you might have handed off to a colleague before, like reading through customer support tickets or analyzing A/B test results.

Let’s walk through an example of each rung to illustrate baseline expectations.

At the beginning, you’re simply talking to the AI. For example, if I wanted to create a PRD, I might ask Claude to help me write it with a prompt like this:

The AI has very little context on your company or what a good PRD looks like. You’ll likely talk back and forth until you get a good enough result, then copy-paste to Google Docs or Word to improve it before sharing with your team.

Next is getting the AI to do work instead of just helping you do the work. Continuing with the last example, you could have Claude generate a financial model that shows the cost of hosting an agent yourself vs. using a managed service like Vercel.

Here’s a prompt I recently used:

Create a model that represents costs if we build and host ourselves vs. using managed agents. Do research on the engineering time saved and the compute costs in self-hosted vs. managed. Look at other vendors, like Cloudflare, Vercel, or E2B that provide sandboxes for agents for pricing. Demonstrate both the cost of the pilot and the cost at scale in the model, assuming we have 5M+ agent instances running annually (where an agent instance is per hour).

And here’s what the resulting model looks like:

You can check out this generated model here.

As mentioned, you should expect the output to need significant revision at this rung, but it’s a step forward from copy-pasting text from AI into a separate document.

You’re at the highest rung of personal leverage if you are able to delegate complete to-do items to AI. To illustrate this, I’ll use a fictional product called Stride. Stride is a Strava clone, where athletes can share their performance for running, swimming, and other activities. Let’s say I wanted to do a retention analysis of users who share their exercises with an attached photo, and compare that against the cohort who don’t share photos. To complete this task, I’ll connect my LLM of choice (Claude) to my product analytics software (PostHog) and give it instructions to run this analysis for me. This was my prompt:

Use PostHog to check if users who use social share features have a higher 30d retention than those who don’t. Show me an html doc as a final output visualizing cohorts and any other useful data. Cite all your sources so I can validate.

And here’s the result:

In the past, this would have been a task you’d fit in between meetings. Instead, you’re handing it off to an LLM to complete end-to-end. I’d recommend including “cite your sources” so you can easily validate if the output is correct. In this case, Claude provides links directly to the source data in PostHog.

Pro tip: To allow models to complete tasks for you (and thus move to this rung in your personal work), you’ll need to connect your LLM to the products you use frequently via MCP. Claude Code, Codex, and Cursor can all connect to tools like Figma, Amplitude, PostHog, Pendo, and more via MCP. This may sound complicated, but it’s really easy to do, and once you create this connection, you’ll never have to touch it again. Simply navigate to your product’s connectors marketplace and add your tools (Claude, ChatGPT, Gemini).

Once you have your connectors set up, try completing a common task using your AI, like:

  • Analyzing how a launch went by reviewing recent customer tickets and online sentiment

  • Checking how many users actually use a feature through product analytics events

  • Summarizing a recording from a customer call and creating a prototype based on their feedback

  • Updating your next sprint based on a change in roadmap priorities

You’ll likely find the results disappointing at first, but that’s just because the model doesn’t know how to meet your standards yet. Continue iterating with the model until you have a good result, then lock in the workflow by creating a skill—just ask your LLM to create one in the same chat you completed the task in.

You can repeat this workflow to get reasonable first drafts for almost anything: PRDs, roadmaps, marketing assets, survey analysis, prototypes, Figma mocks, and more. I totally acknowledge that the quality may not be perfect, but we’ll come back to that when we talk about systems leverage.

The product leverage ladder

Product leverage closes the gap between what you want to build and what you can ship, even without strong design or engineering chops. The way to leverage AI for this ladder breaks down into three rungs:

  • Rung 1: You create web-based prototypes. These communicate your ideas better than docs, but the prototype itself doesn’t have any value beyond communication. And your prototype’s code is independent from your product—it doesn’t use your existing codebase.

  • Rung 2: You create code-based prototypes. Claude Code or Codex accesses your real codebase as the context for generating prototypes, instead of screenshots or complex prompts.

  • Rung 3: You get an agent to ship changes to production as pull requests. An engineer picks up the PR, reviews, and merges your code into the product.

Let’s spend a bit more time with each of these.

Using AI tools like Lovable, Replit, and Magic Patterns is an amazing way to quickly and easily create a prototype. You can use this to share a concept or design with stakeholders, customers, or internal team members, allowing you to validate whether your solution is usable and solves the customer problem faster than ever.

Let’s revisit Stride as an example. Here’s what the profile page currently looks like:

Customers have been complaining that they don’t have clear cancellation paths and are confused about their subscription status after a free trial. You can use AI prototyping to create a quick mock to test with customers and stakeholders if one or more of your solutions address this problem:

This helps get to the right solution by testing more concepts faster, but typically the underlying code doesn’t have any value. Web AI prototyping tools have limited context on our real components, pages, and data models, so this prototype is removed from reality. That means that it will take more work to translate the tested prototype to real code.

The next rung up is to ask AI to prototype with your real product. This does not require running the full product on your laptop—or an expert-level coding ability—but you will need some technical skills. First, use a product like Claude Code or Codex to write code and run your app. Second, you’ll need a codebase that contains your UI but not your full backend; more on that later.

You can create the same prototype, this time asking Claude Code or Codex to use your existing codebase:

This time, the cancellation flow is added to the existing settings page, uses real components, and follows the design patterns of the actual product.

To create a codebase or repo that’s easy to use and doesn’t require running the full backend, pair with an engineer and have them use a prompt like this on your main codebase:

Create a new repo that contains all of the base UI elements, styles, routes, pages, and components for [list parts of the product you want included]. Create a mock data store that mimics the API data model and is stored locally. I should be able to run the resulting repo without any environment variables or backend services.

Once complete, clone the new repo to your computer. You should have an easy-to-run version of your UI that you can’t accidentally mess up. Sometimes your engineer will tell you this process is more complicated to get running than a simple prompt. It is very much worth the effort to build prototypes on your real styles and components, so do your best to get over the hump!

The last rung in product leverage is getting an agent to ship code to production. This is a great example of where your technical judgment as a PM is critical. It’s possible that your billing change is only in the UI, and all backend APIs, data elements, and events already exist. It’s also possible that this would require new infrastructure or integrations with another team to ship.

As a PM, it does not make sense to spend time being a worse engineer than the rest of your team. Knowing when you should write a doc, ship a prototype, or create a PR is as important as the technical ability to complete these tasks. PRs are great for copy changes, small UI/UX tweaks, and changes to views that use existing backend code.

The systems leverage ladder

Read more

🎙️ How I AI: GLM-5.2 review & How Gusto built a new product line with Claude Code

2026-06-29 23:02:35

GLM-5.2: why I’m replacing Opus in Claude Code with this new model

Listen now on YouTubeSpotifyApple Podcasts

Brought to you by:

  • Mercury—Radically different banking, loved by over 300K entrepreneurs

Claire tests GLM-5.2, the new open-weight model from Z.ai, inside her actual ChatPRD codebase. She runs it through codebase audits, UI redesigns, and a 45-minute autonomous bug-hunting task in Cursor and Claude Code, and breaks down where it surprised her, where it struggled, and why it may be good enough to replace Opus for some coding workflows.

Biggest takeaways:

  1. Open-weight models are no longer a hobbyist curiosity—they are production-grade alternatives. GLM-5.2, built by Beijing-based Z.ai, benchmarks near Claude Opus 4.8 and above GPT-5.5 on SWE Bench Pro, with a million-token context window and full support for reasoning mode, function calling, structured output, and context caching. The decision is no longer about capability ceilings but, instead, about cost, control, and vendor dependency. Claire’s live testing confirmed it: this is not a toy.

  2. Self-hosting changes the vendor power dynamic in ways that matter at scale. Open-weight means the trained model weights are publicly available, letting teams run inference on their own hardware, fine-tune on proprietary data, and route around any single provider’s API terms. When frontier labs change pricing or policy, teams using open-weight models can switch inference providers without touching a line of application code. The key: you’re not locked in.

  3. Getting GLM-5.2 running in Cursor took 30 minutes, and Claire documented the undocumented part. Route your API key through Open Router, override the OpenAI base URL in Cursor’s settings to openrouter.ai/api/v1/cursor (the /cursor suffix isn’t documented anywhere), and add z-ai/glm-5.2 as a custom model. Claude Code requires two environment variable changes and one edit to claude/settings.json. Total time: under an hour, once you have the exact strings.

  4. The 45-minute autonomous task revealed both the ceiling and the floor. Claire gave GLM-5.2 a single prompt inside Claude Code: pull the last 72 hours of Sentry errors and Vercel logs, then build a prioritized bug-fix plan. Over 45 minutes, it ran MCP tool calls, authenticated into external services, and produced a dark-mode engineering canvas with 20 Sentry errors, five Vercel log signals, and 14 planned fixes, including two P0s Claire hadn’t spotted through normal monitoring. The model surfaced signal-to-noise issues in their error pipeline that weren’t showing up elsewhere.

  5. It hit a wall with React, then recovered. During the long-running task, GLM-5.2 struggled with TypeScript compilation errors before eventually producing clean React output. Claire’s read: HTML and CSS generation is reliable; React under agentic, multi-step pressure is shakier. For teams whose codebase is primarily React (she estimates it covers 98% of her own use), this is the friction point to test before committing the model to critical paths.

  6. The cost math is striking: $3.36 for 6 million tokens, including the full 45-minute agentic session. A 72% cache rate helped, but even at full price, open-weight inference through Open Router sits well below Opus or GPT-5.5 rates for equivalent coding capability. For agents accumulating long context windows over extended sessions (the exact workload where frontier model costs compound fastest), open-weight alternatives offer a structurally different cost curve.

  7. Claire’s recommendation: put GLM-5.2 in rotation, not in the spotlight. She’s keeping it in Cursor for frontend and design work, and in Claude Code for long-running agentic tasks, alongside closed frontier models rather than as a replacement. The constraint she’s watching: can it handle her React-heavy workload at the same consistency she gets from Composer? If it can, the cost-and-control argument gets much harder to ignore.

Blog and detailed workflow walkthroughs from this episode:

GLM 5.2: A Live Review of an Opus-Level Open-Weights Model: https://www.chatprd.ai/how-i-ai/glm-5-2-review-open-weights-model

↳ How to Deploy an Autonomous AI Agent for Bug Triage and Prioritization: https://www.chatprd.ai/how-i-ai/workflows/how-to-deploy-an-autonomous-ai-agent-for-bug-triage-and-prioritization

↳ How to Perform an AI-Powered Codebase Audit and Architecture Visualization: https://www.chatprd.ai/how-i-ai/workflows/how-to-perform-an-ai-powered-codebase-audit-and-architecture-visualization

↳ How to Configure the Open-Weight GLM 5.2 Model in Cursor: https://www.chatprd.ai/how-i-ai/workflows/how-to-configure-the-open-weight-glm-5-2-model-in-cursor

No Figma. No Jira. No docs. How Gusto built a new product line with Claude Code | Eddie Kim (CTO)

Listen now on YouTubeSpotifyApple Podcasts

Brought to you by:

Eddie Kim is the co-founder and CTO of Gusto. In this episode, he shares how a five-person team used Claude Code, a permanent Zoom room, and almost none of the usual product process—no PM, no Figma, no Jira, no long specs—to build Gusto Cofounder from scratch in just 10 weeks.

Biggest takeaways:

  1. A five-person team with no process can outship a large team with full process, if AI handles the engineering. Eddie’s product launched at Gusto’s tier-one level after 10 weeks, starting from zero code. The constraint wasn’t a liability—it was the design. When AI does the building, coordination overhead doesn’t scale the engineering; it just slows it down. The key: strip process to what the team actually needs, then let AI fill the gap.

  2. “Zero code to tier-one launch” is now a viable founding path. The team reached a production milestone at Gusto without a line of pre-existing code. This flips the assumption that early teams spend months on infrastructure before shipping anything real. With Claude Code as the primary builder, the initial sprint becomes about direction and judgment, not typing. It compresses the time between idea validation and real user contact from months to weeks.

  3. No meetings, no Jira, no text threads. It shipped anyway. The team had no standup cadence, no ticket system, no async thread to resolve blockers. What replaced all of that: shared context held inside the AI loop. When the model carries state and the team is small and aligned, human coordination overhead becomes optional.

  4. The technical stack for a production AI agent is shockingly minimal. The entire agent loop ran on Cloudflare Workers with the Vercel AI SDK. Nothing else. No proprietary orchestration layer, no third-party agent framework. Everything else was built in-house. Teams often over-architect before they’ve proven anything; Eddie’s stack is evidence that infrastructure minimalism accelerates the path to learning what the agent actually needs to do.

  5. Building agents is not as complicated as the community makes it sound. An agent is an AI SDK running somewhere in the cloud, able to look up files and call tools. That’s the full definition. The complexity people fear (state management, orchestration, reliability) is solvable with the same judgment calls any backend system requires. Eddie’s team shipped one at production quality in 10 weeks without specialist AI infrastructure experience.

  6. The “permanent Zoom” model of AI development changes how teams think about context. Claude Code running in a persistent loop means the model has continuous access to the codebase’s current state. That’s closer to having an engineer who never closes their laptop than a chat interface you query on demand. For small teams, this is the equivalent of a senior engineer who is always available, always current, and never needs onboarding after a break.

  7. The lesson for founding teams isn’t “use Claude Code.” It’s “design your process for AI as a team member.” Most early teams graft AI tools onto a human-scaled workflow: standups, tickets, PRs reviewed by three people. Eddie’s team treated the AI as a primary contributor from day one and built their coordination model around that assumption. The result: a workflow that gets faster as the AI improves, not one that merely offloads tasks to it.

Blog and detailed workflow walkthroughs from this episode:

How Gusto Built a New Product Line in 10 Weeks with Claude Code, No Jira, and No Docs: https://www.chatprd.ai/how-i-ai/how-gusto-built-a-new-product-line-in-10-weeks-with-claude-code-no-jira-and-no-docs

↳ How to Build a New AI Product in 10 Weeks Using the ‘No-Process’ Method: https://www.chatprd.ai/how-i-ai/workflows/how-to-build-a-new-ai-product-in-10-weeks-using-the-no-process-method

↳ How to Fix Bugs Using an AI-Powered Test-Driven Development (TDD) Workflow: https://www.chatprd.ai/how-i-ai/workflows/how-to-fix-bugs-using-an-ai-powered-test-driven-development-tdd-workflow


If you’re enjoying these episodes, reply and let me know what you’d love to learn more about: AI workflows, hiring, growth, product strategy—anything.

Catch you next week,
Lenny

P.S. Want every new episode delivered the moment it drops? Hit “Follow” on your favorite podcast app.

No Figma. No Jira. No docs. How Gusto built a new product line with Claude Code | Eddie Kim (CTO)

2026-06-29 20:03:38

Eddie Kim is the co-founder and CTO of the payroll and HR platform Gusto, which just crossed $1 billion in revenue and serves more than 500,000 small businesses. Recently he did something most CTOs don’t: he went back to writing code. With three other engineers and one designer, Eddie built Gusto Cofounder, a net-new AI product, from zero code to a tier-one launch in 10 weeks. He walks through how that team actually worked, why they threw out nearly every process, and how anyone can copy the approach.

Listen or watch on YouTube, Spotify, or Apple Podcasts

What you’ll learn:

  1. The trash-can method: how to write, review, and delete a full PR as a product decision instead of a planning doc

  2. The two-tool agent stack behind Gusto Cofounder

  3. The exact “perma-Zoom” setup that replaced standups, retros, and Slack threads for 10 weeks

  4. How a designer with no engineering background hit the 94th percentile for shipping code

  5. The eval-first workflow Eddie uses to fix real customer bugs with Claude Code

  6. How a non-technical leader can prototype an idea to win buy-in, then carry it all the way to production-quality code


Brought to you by:

Magic Patterns—Prototypes that look like your product

Jira Product Discovery—Prioritize with insights, build with confidence

In this episode, we cover:

(00:00) Intro: five people, 10 weeks

(02:38) The origins of Cofounder

(08:32) Inside the 10-week build process

(12:50) Building with no PMs

(14:38) The “trash can” method

(17:15) The stack architecture

(19:10) Shipping to production from day one

(22:03) How a designer became a top engineer

(29:05) Demo: Cofounder over text and Slack

(31:45) Demo: running a real payroll

(36:26) Live coding with evals in Claude Code

(39:39) Recap: prototype, small team, permission

(43:17) Lightning round

(48:44) Where to find Eddie and Cofounder

Tools referenced:

• Gusto Cofounder (early access/waitlist): https://gusto.com/cofounder

• Claude Code (Anthropic): https://claude.ai/code

• Cloudflare Workers: https://workers.cloudflare.com/

• Vercel AI SDK: https://sdk.vercel.ai/

• DX (engineering analytics): https://getdx.com/

• Wispr Flow (voice-to-text): https://wisprflow.ai

• OpenClaw: https://openclaw.ai/

Other references:

• Gusto (the main product, “Gusto Classic”): https://gusto.com

Mindbody (referenced as customer data source): https://www.mindbodyonline.com/

Where to find Eddie Kim:

LinkedIn: https://www.linkedin.com/in/edawerd/

Where to find Claire Vo:

ChatPRD: https://www.chatprd.ai/

Website: https://clairevo.com/

LinkedIn: https://www.linkedin.com/in/clairevo/

X: https://x.com/clairevo

Production and marketing by https://penname.co/. For inquiries about sponsoring the podcast, email [email protected].

OpenAI Codex lead on the new shape of product work | Andrew Ambrosino

2026-06-28 20:31:54

Andrew Ambrosino leads development of the Codex desktop app at OpenAI. Nearly 100% of OpenAI employees—not just engineers—now use Codex weekly. A lifelong builder with a background spanning engineering, design, product management, and founding companies, he is now responsible for turning the Codex desktop experience into what he calls “the best desktop app that has ever existed, full stop.”

In our in-depth conversation, we discuss:

  1. Why AI has completely flipped the product development process

  2. What “taste” really means as a professional skill, and why it is emerging as the most valuable capability in an AI-first workplace

  3. Why Andrew believes the Codex app would have failed if they launched it last November (vs. in February)

  4. The “zone defense” model for how product managers at OpenAI operate when everyone can build anything

  5. How roles are collapsed on Andrew’s team, and why eliminating the concept of roles entirely is a big mistake

  6. How Andrew uses Codex to run his own workflows

  7. The vision for a home base that coordinates work across ChatGPT, Codex, and the tools people already use.


Brought to you by:

WorkOS—Make your app enterprise-ready, with SSO, SCIM, RBAC, and more

Mercury—Radically different banking, now with Command

Where to find Andrew Ambrosino:

• X: https://x.com/ajambrosino

• LinkedIn: https://www.linkedin.com/in/ajambrosino

• Website: https://ambrosino.io

Referenced:

• Codex: chatgpt.com/codex

• The Primal Mark: How the Beginning Shapes the End in the Development of Creative Ideas: https://www.gsb.stanford.edu/faculty-research/publications/primal-mark-how-beginning-shapes-end-development-creative-ideas

• Linear: https://linear.app

• “Taste” is not just taste in aesthetics: https://x.com/thenanyu/status/2067327619897446721

• Linear’s secret to building beloved B2B products | Nan Yu (Head of Product): https://www.lennysnewsletter.com/p/linears-secret-to-building-beloved-b2b-products-nan-yu

• Paul Graham’s website: https://paulgraham.com

• The design process is dead. Here’s what’s replacing it. | Jenny Wen (head of design at Claude): https://www.lennysnewsletter.com/p/the-design-process-is-dead

• The case study factory: https://essays.uxdesign.cc/case-study-factory

• Why humans are AI’s biggest bottleneck (and what’s coming in 2026) | Alexander Embiricos (OpenAI Codex Product Lead): https://www.lennysnewsletter.com/p/why-humans-are-ais-biggest-bottleneck

• OpenClaw: https://openclaw.ai

• OpenClaw: The complete guide to building, training, and living with your personal AI agent: https://www.lennysnewsletter.com/p/openclaw-the-complete-guide-to-building

• From skeptic to true believer: How OpenClaw changed my life | Claire Vo: https://www.lennysnewsletter.com/p/how-openclaw-changed-my-life-claire-vo

• The Codex feature that works while you sleep: https://www.lennysnewsletter.com/p/the-codex-feature-that-works-while

• The AI paradox: More automation, more humans, more work | Dan Shipper: https://www.lennysnewsletter.com/p/the-ai-paradox-dan-shipper

• Atlas: https://chatgpt.com/atlas

• Anthropic: https://www.anthropic.com

• Adobe Premiere: https://www.adobe.com/products/premiere

The Magic School Bus Rides Again: https://www.netflix.com/title/80108373

Recommended books:

The Gruffalo: https://www.amazon.com/Gruffalo-Julia-Donaldson/dp/0803730470

The Big Orange Splot: https://www.amazon.com/Big-Orange-Splot-Manus-Pinkwater/dp/0590445103


Production and marketing by https://penname.co/. For inquiries about sponsoring the podcast, email [email protected].

Lenny may be an investor in the companies discussed.


My biggest takeaways from this conversation:

Read more

🧠 Community Wisdom: Beating a career slump, adding more structure to an established team, questions for new-team 1:1s, the evolving shape of the growth role, and more

2026-06-28 01:09:06

👋 Hello and welcome to this week’s edition of ✨ Community Wisdom ✨ a subscriber-only email, delivered every Saturday, highlighting the most helpful conversations in our members-only Slack community.

Read more