2026-06-21 20:31:43
Fiona Fung leads the teams behind Claude Code and Cowork at Anthropic (overseeing Boris Cherny and the entire engineering and PM team). Before Anthropic, she spent 11 years at Microsoft building Visual Studio and TypeScript and then moved to Meta, where she started Facebook Marketplace (now generating over $100 billion in GMV annually), worked on Meta’s first smart glasses and AR glasses, and led infrastructure, growth, integrity, and safety teams at Instagram. She’s been an engineer for over 25 years and has a unique perspective on how the role of building software is changing.
Listen on YouTube, Spotify, and Apple Podcasts
What she’s learned about running a team that’s shipping 8x more code than before
Which roles AI will transform next
Specific ways her team uses AI
How Claude “routines” have changed how she operates as a manager
The context-switching problem no one has solved yet
The biggest unsolved problem in AI
What keeps her up at night
WorkOS—Make your app enterprise-ready, with SSO, SCIM, RBAC, and more
Mercury—Radically different banking, now with Command
• LinkedIn: linkedin.com/in/fionafung
• Running an AI-native engineering org: https://www.youtube.com/watch?v=igO8iyca2_g
• Head of Claude Code: What happens after coding is solved | Boris Cherny: https://www.lennysnewsletter.com/p/head-of-claude-code-what-happens
• Today, Anthropic engineers on average ship 8x as much code per quarter as they did in 2021-2025: https://x.com/AnthropicAI/status/2062568864240836995
• Visual Studio: https://visualstudio.microsoft.com
• Joseph Campbell’s quote: https://www.goodreads.com/quotes/192665-the-cave-you-fear-to-enter-holds-the-treasure-you
• Life-changing Cowork use case: https://x.com/lennysan/status/2059664455001334124
• Introducing Claude for Small Business: https://www.anthropic.com/news/claude-for-small-business
• Conversations with Tyler podcast: https://conversationswithtyler.com
• Sheryl Sandberg on Facebook: https://www.facebook.com/sheryl#
• Amélie on Prime Video: https://www.amazon.com/Amelie-Jean-Pierre-Jeunet/dp/B0DQ4S3N45
• Spirited Away on HBO Max: https://www.hbomax.com/movies/spirited-away/3deab668-d0a4-4a8d-9bc8-0952a0ad836e
• Nausicaä of the Valley of the Wind on HBO Max: https://www.hbomax.com/movies/nausicaa-of-the-valley-of-the-wind/ed66031b-6353-4019-ba54-35488468a4db
• Sweet Sisters Bodycare: https://sweetsistersbodycare.com
• Anthropic events: https://www.anthropic.com/events
• Clare Pooley’s quote: https://www.goodreads.com/quotes/11305360-in-a-world-where-you-can-be-anything-be-kind
• Margaret Atwood’s books: https://www.amazon.com/stores/author/B000AQTHI0?ccs_id=0027a474-cd59-4a3a-bcd7-9b173c27d530
• Haruki Murakami’s books: https://www.amazon.com/stores/Haruki-Murakami/author/B000AP7AFI
• The Little Prince: https://www.amazon.com/Little-Prince-Antoine-Saint-Exup%C3%A9ry/dp/0156012197
• Nausicaä of the Valley of the Wind: https://www.amazon.com/Nausica%C3%A4-Valley-Wind-Box-Set/dp/1421550644
• High Output Management: https://www.amazon.com/High-Output-Management-Andrew-Grove/dp/0679762884
Production and marketing by https://penname.co/. For inquiries about sponsoring the podcast, email [email protected].
Lenny may be an investor in the companies discussed.
2026-06-20 23:53:05
👋 Hello and welcome to this week’s edition of ✨ Community Wisdom ✨, a subscriber-only email, delivered every Saturday, highlighting the most helpful conversations in our members-only Slack community.
2026-06-17 20:04:04
I break down every loop type from scratch—what a heartbeat, cron, hook, and goal loop actually are, when each one fits, and the five things any effective loop needs before it touches production. Then I build two live loops: a daily aging-PR reviewer in Claude Code that schedules itself at 10:15 a.m. and spins off its own subagents, and a weekly skills-identification loop in Codex that spawns goal-based subagents to validate its own output in real time.
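To make the four loop types concrete, here is a minimal Python sketch of the trigger logic behind each one. It is an illustration under stated assumptions, not how Claude Code or Codex implement scheduling: the function names (`heartbeat_due`, `cron_due`, `hook_due`, `run_goal_loop`) and the iteration budget are hypothetical.

```python
from datetime import datetime, time, timedelta
from typing import Callable


def heartbeat_due(last_run: datetime, now: datetime, every: timedelta) -> bool:
    """Heartbeat: fire whenever a fixed interval has elapsed since the last run."""
    return now - last_run >= every


def cron_due(last_run: datetime, now: datetime, at: time) -> bool:
    """Cron: fire once per day at a wall-clock time (for example 10:15 a.m.)."""
    scheduled_today = datetime.combine(now.date(), at)
    # Due only if today's slot has passed and we have not run since then.
    return last_run < scheduled_today <= now


def hook_due(event: str, subscribed: set[str]) -> bool:
    """Hook: fire in response to an event instead of a clock."""
    return event in subscribed


def run_goal_loop(step: Callable[[], bool], max_iterations: int) -> tuple[bool, int]:
    """Goal: repeat until step() reports success or the iteration budget runs out.

    Returns (goal_met, iterations_used). The budget is a hypothetical guard
    against a loop that keeps burning tokens without converging.
    """
    for attempt in range(1, max_iterations + 1):
        if step():
            return True, attempt
    return False, max_iterations
```

The first three functions only decide when to run a prompt. The goal loop is the one that also decides when to stop, which is why it takes an explicit budget here.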
Listen or watch on YouTube, Spotify, or Apple Podcasts
The plain-English definition of a loop—and why it’s just an automated prompt, not a scary new paradigm
The four loop types (heartbeat, cron, hook, and goal) and when each one actually fits your workflow
How to think about loop design using the “onboarding an employee” mental model
The five things every effective loop needs: worktrees, skills, plugins/connectors, subagents, and state tracking
How to build a scheduled PR-review routine in Claude Code that babysits aging PRs and alerts your team
How to set up a weekly skills-identification automation in Codex that spawns its own validating subagents
Why goal-based loops are the hardest to write well—and where most people burn tokens for nothing
The two warning signs that your loop is going to get expensive before it gets useful
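As a rough illustration of the aging-PR routine and of why state tracking matters, the sketch below filters a list of pull requests and skips ones a previous run already flagged. The PR dictionary shape (`number`, `state`, `updated_at`) and both function names are assumptions made for this example. They do not come from the episode or from any specific Git hosting API.

```python
from datetime import datetime, timedelta


def find_aging_prs(prs: list[dict], now: datetime, max_age: timedelta) -> list[dict]:
    """Return open PRs whose last update is older than max_age, oldest first."""
    aging = [
        pr for pr in prs
        if pr["state"] == "open" and now - pr["updated_at"] > max_age
    ]
    return sorted(aging, key=lambda pr: pr["updated_at"])


def select_new_alerts(aging: list[dict], already_alerted: set[int]) -> list[dict]:
    """State tracking: drop PRs that an earlier run of the loop already flagged."""
    return [pr for pr in aging if pr["number"] not in already_alerted]


if __name__ == "__main__":
    now = datetime(2026, 6, 17, 10, 15)
    prs = [
        {"number": 101, "state": "open", "updated_at": datetime(2026, 6, 1)},
        {"number": 102, "state": "open", "updated_at": datetime(2026, 6, 16)},
        {"number": 103, "state": "closed", "updated_at": datetime(2026, 5, 1)},
        {"number": 104, "state": "open", "updated_at": datetime(2026, 6, 5)},
    ]
    aging = find_aging_prs(prs, now, max_age=timedelta(days=7))
    for pr in select_new_alerts(aging, already_alerted={101}):
        print(f"PR #{pr['number']} has had no updates for over a week")
```

Without the `already_alerted` set, a daily loop would re-announce the same stale PRs every morning, which is one simple way a loop becomes noisy before it becomes useful.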
WorkOS—Make your app enterprise-ready today
Runway—The creative AI platform for images, video, and more
(00:00) Prompts are out and loops are in
(02:30) Defining a loop
(03:03) The four ways to automate a prompt: heartbeat, cron, hooks, and goals
(06:03) Five things every effective loop needs
(09:26) The “onboarding an employee” framework for designing loops
(11:58) Live build #1: Daily aging PR loop in Claude Code
(17:08) Subagents inside loops
(19:00) Live build #2: Weekly skills identification loop in Codex
(22:57) Watching subagents spin up in real time
(25:28) Warning signals around loops
(27:31) What listeners are doing with loops
• Claude Code: https://claude.ai/code
• Codex: https://chatgpt.com/codex
• OpenClaw: https://openclaw.ai/
• Claire’s article “Why OpenClaw Feels Alive Even Though It’s Not”: https://x.com/clairevo/article/2017741569521271175
• Addy Osmani’s article on loop engineering: https://addyosmani.com/blog/loop-engineering/
• Using Goals in Codex: https://developers.openai.com/cookbook/examples/codex/using_goals_in_codex
ChatPRD: https://www.chatprd.ai/
Website: https://clairevo.com/
LinkedIn: https://www.linkedin.com/in/clairevo/
2026-06-15 23:01:32
Listen now on YouTube • Spotify • Apple Podcasts
Claire puts Claude Fable 5, Anthropic’s first generally available Mythos-class model, through a series of real-world tests: product specs, agent workflows, design tasks, vision tasks, and multi-agent orchestration. She breaks down what Anthropic is claiming, where the model genuinely feels like a leap forward, and where it surprisingly falls short.
Fable 5 is Anthropic’s first “Mythos-class” model to reach general availability, and it’s crushing benchmarks across the board. It hit 80% on SWE-Bench Pro, significantly outperforming Opus 4.8, GPT-4.5, and Gemini 3.1 Pro. Claire found the model excels in specific areas while falling short in others that matter for everyday product work.
The model is expensive by design: $10 per million input tokens and $50 per million output tokens. That’s a new tier above Opus, and it consumes tokens at roughly twice the rate of other models. You need to be strategic about when to deploy this level of intelligence versus using cheaper models like Sonnet or Opus for simpler tasks.
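A quick way to reason about that pricing is to put the two quoted rates into a small calculator. This is a back-of-the-envelope sketch: the rates come from the text above, while the function name and the `token_multiplier` parameter (one way to model the "roughly twice the rate" observation) are assumptions for illustration.

```python
INPUT_USD_PER_MILLION = 10.0   # quoted input price per million tokens
OUTPUT_USD_PER_MILLION = 50.0  # quoted output price per million tokens


def estimate_cost_usd(
    input_tokens: int,
    output_tokens: int,
    token_multiplier: float = 1.0,
) -> float:
    """Estimate the cost of one task in USD.

    token_multiplier is a hypothetical knob: set it to 2.0 to model a task
    that uses about twice as many tokens as it would on another model.
    """
    if input_tokens < 0 or output_tokens < 0 or token_multiplier <= 0:
        raise ValueError("token counts must be non-negative and multiplier positive")
    cost = (
        input_tokens * INPUT_USD_PER_MILLION
        + output_tokens * OUTPUT_USD_PER_MILLION
    ) / 1_000_000
    return cost * token_multiplier
```

For example, `estimate_cost_usd(200_000, 40_000)` evaluates to 4.0, and the same task with `token_multiplier=2.0` comes to 8.0. Note that output tokens dominate quickly: 40,000 output tokens cost as much here as 200,000 input tokens.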
Fable 5 works like a “seasoned engineer”—which is both its superpower and its Achilles’ heel. It’s thorough, autonomous, and will investigate every corner of a problem to be 120% sure it’s shipping the right thing. Sometimes you need a model that’s a little less thorough, a little “dumber,” to actually ship something useful quickly.
The model is exceptionally good at vision tasks, particularly document formatting and PDF parsing. Claire tested it on creating handwriting worksheets for her 7-year-old and found it dramatically outperformed Opus 4.8—better spacing, clearer layout, appropriate white space. This extends to other vision tasks where you want something to look good or need to parse complex documents.
The writing is nearly unreadable for specs and PRDs. Claire found that Fable 5 produces extremely detailed, technically complete documents that are almost impossible to parse. It gets wrapped around the axle on details, creates big blocks of dense paragraphs with internal references, and makes it hard to see the forest for the trees.
Design output is shockingly bad, at least for one-shot design tasks. When Claire asked Fable to design a skills registry, it produced a fundamentally terrible design: gray, black, and red, with simple outlines. This was a real surprise given the model’s benchmark performance.
The model is conservative on execution and takes “minimal” very literally. When Claire asked it to ship an MVP that would deliver customer value, Fable produced something extremely narrow and not actually that useful. This conservatism may stem from the safety guardrails built into the model.
Fable 5 includes specific safeguards for cybersecurity, biology, chemistry, and distillation tasks. Instead of blocking you entirely, it uses a new “fallback” concept—if you get classified into one of these categories, it gracefully falls back to Opus 4.8. Anthropic reports that 95% of sessions don’t hit a fallback, and they maintain a 30-day retention policy solely to catch misuse.
Multi-agent orchestration is technically possible but not yet reliable. Claire tested the dynamic workflows and subagent capabilities extensively and had some successful multi-agent runs, but also encountered frequent stalls and errors. She walked away from her laptop and came back to find subagents had stalled after about three hours.
The key insight: match model intelligence to task complexity. Claire recommends using it for hard technical problems where extreme detail matters, long-horizon work, and vision tasks. But for front-end work, strategy, specs, and design, other models in the ecosystem will serve you better and cost less.
This is “baby Mythos,” not the full Mythos model. Fable 5 has guardrails that the unrestricted Mythos model (available only to Project Glasswing partners) doesn’t have. The underlying model is the same, but Fable is tuned for safety and general availability.
How I AI: My Honest Review of Claude Fable 5: https://www.chatprd.ai/how-i-ai/claude-fable-5-review
Listen now on YouTube • Spotify • Apple Podcasts
Claire sits down with Ankur Goyal, the founder and CEO of Braintrust, to unpack how top engineering teams are using AI agents, evals, and CI to ship better software faster. They get into why agents are now capable of tackling hard infrastructure problems, how to decide what work sits “below the agent line,” and why evals are quickly becoming the modern version of a PRD. Ankur’s core message: the best teams won’t just use AI to write more code; they’ll build the feedback loops, benchmarks, and systems that let AI improve the quality of the product itself.
There’s no staff engineer running as many rigorous benchmarks as someone using an agent. Ankur viscerally disagrees with engineers who say AI can’t handle complicated problems. While models might not be perfect at writing highly concurrent code, they excel at running exhaustive experiments—testing every column store format, every execution engine, every optimization strategy. The baseline of rigor you get from agents is incredible, and there’s simply no excuse anymore to skip benchmarks because they’re tedious.
The agent line keeps going up—and you need to identify what’s below it. Many interactions, decisions, and directions that feel like they need human judgment actually fit “below the agent line.” If you took the information from a meeting and gave it to an agent, would it solve the same problem? Increasingly, the answer is yes. The best teams push this line higher by building smart skills and integrations that expand what agents can handle autonomously.
Practical quality beats theoretical quality every time. In theory, a human engineer with infinite time and focus might produce better code than an AI agent. In practice, humans lose context over days, have decaying attention spans on hard-but-tedious problems, and skip benchmarks they know they should run. AI agents maintain consistent focus, run every test, and can work on problems continuously for days or weeks. The practical quality of AI-assisted engineering is higher because of sustained rigor, not because the code is theoretically better.
You can now bite off much harder technical problems than before. Companies have historically avoided major infrastructure changes because the cost of testing alternatives is prohibitively high and the unknown unknowns are risky. With AI agents, you can exhaustively test six different database solutions, run thousands of benchmarks on production-scale data, and make informed decisions about platform shifts that would have been impossible before. The business case for deep technical work becomes much easier when agents do the heavy lifting.
Run four to six foreground agents simultaneously—that’s the human concurrency limit. Ankur runs different agents working on different problems. This matches the personal concurrency limit most people can manage; you can’t effectively context switch between more than that. Some agents run locally, and others run remotely on cloud infrastructure with production-scale data. The key is isolation: each agent has its own environment, ports, and services.
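The isolation idea can be sketched as a small allocator that gives every agent its own working directory and port and refuses to exceed the concurrency limit. This is a hypothetical illustration, not Ankur's actual setup: `AgentSlot`, `allocate_slots`, the base port, and the directory layout are all assumed names and values.

```python
from dataclasses import dataclass
from pathlib import Path

MAX_FOREGROUND_AGENTS = 6  # upper end of the four-to-six range mentioned above


@dataclass(frozen=True)
class AgentSlot:
    name: str
    workdir: Path  # separate checkout or scratch directory per agent
    port: int      # separate port so local services do not collide


def allocate_slots(tasks: list[str], root: Path, base_port: int = 4100) -> list[AgentSlot]:
    """Assign each task an isolated directory and port, enforcing the limit."""
    if len(tasks) > MAX_FOREGROUND_AGENTS:
        raise ValueError(
            f"{len(tasks)} tasks exceeds the limit of {MAX_FOREGROUND_AGENTS} foreground agents"
        )
    if len(set(tasks)) != len(tasks):
        raise ValueError("task names must be unique so directories do not overlap")
    return [
        AgentSlot(name=task, workdir=root / task, port=base_port + index)
        for index, task in enumerate(tasks)
    ]
```

The function only plans the layout; it does not create directories or start processes. In practice the `workdir` could map to something like a separate Git worktree per agent, but that mapping is left out here.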
Evals are the modern PRD—they define what success looks like, not how to achieve it. Machine learning shifts programming from defining implementation details to defining success criteria. Just like the best PRDs include user stories and examples, the best evals include concrete test cases and scoring functions. The difference is that evals quantify success in ways that can be automatically measured and improved. This lets you focus on outcomes while AI figures out the implementation.
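To show what "concrete test cases and scoring functions" can look like in the smallest possible form, here is a sketch in plain Python. It does not use Braintrust's SDK or any other eval library. `EvalCase`, `keyword_score`, and `run_eval` are hypothetical names, and keyword matching is a deliberately crude stand-in for a real scoring function.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class EvalCase:
    input: str
    expected_keywords: tuple[str, ...]


def keyword_score(output: str, case: EvalCase) -> float:
    """Score in [0, 1]: fraction of expected keywords found in the output."""
    if not case.expected_keywords:
        return 1.0
    text = output.lower()
    hits = sum(1 for keyword in case.expected_keywords if keyword.lower() in text)
    return hits / len(case.expected_keywords)


def run_eval(task: Callable[[str], str], cases: list[EvalCase]) -> float:
    """Run the task on every case and return the average score."""
    if not cases:
        raise ValueError("at least one eval case is required")
    return sum(keyword_score(task(case.input), case) for case in cases) / len(cases)
```

The cases and the scorer define what success looks like, and `task` can be any implementation: a prompt, a model call, or an agent. Swapping the implementation and re-running `run_eval` gives a number to compare, which is the property that makes an eval behave like a measurable PRD.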
Build a feedback loop that automatically turns real-world data into evals. For AI product teams, the #1 engineering priority isn’t prompt engineering or picking an agent framework; it’s building a pipeline that collects real-world data and converts it into evals. This is the same principle as investing in CI for traditional software: you’re building the platform that lets agents do the work engineers used to do manually. Without this feedback loop, you’re stuck in whack-a-mole mode, fixing individual cases without systematic improvement.
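One simple shape such a pipeline could take is a function that turns logged interactions into eval cases. The sketch below assumes a hypothetical log format with `input`, `feedback`, and `correction` fields. None of these field names come from the episode or from a real logging product.

```python
def logs_to_eval_cases(logs: list[dict]) -> list[dict]:
    """Convert logged interactions into eval cases.

    Keeps only entries that a user marked as bad ("down") and that carry a
    human-written correction, and deduplicates on the normalized input so the
    same complaint does not dominate the eval set.
    """
    seen: set[str] = set()
    cases: list[dict] = []
    for entry in logs:
        key = entry["input"].strip().lower()
        if entry.get("feedback") != "down" or not entry.get("correction"):
            continue
        if key in seen:
            continue
        seen.add(key)
        cases.append({"input": entry["input"], "expected": entry["correction"]})
    return cases
```

Running this on a schedule and appending the result to the eval set is what moves a team out of whack-a-mole mode: every reported failure becomes a permanent regression check instead of a one-off fix.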
Quantify your designer’s taste so it scales across your product. Ankur runs hundreds of evals to improve things quantitatively, then asks David (their tastemaker designer) for a vibe check every few days. When David destroys his work, Ankur captures the feedback (“David thinks it’s OK to show both languages as long as . . .”) and improves the scoring functions to encode David’s palate. This doesn’t replace David; it amplifies him. They’re able to apply David’s quality bar to more things than he could ever review manually.
Product building is now carving, not constructing. It’s extremely fast to create something with too many features, too many buttons, and too much code. The hard part is removing stuff. When customers complain, Braintrust removes the thing causing confusion 90% of the time, making the system work better by eliminating complexity. This is the opposite of traditional product development, where you carefully add features one by one.
Invest in CI to earn the ability to move faster—it’s the platform for AI-powered engineering. Every engineer is now building a platform upon which agents do the work engineers used to do manually. For traditional software, that platform is CI. If you feel constrained by velocity, don’t ship crappy stuff faster. Instead, pause and improve CI so you earn the ability to move faster safely. The same principle applies to AI products: build the eval pipeline first, then let agents optimize within that system.
When agents fail, close the session and improve the evals—don’t yell or bribe. Ankur’s back-pocket strategy is remarkably disciplined: he doesn’t try to prompt his way out of problems. He closes the session, improves the evaluation criteria or success metrics, and starts fresh. Sometimes this means hand-writing code to better understand the problem (like when he spent a weekend hand-writing a 3,000-line eval that had become trash through vibe coding). The solution is always better evals, not better prompting.
Blog: Ankur Goyal’s Playbook for Agent-Driven Benchmarking and AI Evals: https://www.chatprd.ai/how-i-ai/ankur-goyals-playbook-for-agent-driven-benchmarking-and-ai-evals
Workflows:
↳ How to Scale Expert Judgment in AI Systems with a Human Feedback Loop: https://www.chatprd.ai/how-i-ai/workflows/how-to-scale-expert-judgment-in-ai-systems-with-a-human-feedback-loop
↳ How to Use AI Coding Agents for Exhaustive Infrastructure Benchmarking: https://www.chatprd.ai/how-i-ai/workflows/how-to-use-ai-coding-agents-for-exhaustive-infrastructure-benchmarking
If you’re enjoying these episodes, reply and let me know what you’d love to learn more about: AI workflows, hiring, growth, product strategy—anything.
Catch you next week,
Lenny
P.S. Want every new episode delivered the moment it drops? Hit “Follow” on your favorite podcast app.
2026-06-15 20:04:03
In this episode, I sit down with Ankur Goyal, founder and CEO of Braintrust, the AI evals and observability platform used by teams like Notion, Stripe, Vercel, and Zapier. This one is for the senior engineers, staff engineers, VPs of engineering, and CTOs in my audience. We get into how coding agents can take on deeply technical architecture and infrastructure work that no single human engineer could tackle before, and then we demystify evals so you can use them to make your AI products better without touching the implementation.
Listen or watch on YouTube, Spotify, or Apple Podcasts
How Ankur uses Codex to run week-long benchmark experiments across database indexes, column store formats, and execution engines to speed up slow queries
Why he argues there’s no excuse to skip rigorous benchmarking now that agents can run them tirelessly
The “agent line” framework: how to decide which decisions, directions, and interactions you can hand off to an agent
How I think about the practical vs. theoretical quality of AI on hard technical problems, and why human attention decays on tedious work
Why evals are the modern version of a PRD, and how to encode “what good looks like” so a model can figure out the “how”
How to build a scoring function live and let an agent improve your prompt inside a safe playground
How Ankur turned his designer David’s taste into a repeatable eval so quality scales beyond one person
Why fixing your CI is the highest-leverage way to speed up engineering velocity
Guru—The AI layer of truth
Persona—Trusted identity verification for any use case
(00:00) Introduction to Ankur Goyal
(03:00) Using AI agents for database optimization
(06:10) Running exhaustive benchmarks with coding agents
(09:03) Why staff engineers are wrong about AI limitations
(11:30) The “agent line” framework for delegation
(14:00) Ankur’s workflow: running 4 to 6 concurrent agents
(17:16) Technical setup: foreground agents, background agents, and cloud environments
(20:32) Spending time with AI tools
(23:06) Demystifying evals
(26:02) Live demo: Building an eval for documentation answers
(30:20) The alternative to evals: vibe checks and whack-a-mole
(32:09) Capturing designer taste in scoring functions
(33:13) Quick recap
(33:44) Managing velocity and throughput
(35:40) Why CI/CD investment is critical for AI-accelerated teams
(37:30) Ankur’s prompting strategy when agents fail
(39:10) Closing thoughts and how to connect
• Braintrust: https://www.braintrust.dev/
• Codex: https://openai.com/codex/
• GPT 5.4: https://developers.openai.com/api/docs/models/gpt-5.4
• Claude: https://claude.ai/
• GPT 5.5 just did what no other model could: https://www.lennysnewsletter.com/p/gpt-55-just-did-what-no-other-model
• Paul Graham’s Maker vs. Manager Schedule: http://www.paulgraham.com/makersschedule.html
• tmux: https://github.com/tmux/tmux
• Chris Tate at Vercel: https://www.linkedin.com/in/ctatedev/
Ankur Goyal on LinkedIn: https://www.linkedin.com/in/ankrgyl/
ChatPRD: https://www.chatprd.ai/
Website: https://clairevo.com/
Claire Vo on LinkedIn: https://www.linkedin.com/in/clairevo/
2026-06-14 20:31:44
Mark Pincus founded Zynga—the company behind Words With Friends, FarmVille, and Zynga Poker—and has arguably created more hit consumer products than anyone in history. At Zynga, eight of 10 major game launches became massive hits, reaching over a billion players. Over the past five years, Mark has been synthesizing everything he’s learned about building successful consumer products and turning it into a book, Life at the Speed of Play, which comes out on June 23. This is the first interview he’s done about the book.
Listen on YouTube, Spotify, and Apple Podcasts
His “Proven, Better, New” framework: copy what’s proven, make it better so that 10 out of 10 people say “f*ck yes, I’ll use this”—then add something new
Why being less ambitious is the path to the most ambitious ideas
His rule of thumb that your instincts are right 95% of the time, but your ideas are wrong 75% of the time
“Kill hope before hope kills you”
How to raise kids in the age of AI
WorkOS—Make your app enterprise-ready, with SSO, SCIM, RBAC, and more
Vanta—Automate compliance, manage risk, and accelerate trust with AI
• LinkedIn: https://www.linkedin.com/in/markpincus
• Website: https://www.lifeatthespeedofplay.com
• Tribe.net: https://en.wikipedia.org/wiki/Tribe.net
• Zynga: https://www.zynga.com
• Sid Meier: https://en.wikipedia.org/wiki/Sid_Meier
• Electronic Arts: https://www.ea.com
• CityVille: https://en.wikipedia.org/wiki/CityVille
• Words With Friends: https://wordswithfriends.com/
• Scrabble: https://playscrabble.com
• Reddit: https://www.reddit.com
• MIT Media Lab founder Nicholas Negroponte’s 1984 TED talk (TED Radio Hour): https://www.ted.com/talks/nicholas_negroponte_5_predictions_from_1984
• Peter Thiel on LinkedIn: https://www.linkedin.com/in/peterthiel
• FarmVille: https://en.wikipedia.org/wiki/FarmVille
• Craig Newmark: https://en.wikipedia.org/wiki/Craig_Newmark
• How to consistently go viral: Nikita Bier’s playbook for winning at consumer apps (co-founder of TBH, Gas, advisor, investor): https://www.lennysnewsletter.com/p/how-to-consistently-go-viral-nikita-bier
• Angry Birds: https://www.angrybirds.com/
• OMGPop: https://en.wikipedia.org/wiki/OMGPop
• Draw Something: https://en.wikipedia.org/wiki/Draw_Something
• Slack founder: Mental models for building products people love ft. Stewart Butterfield: https://www.lennysnewsletter.com/p/slack-founder-stewart-butterfield
• Brian Chesky’s new playbook: https://www.lennysnewsletter.com/p/brian-cheskys-contrarian-approach
• Garry Tan on LinkedIn: https://www.linkedin.com/in/garrytan
• Brian Armstrong on LinkedIn: https://www.linkedin.com/in/barmstrong
• Jason Citron on X: https://x.com/jasoncitron
• Stanislav Vishnevskiy on LinkedIn: https://www.linkedin.com/in/svishnevskiy
• Jeff Bezos on X: https://x.com/JeffBezos
• Andy Jassy on X: https://x.com/ajassy
• Niantic: https://nianticlabs.com
• Pokémon Go: https://pokemongo.com
• Bing Gordon on LinkedIn: https://www.linkedin.com/in/binggordon
• Life at the Speed of Play: Launch Products People Love!: https://www.amazon.com/Life-Speed-Play-Launch-Products/dp/0063352575/ref=tmm_hrd_swatch_0
Lenny may be an investor in the companies discussed.