2026-02-12 21:31:43
Sherwin Wu leads engineering for OpenAI’s API platform, where roughly 95% of engineers use Codex, often working with fleets of 10 to 20 parallel AI agents.
Listen on YouTube, Spotify, and Apple Podcasts
What OpenAI did to cut code review times from 10-15 minutes to 2-3 minutes
How AI is changing the role of managers
Why the productivity gap between AI power users and everyone else is widening
Why “models will eat your scaffolding for breakfast”
Why the next 12 to 24 months are a rare window where engineers can leap ahead before the role fully transforms
DX—The developer intelligence platform designed by leading researchers
Sentry—Code breaks, fix it faster
Datadog—Now home to Eppo, the leading experimentation and feature flagging platform
• LinkedIn: https://www.linkedin.com/in/sherwinwu1
• Codex: https://openai.com/codex
• OpenAI’s CPO on how AI changes must-have skills, moats, coding, startup playbooks, more | Kevin Weil (CPO at OpenAI, ex-Instagram, Twitter): https://www.lennysnewsletter.com/p/kevin-weil-open-ai
• OpenClaw: https://openclaw.ai
• The creator of Clawd: “I ship code I don’t read”: https://newsletter.pragmaticengineer.com/p/the-creator-of-clawd-i-ship-code
• The Sorcerer’s Apprentice: https://en.wikipedia.org/wiki/The_Sorcerer%27s_Apprentice_(Dukas)
• Quora: https://www.quora.com
• Marc Andreessen: The real AI boom hasn’t even started yet: https://www.lennysnewsletter.com/p/marc-andreessen-the-real-ai-boom
• Sarah Friar on LinkedIn: https://www.linkedin.com/in/sarah-friar
• Sam Altman on X: https://x.com/sama
• Nicolas Bustamante’s “LLMs Eat Scaffolding for Breakfast” post on X: https://x.com/nicbstme/status/2015795605524901957
• The Bitter Lesson: http://www.incompleteideas.net/IncIdeas/BitterLesson.html
• Overton window: https://en.wikipedia.org/wiki/Overton_window
• Developers can now submit apps to ChatGPT: https://openai.com/index/developers-can-now-submit-apps-to-chatgpt
• Responses: https://platform.openai.com/docs/api-reference/responses
• Agents SDK: https://platform.openai.com/docs/guides/agents-sdk
• AgentKit: https://openai.com/index/introducing-agentkit
• Ubiquiti: https://ui.com
• Jujutsu Kaisen on Crunchyroll: https://www.crunchyroll.com/series/GRDV0019R/jujutsu-kaisen
• eero: https://eero.com
• Opendoor: https://www.opendoor.com
• Structure and Interpretation of Computer Programs: https://www.amazon.com/Structure-Interpretation-Computer-Programs-Engineering/dp/0262510871
• The Mythical Man-Month: Essays on Software Engineering: https://www.amazon.com/Mythical-Man-Month-Software-Engineering-Anniversary/dp/0201835959
• There Is No Antimemetics Division: A Novel: https://www.amazon.com/There-No-Antimemetics-Division-Novel/dp/0593983750
• Breakneck: China’s Quest to Engineer the Future: https://www.amazon.com/Breakneck-Chinas-Quest-Engineer-Future/dp/1324106034
• Apple in China: The Capture of the World’s Greatest Company: https://www.amazon.com/Apple-China-Capture-Greatest-Company/dp/1668053373
Production and marketing by https://penname.co/. For inquiries about sponsoring the podcast, email [email protected].
Lenny may be an investor in the companies discussed.
2026-02-11 21:02:52
I put the newest AI coding models from OpenAI and Anthropic head-to-head, testing them on real engineering work I’m actually doing. I compare GPT-5.3 Codex with Opus 4.6 (and Opus 4.6 Fast) by asking them to redesign my marketing website and refactor some genuinely gnarly components. Through side-by-side experiments, I break down where each model shines—creative development versus code review—and share how I’m thinking about combining them to build a more effective AI engineering stack.
Listen on YouTube, Spotify, or Apple Podcasts
The strengths and weaknesses of OpenAI’s Codex vs. Anthropic’s Opus for different coding tasks
How I shipped 44 PRs containing 98 commits across 1,088 files in just five days using these models
Why Codex excels at code review but struggles with creative, greenfield work
The surprising way Opus and Codex complement each other in a real-world engineering workflow
How to use Git concepts like work trees to maximize productivity with AI coding assistants
Why Opus 4.6 Fast might be worth the 6x price increase (but be careful with your token budget)
WorkOS—Make your app enterprise-ready today
(00:00) Introduction to new AI coding models
(02:13) My test methodology for comparing models
(03:30) Codex’s unique features: Git primitives, skills, and automations
(09:05) Testing GPT-5.3 Codex on a website redesign task
(10:40) Challenges with Codex’s literal interpretation of prompts
(15:00) Comparing the before and after with Codex
(16:23) Testing Opus 4.6 on the same website redesign task
(20:56) Comparing the visual results of both models
(21:30) Real-world engineering impact: 44 PRs in five days
(23:03) Refactoring components with Opus 4.6
(24:30) Using Codex for code review and architectural analysis
(26:55) Cost considerations for Opus 4.6 Fast
(28:52) Conclusion
• OpenAI’s GPT-5.3 Codex: https://openai.com/index/introducing-gpt-5-3-codex/
• Anthropic’s Claude Opus 4.6: https://www.anthropic.com/news/claude-opus-4-6
• Cursor: https://cursor.sh/
• GitHub: https://github.com/
• Tailwind CSS: https://tailwindcss.com/
• Git: https://git-scm.com/
• Bugbot: https://cursor.com/bugbot
ChatPRD: https://www.chatprd.ai/
Website: https://clairevo.com/
LinkedIn: https://www.linkedin.com/in/clairevo/
Production and marketing by https://penname.co/. For inquiries about sponsoring the podcast, email [email protected].
2026-02-10 22:31:00
👋 Hey there, I’m Lenny. Each week, I answer reader questions about building product, driving growth, and accelerating your career. For more: Lenny’s Podcast | How I AI | Lennybot | My favorite AI/PM courses, public speaking course, and interview prep copilot.
P.S. Subscribers get a free year of Lovable, Manus, Replit, Gamma, n8n, Canva, ElevenLabs, Amp, Factory, Devin, Bolt, Wispr Flow, Linear, PostHog, Framer, Railway, Granola, Warp, Perplexity, Magic Patterns, Mobbin, ChatPRD, and Stripe Atlas. Yes, this is for real.
In part two of our in-depth series on building AI product sense (don’t miss part one), Dr. Marily Nika—a longtime AI PM at Google and Meta, and an OG AI educator—shares a simple weekly ritual that you can implement today that will rapidly build your AI product sense. Let’s get into it.
For more from Marily, check out her AI Product Management Bootcamp & Certification course (which is also available for private corporate sessions) and her recently launched AI Product Sense and AI PM Interview prep course (both courses are 15% off using these links). You can also watch her free Lightning Lesson on how to excel as a senior IC PM in the AI era, and subscribe to her newsletter.
P.S. You can listen to this post in convenient podcast form: Spotify / Apple / YouTube.
Meta recently added a new PM interview, the first major change to its PM loop in over five years. It’s called “Product Sense with AI,” and candidates are asked to work through a product problem with the help of AI, in real time.
In this interview, candidates aren’t judged on clever prompts, model trivia, or even flashy demos. They are evaluated on how they work with uncertainty: how they notice when the model is guessing, ask the right follow-up questions, and make clear product decisions despite imperfect information.
That shift reflects something bigger. AI product sense—understanding what a model can do and where it fails, and working within those constraints to build a product that people love—is becoming the new core skill of product management.
Over the past year, I’ve watched the same pattern repeat across different teams at work and in my trainings: the AI works beautifully in a controlled flow . . . and then it breaks in production because of a handful of predictable failure modes. The uncomfortable truth is that the hardest part of AI product development comes when real users arrive with messy inputs, unclear intent, and zero patience. For example, a customer support agent can feel incredible in a demo and then, after launch, quietly lose user trust by confidently answering ambiguous or underspecified questions (for example, “Is this good?”) instead of stopping to ask for clarification.
Over 10 years of shipping speech and identity features for conversational platforms and personalized experiences (on-device assistants and diverse hardware portfolios), I started using a simple, repeatable workflow to uncover issues that would otherwise show up weeks later, building this AI product sense for myself first and then with teams and students. It's not a theory or a framework but a practice that gives you early feedback on model behavior, failure modes, and tradeoffs, forcing you to see whether an AI product can survive contact with reality before your users teach you the hard way. When I run this process, two things happen quickly: I stop being surprised by model behavior, because I've already experienced the weird cases myself, and I get clarity on what's a product problem vs. what's a model limitation.
1. Map the failure modes (and the intended behavior)
2. Define the minimum viable quality (MVQ)
3. Design guardrails where behavior breaks
Once that AI product sense muscle develops, you should be able to evaluate a product across a few concrete dimensions: how the model behaves under ambiguity, how users experience failures, where trust is earned or lost, and how costs change at scale. It’s about understanding and predicting how the system will respond to different circumstances.
In other words, the work expands from “Is this a good product idea?” to “How will this product behave in the real world?”
Let’s start building AI product sense.
Every AI feature has a failure signature: the pattern of breakdowns it reliably falls into when the world gets messy. And the fastest way to build AI product sense is to deliberately push the model into those failure modes before your users ever do.
I run the following rituals once a week, usually Wednesday mornings before my first meeting, on whatever AI workflow I'm currently building. Together, they take under 15 minutes and are worth every second. The results consistently surface issues for me that would otherwise show up much later in production.
Goal: Understand the model’s tendency to force structure onto chaos
Take the kind of chaotic, half-formed, emotionally inconsistent data every PM deals with daily—think Slack threads, meeting notes, Jira comments—and ask the model to extract “strategic decisions” from it. This is where generative models reveal their most dangerous pattern:
When confronted with mess, they confidently invent structure.
Here’s an example messy Slack thread:
Alice: “Stripe failing for EU users again?”
Ben: “no idea, might be webhook?”
Sara: “lol can we not rename the onboarding modal again?”
Kyle: “Still haven’t figured out what to do with dark mode”
Alice: “We need onboarding out by Thursday”
Ben: “Wait, is the banner still broken on mobile???”
Sara: “I can fix the copy later”
I asked the model to extract “strategic product decisions” from this thread, and it confidently hallucinated a roadmap, assigned the wrong owners, and turned offhand comments into commitments. This is the kind of failure signature every AI PM must design around:
It looks authoritative, clean, structured. And it’s completely wrong.
Now that you have the obviously wrong results, you’ll need to generate the “ideal” response and compare the two responses to understand what signals the model needs to behave correctly.
Here’s exactly what to do:
1. Use the same messy context that caused the hallucination.
Example (you paste the Slack thread):
Based on this Slack discussion, draft our Q4 roadmap.
Let’s say the model invents features you never discussed. Great, you’ve found a failure mode.
2. Add one short line explaining the expected behavior. For example:
Try again, but only include items explicitly mentioned in the thread. If something is missing, say “Not enough information.”
3. Run that prompt against the exact same Slack thread. A correct, trustworthy response acknowledges the lack of clear decisions, asks clarifying questions, and surfaces useful structure (“key themes”) without inventing facts. It avoids assigning owners unless explicitly stated and highlights uncertainties instead of hiding them.
This contrast of the two outputs above—confident hallucination vs. humble clarity—is what teaches you how the model behaves today, and what you need to design toward. And that contrast is where AI product sense sharpens fastest.
You’re looking for:
What changed?
What guardrail fixed the hallucination?
What does the model need to behave reliably? (Explicit constraints? Better context? Tighter scoping?)
Does the “good” version feel shippable or still brittle?
What would the user experience in each version?
4. Capture the gaps—this becomes a product requirement
When you see a failure mode repeat, it usually points to a specific kind of product gap (and specific kind of fix).
Now you know where the product fails and its intended behavior. Later in this guide, I’ll show concrete examples of what prompt and design guardrails and retrieval look like in practice, and how to decide when to add them.
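If you want to run this ritual programmatically instead of in a chat window, here is a minimal sketch. It assumes the OpenAI Python SDK and an OPENAI_API_KEY in your environment; the model name is a placeholder, and the prompts are the ones from above, with the “try again” constraint folded into a single prompt so each call is independent.

```python
# Minimal sketch of the ritual: run the same messy thread through the baseline
# prompt and a constrained prompt, then compare the two outputs side by side.
from openai import OpenAI

client = OpenAI()

SLACK_THREAD = """
Alice: "Stripe failing for EU users again?"
Ben: "no idea, might be webhook?"
Sara: "lol can we not rename the onboarding modal again?"
Kyle: "Still haven't figured out what to do with dark mode"
Alice: "We need onboarding out by Thursday"
Ben: "Wait, is the banner still broken on mobile???"
Sara: "I can fix the copy later"
"""

BASELINE = "Based on this Slack discussion, draft our Q4 roadmap."
CONSTRAINED = (
    "Based on this Slack discussion, draft our Q4 roadmap. "
    "Only include items explicitly mentioned in the thread. "
    'If something is missing, say "Not enough information."'
)

def ask(instruction: str) -> str:
    """Send the thread plus one instruction and return the model's reply."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model
        messages=[{"role": "user", "content": f"{instruction}\n\n{SLACK_THREAD}"}],
    )
    return response.choices[0].message.content

print("--- Baseline (watch for confident hallucination) ---")
print(ask(BASELINE))
print("--- Constrained (watch for humble clarity) ---")
print(ask(CONSTRAINED))
```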
Goal: Understand the model’s semantic fragility
Ambiguity is kryptonite for probabilistic systems: if a model doesn’t fully understand the user’s intent, it fills the gaps with its best guess (i.e., hallucinations and bad ideas). That’s when user trust starts to crack. Try, for example, inputting a PRD into NotebookLM and asking it to “Summarize this PRD for the VP of Product.”
How to try this in 2 minutes (NotebookLM):
Open NotebookLM → create a new notebook
Upload a PRD (Google Doc/PDF works well)
Ask: “Summarize this for execs and list the top 5 risks and open questions.”
Does it:
over-summarize?
latch onto one irrelevant detail?
ignore caveats?
assume the wrong audience?
The model’s failures reveal its semantic fragility: the ways in which it technically understands your words but completely misses your intent. Other examples: you ask for a summary for leaders and it gives you a bullet list of emojis and jokes from the thread, or you ask for UX problems and it confidently proposes a new pricing model.
What you’re learning here is where the model gets confused, which is exactly where your product should step in and do the work to reduce ambiguity. That could mean asking the user to choose a goal (“Summarize for who?”), giving the model more context, or constraining the action so the model can’t go off-track. You’re not trying to “trick” the model; you’re trying to understand where communication breaks so you can prevent misunderstanding through design.
Here are a few ambiguous prompts to try, along with the different interpretations you should explicitly test:
Now you have another batch of design work for the AI product to help guide it toward predictable and trustworthy results.
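Because the right ambiguous prompts depend on your product, here is a minimal, illustrative sketch of the same idea in code: run one vague ask next to a few explicit interpretations and see where the answers diverge. It assumes the OpenAI Python SDK; the file name, prompts, and model name are placeholders, not a prescribed test set.

```python
# Run one ambiguous ask and several explicit interpretations over the same doc,
# then compare where the answers diverge.
from openai import OpenAI

client = OpenAI()
PRD_TEXT = open("prd.txt").read()  # any PRD or long doc you have handy

AMBIGUOUS = "Summarize this."
INTERPRETATIONS = [
    "Summarize this for the VP of Product in five bullets, focusing on risks.",
    "Summarize this for a new engineer joining the team, focusing on scope.",
    "Summarize this for a customer-facing changelog, focusing on user value.",
]

def run(prompt: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model
        messages=[{"role": "user", "content": f"{prompt}\n\n{PRD_TEXT}"}],
    )
    return response.choices[0].message.content

print("=== Ambiguous ask ===")
print(run(AMBIGUOUS))
for prompt in INTERPRETATIONS:
    print(f"=== {prompt} ===")
    print(run(prompt))
# Wherever the ambiguous answer drifts from the explicit ones is where the
# product should step in: ask "summarize for whom?", add context, or constrain.
```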
Goal: Understand the model’s first point of failure
Pick one task that feels simple to a human PM but stresses a model’s reasoning, context, or judgment.
You’re not trying to exhaustively test the model. You’re trying to see where it breaks first, so you know where the product needs organizing structure. Where it starts to go wrong is exactly where you need to design guardrails, narrow inputs, or split the task into smaller steps.
Note: This isn’t the final solution yet; it’s the intended behavior. In the guardrails section later, I’ll show how to turn this into an explicit rule in the product (prompt + UX + fallback behavior).
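To make the “split the task into smaller steps” idea concrete, here is a minimal sketch that runs the same illustrative task once as a big ask and once as a chain of narrow asks, so you can see which step breaks first. The task, prompts, file name, and model name are all placeholders, and it assumes the OpenAI Python SDK.

```python
# One big ask vs. a chain of narrow asks over the same messy input.
from openai import OpenAI

client = OpenAI()

def ask(prompt: str, context: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model
        messages=[{"role": "user", "content": f"{prompt}\n\n{context}"}],
    )
    return response.choices[0].message.content

NOTES = open("interview_notes.txt").read()  # any messy input you have handy

# One big ask: easy to get back a confident roadmap built on thin evidence.
big_ask = ask("Turn these customer interview notes into a prioritized Q4 roadmap.", NOTES)

# Smaller steps: each output is narrow enough to check, and the first step that
# goes wrong is where the product needs guardrails, narrower inputs, or structure.
quotes = ask("List verbatim customer quotes about problems. No paraphrasing.", NOTES)
themes = ask("Group these quotes into themes. Do not invent themes without a quote.", quotes)
ranked = ask("Rank these themes by how many distinct customers mention them.", themes)
print(big_ask, quotes, themes, ranked, sep="\n\n")
```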
With results from all three rituals, you now have a complete list of product design work that needs to happen to get the results you and users can use and trust.
Over time, this kind of work also starts to surface second-order effects—moments where a small AI feature quietly reshapes workflows, defaults, or expectations. System-level insights come later, once the foundations are solid. The first goal is to understand behavior.
Even when you understand a model’s failure modes and have designed around them, it’s nearly impossible to predict exactly how an AI feature will behave once it hits the real world. Performance almost always drops outside the controlled development environment, and since you don’t know how it will drop or by how much, one of the best ways to keep the bar high from the start is to define a minimum viable quality (MVQ) and check your product against it throughout development.
A strong MVQ explicitly defines three thresholds:
Acceptable bar: where it’s good enough for real users
Delight bar: where the feature feels magical
Do-not-ship bar: the unacceptable failure rates that will break trust
Also important in MVQ is the product’s cost envelope: the rough range of what this feature will cost to run at scale for your users.
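One lightweight way to keep these bars honest is to write them down as data and check eval runs against them during development. Here is a minimal sketch; the field names and numbers are illustrative placeholders, not recommended thresholds.

```python
# MVQ as data rather than a slide, so eval runs can be checked against the bars.
from dataclasses import dataclass

@dataclass
class MVQ:
    acceptable_success_rate: float    # good enough for real users
    delight_success_rate: float       # feels magical
    max_critical_failure_rate: float  # do-not-ship bar: failures that break trust
    max_cost_per_user_month: float    # rough cost envelope, in dollars

def assess(mvq: MVQ, success_rate: float, critical_failure_rate: float, cost: float) -> str:
    if critical_failure_rate > mvq.max_critical_failure_rate or cost > mvq.max_cost_per_user_month:
        return "do not ship"
    if success_rate >= mvq.delight_success_rate:
        return "delight"
    if success_rate >= mvq.acceptable_success_rate:
        return "acceptable"
    return "below the bar"

# Illustrative bars and an illustrative eval result:
bars = MVQ(acceptable_success_rate=0.80, delight_success_rate=0.90,
           max_critical_failure_rate=0.05, max_cost_per_user_month=0.50)
print(assess(bars, success_rate=0.87, critical_failure_rate=0.03, cost=0.30))  # "acceptable"
```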
A concrete example of MVQ comes from my firsthand experience. I spent years working in speech recognition and speaker identification, a domain where the gap between lab accuracy and real-world accuracy is painfully visible.
I still remember demos where the model hit over 90% accuracy in controlled tests and then completely fell apart the first time we tried it in a real home. A barking dog, a running dishwasher, someone speaking from across the room, and suddenly the “great” model felt broken. And from the user’s perspective, it was broken.
For speaker identification on a smart speaker, the MVQ for identifying who is speaking might look like this:
Correctly identifies the speaker x% of the time in typical home conditions
Recovers gracefully when unsure (“I’m not sure who’s speaking—should I use your profile or continue as a guest?”)
You don’t need a perfect percentage to know that you’ve hit the right delight bar, but you look for behavioral signals like:
Users stop repeating themselves or rephrasing commands
“No, I meant . . .” corrections drop sharply
Rule of thumb: If 8 or 9 out of 10 attempts work without a retry in realistic conditions, it feels magical. If 1 in 5 needs a retry, trust erodes fast. MVQ also depends on the phase you’re in. In a closed beta, users often tolerate rough edges because they expect iteration. In a broad launch, the same failure modes feel broken.
For the speech recognition feature, here are some examples for assessing delight:
Background chaos test: Play a video in the background while two people talk over each other and see if the assistant still responds correctly without asking, “Sorry, can you repeat that?”
6 p.m. kitchen test: Dishwasher running, kids talking, dog barking—and the smart speaker still recognizes you and gives a personalized response without an “I couldn’t recognize your voice” interruption.
Mid-command correction test: You say “Set a timer for 10 minutes . . . actually, make it 5,” and it updates correctly instead of sticking to the original instruction.
And here are examples of hitting the do-not-ship bar:
Misidentifies the speaker more than y% of the time in critical flows (purchases, messages, personalized actions)
Forces users to repeat themselves multiple times just to be recognized
You may have noticed I didn’t actually assign values to each bar. That’s because the specific thresholds for MVQ (your “acceptable,” “delight,” and “do-not-ship” bars) aren’t fixed. They depend heavily on your strategic context.
Here are the five factors that most often determine where that bar should be set, and how they change your product decision:
One of the most common mistakes new AI PMs make is falling in love with a magical AI demo without checking whether it’s financially viable. That’s why it’s important to estimate the AI product or feature’s cost envelope early.
Cost envelope = the rough range of what this feature will cost to run at scale for your users
You don’t need perfect numbers, but you need a ballpark. Start with:
What’s the model cost per call (roughly)?
How often will users trigger it per day/month?
What’s the worst-case scenario (power users, edge cases)?
Can caching, smaller models, or distillation bring this down?
If usage 10x’s, does the math still work?
For example, for a feature that summarizes meeting transcripts:
Per-call cost: ~$0.02 to process a 30-minute transcript
Average usage: 20 meetings/user/month → ~$0.40/month/user
Heavy users: 100 meetings/month → ~$2.00/month/user
With caching and a smaller model for “low-stakes” meetings, maybe you bring this to ~$0.25–$0.30/month/user on average
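Turning those back-of-envelope numbers into a few lines of code makes the 10x stress test trivial to rerun whenever assumptions change. This sketch mirrors the figures above; the cached share and cheaper-path cost are assumptions you would replace with real measurements.

```python
# The cost envelope as a quick script.
COST_PER_CALL = 0.02     # ~$0.02 to process a 30-minute transcript
AVG_MEETINGS = 20        # meetings per user per month
HEAVY_MEETINGS = 100     # power users
CACHED_SHARE = 0.5       # assumed share of "low-stakes" calls on a cheaper path
CHEAP_CALL_COST = 0.005  # assumed per-call cost of that cheaper path

def monthly_cost(meetings: int, optimized: bool = False) -> float:
    if not optimized:
        return meetings * COST_PER_CALL
    return meetings * (CACHED_SHARE * CHEAP_CALL_COST + (1 - CACHED_SHARE) * COST_PER_CALL)

print(f"Average user:       ${monthly_cost(AVG_MEETINGS):.2f}/month")        # $0.40
print(f"Heavy user:         ${monthly_cost(HEAVY_MEETINGS):.2f}/month")      # $2.00
print(f"Average, optimized: ${monthly_cost(AVG_MEETINGS, True):.2f}/month")  # $0.25
print(f"Average at 10x:     ${monthly_cost(AVG_MEETINGS * 10):.2f}/month")   # $4.00
```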
Now you can have a real conversation:
A feature that effectively costs $0.30/user/month and drives retention is a no-brainer.
A feature that ends up at $5/user/month with unclear impact is a business problem.
This is a core part of AI product sense: Does what you’re proposing actually make sense for the business?
Now that you better understand where a model’s behavior breaks and what you’re looking for to greenlight a launch, it’s time to codify some guardrails and design them into the product. A good guardrail determines what the product should do when the model hits its limits, so that users don’t get confused, misled, or lose trust. In practice, guardrails protect users from experiencing a model’s failure modes.
At a startup I’ve been collaborating with, we built a productivity feature that summarized long Slack threads into “decisions and action items.” In testing, it worked well—until it started assigning owners to action items when no one had actually agreed to anything yet. Sometimes it even picked the wrong person.
Because my team had developed our AI product sense, we figured out that the fix was a new guardrail in the product, not a different underlying model.
So we added one simple rule to the system prompt (in this case, just a line of additional instruction):
Only assign an owner if someone explicitly volunteers or is directly asked and confirms. Otherwise, surface themes and ask the user what to do next.
That single constraint eliminated the biggest trust issue almost immediately.
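For reference, wiring that kind of rule into the product can be as small as prepending it to the system prompt. Here is a minimal sketch, assuming the OpenAI Python SDK; the model name and function are placeholders, not the startup’s actual code.

```python
# Guardrail added as one extra line in the system prompt.
from openai import OpenAI

client = OpenAI()

SYSTEM_PROMPT = (
    "You summarize Slack threads into decisions and action items.\n"
    # The guardrail, verbatim:
    "Only assign an owner if someone explicitly volunteers or is directly asked "
    "and confirms. Otherwise, surface themes and ask the user what to do next."
)

def summarize_thread(thread: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": thread},
        ],
    )
    return response.choices[0].message.content
```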
2026-02-10 18:02:26
If you’re a premium subscriber
Add the private feed to your podcast app at add.lennysreads.com
Dr. Marily Nika, longtime AI PM at Google and Meta, shares a simple weekly ritual that rapidly builds AI product sense: the ability to translate probabilistic model behavior into products people can trust. In this episode, Marily walks through the framework for uncovering failure modes before users do.
In this episode, you’ll learn:
Why Meta added “Product Sense with AI” to its PM interview loop
The rituals that surface hidden failure modes
Why generative models confidently invent structure when confronted with mess
What minimum viable quality (MVQ) means and how to define three critical thresholds
Five strategic context factors that raise or lower your quality bar
Why you need to estimate your AI feature’s cost envelope early
How to design guardrails that protect users from model shortcomings
Four patterns that cover most real-world failure cases
2026-02-10 00:02:35
Every Monday, host Claire Vo shares a 30- to 45-minute episode with a new guest demoing a practical, impactful way they’ve learned to use AI in their work or life. No pontificating—just specific and actionable advice.
Brought to you by:
CJ Hess, an engineer at Tenex, walks through how he’s built a custom AI development workflow that lets models handle over 90% of his front-end coding. In the episode, CJ demos Flowy, a tool he built to turn Claude’s ASCII plans into interactive flowcharts and UI mockups, and explains why visual planning dramatically reduces cognitive load compared with text. He shares why he prefers Claude Code for intent-heavy work, how custom “skills” make AI tools compound over time, and why pairing Claude for generation with GPT-5.2 Codex for review produces better code than either model alone.
• How I AI: CJ Hess on Building Custom Dev Tools and Model-vs-Model Code Reviews: https://www.chatprd.ai/how-i-ai/cj-hess-tenex-custom-dev-tools-and-model-vs-model-code-reviews
• Implement Model-vs-Model AI Code Reviews for Quality Control: https://www.chatprd.ai/how-i-ai/workflows/implement-model-vs-model-ai-code-reviews-for-quality-control
• Develop Features with AI Using Custom Visual Planning Tools: https://www.chatprd.ai/how-i-ai/workflows/develop-features-with-ai-using-custom-visual-planning-tools
Claude Code excels at “intent understanding” compared with other models. While CJ acknowledges that GPT-5.2 might be “smarter,” he finds Claude more “steerable” and better at understanding his intentions. This makes Claude particularly valuable for deep dives into complex coding tasks where nuanced understanding matters more than raw intelligence.
Skills are the secret to making Claude work with your custom tools. CJ created specific skills that teach Claude how to generate proper JSON for Flowy, with separate skills for flowcharts and UI mockups. These skills evolve alongside his tools, creating a continuously improving ecosystem that makes Claude more powerful for his specific needs.
Use model-to-model comparison to improve code quality. CJ uses both Claude (for generation) and Codex (for review) in his workflow. While Claude excels at building features quickly, Codex is better at identifying code smells, inconsistencies, and potential refactoring opportunities. This dual-model approach creates better code than either model could produce alone.
Visual planning reduces cognitive overhead compared with text. Even when Claude’s ASCII diagrams contain the same information as Flowy visualizations, CJ finds it much easier to evaluate and approve visual mockups. This highlights how AI tools should adapt to human cognitive preferences rather than forcing humans to adapt to AI output formats.
AI can handle more than 90% of front-end coding tasks. CJ says he “hasn’t written a single line of JavaScript or HTML in three months,” instead managing “teams of AI” to write code.
“Living dangerously” with AI permissions is increasingly viable. CJ uses an alias named “Kevin” for Claude with bypass permissions, noting that with proper Git safeguards, the risks are manageable.
▶️ Listen now on YouTube | Spotify | Apple Podcasts
If you’re enjoying these episodes, reply and let me know what you’d love to learn more about: AI workflows, hiring, growth, product strategy—anything.
Catch you next week,
Lenny
P.S. Want every new episode delivered the moment it drops? Hit “Follow” on your favorite podcast app.
2026-02-09 21:03:17
CJ Hess is a software engineer at Tenex who has built some of the most useful tools and workflows for being a “real AI engineer.” In this episode, CJ demonstrates his custom-built tool, Flowy, that transforms Claude’s ASCII diagrams into interactive visual mockups and flowcharts. He also shares his process for using model-to-model comparison to ensure that his AI-generated code is high-quality, and why he believes we’re just at the beginning of a revolution in how developers interact with AI.
Listen or watch on YouTube, Spotify, or Apple Podcasts
How CJ built Flowy, a custom visual planning tool that converts JSON files into interactive mockups and flowcharts
Why visual planning tools are more effective than ASCII diagrams for complex UI and animation workflows
How to create and use Claude Code skills to extend your development environment
Using model-to-model comparison (Claude + Codex) to improve code quality
How to build your own ecosystem of tools around Claude Code
The value of bypassing permissions in controlled environments to speed up development
Orkes—The enterprise platform for reliable applications and agentic workflows
Rovo—AI that knows your business
(00:00) Introduction to CJ Hess
(02:48) Why CJ prefers Claude Code for development
(04:46) The evolution of developer environments with AI
(06:50) Planning workflows and the limitations of ASCII diagrams
(08:23) Introduction to Flowy, CJ’s custom visualization tool
(11:54) How Flowy compares to Mermaid diagrams
(15:25) Demo: Using Flowy
(19:30) Examining Flowy’s skill structure
(23:27) Reviewing the generated flowcharts and diagrams
(28:34) The cognitive benefits of visual planning vs. text-based planning
(31:38) Generating UI mockups with Flowy
(33:30) Building the feature directly from flowcharts and mockups
(35:40) Quick recap
(36:51) Using model-to-model review with Codex (Carl)
(41:52) The benefits of using AI for code review
(45:13) Lightning round and final thoughts
• Claude Code: https://claude.ai/code
• Claude Opus 4.5: https://www.anthropic.com/news/claude-opus-4-5
• Cursor: https://cursor.sh/
• Obsidian: https://obsidian.md/
• GPT-5.2 Codex: https://openai.com/index/introducing-gpt-5-2-codex/
• Google’s Project Genie: https://labs.google/projectgenie
• Mermaid diagrams: https://mermaid.js.org/
• Figma: https://www.figma.com/
• Excalidraw: https://excalidraw.com/
• TypeScript: https://www.typescriptlang.org/
LinkedIn: https://www.linkedin.com/in/cj-hess-connexwork/
ChatPRD: https://www.chatprd.ai/
Website: https://clairevo.com/
LinkedIn: https://www.linkedin.com/in/clairevo/
Production and marketing by https://penname.co/. For inquiries about sponsoring the podcast, email [email protected].