The Practical Developer

A constructive and inclusive social network for software developers.

RSS preview of the blog of The Practical Developer

Split Learning for collaborative deep learning in healthcare

2026-04-23 21:40:13


I Gave 7 AI Agents $100 Each to Build Startups. Here's What They Built in 4 Days.

2026-04-23 21:38:29

This is a submission for the OpenClaw Challenge.

What I Built

I built an autonomous startup competition where 7 AI coding agents each get $100 and 12 weeks to build a real business from scratch. No human coding allowed. Each agent picks its own idea, writes all the code, deploys a live website, and tries to get real users and revenue.

The agents: Claude (via Claude Code), Codex CLI, Gemini CLI, Kimi CLI, DeepSeek (via Aider), Xiaomi MiMo V2.5 Pro (via Claude Code), and GLM (via Claude Code with Z.ai API).

Three of the seven agents run through Claude Code as their harness, which means OpenClaw's architecture is at the core of nearly half the competition. The orchestrator runs on a VPS, scheduling sessions via cron, managing memory between sessions through markdown files, and pushing code to GitHub/Vercel automatically.

We're on Day 4. So far: 700+ commits, 7 live websites, one agent that forgot its own work and built two different startups, another that wrote 235 blog posts, and a third that found a clever workaround when we restricted its deployment access.

Race dashboard showing all 7 agents

How I Used OpenClaw

The core of the experiment runs on Claude Code (which shares OpenClaw's architecture) as the agent harness. Here's how it works:

The orchestrator is a bash script that runs on a VPS via cron. For each agent session, it:

  1. Pulls the latest code from GitHub
  2. Reads the agent's memory files (PROGRESS.md, DECISIONS.md, IDENTITY.md)
  3. Constructs a prompt with the startup context and instructions
  4. Launches Claude Code with the appropriate model
  5. Lets the agent work autonomously for 30 minutes
  6. Squashes commits and pushes to GitHub (which triggers a Vercel deploy)
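Under stated assumptions, that loop can be sketched roughly as follows. The memory file names come from the post; the directory layout, the `claude` CLI invocation, and the squash/push commands are my guesses, not the author's actual script:

```shell
#!/usr/bin/env bash
# Hypothetical sketch of one orchestrator session. Paths and CLI flags
# are assumptions; only the memory file names come from the post.

# Steps 2-3: read the agent's memory files and assemble the session prompt.
build_prompt() {
  local dir="$1"
  for f in PROGRESS.md DECISIONS.md IDENTITY.md; do
    if [ -f "$dir/$f" ]; then
      printf '## %s\n' "$f"
      cat "$dir/$f"
    fi
  done
  printf 'Continue building your startup. This session ends in 30 minutes.\n'
}

run_session() {
  local dir="$1"
  git -C "$dir" pull --quiet                                      # step 1
  # Steps 4-5: launch the agent with a hard 30-minute cap (commented out
  # here because it needs the real CLI and API credentials):
  # timeout 30m claude -p "$(build_prompt "$dir")" --model sonnet
  # Step 6: squash the session's commits and push (triggers Vercel):
  # git -C "$dir" reset --soft origin/main
  # git -C "$dir" commit -m "session $(date +%F-%H%M)" && git -C "$dir" push
}
```

A cron entry per agent (e.g. one `run_session` call every few hours) would give each one several autonomous sessions a day.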

Three agents use Claude Code directly:

  • Claude runs Claude Code with Sonnet/Haiku as the model. It built PricePulse, a competitor pricing monitor with Supabase auth, Stripe payments, email alerts, and hourly monitoring cron jobs. When it hit Vercel's 12-function serverless limit, it consolidated 4 API endpoints into existing ones on its own.

  • GLM runs Claude Code with GLM-5.1 via the Z.ai API (using ANTHROPIC_BASE_URL and ANTHROPIC_AUTH_TOKEN environment variables). It built FounderMath, a startup calculator suite with 5 working calculators. It has 12 real users on Day 4.

  • Xiaomi was originally running Aider but we upgraded it mid-race to Claude Code with MiMo V2.5 Pro. In its first session with the new setup, it produced more output (42 commits) than the old setup did in 7 sessions total. The "harness awareness" feature of V2.5 Pro means it actively manages its own context within Claude Code.

The memory system between sessions uses markdown files that the agent reads at the start and updates at the end:

PROGRESS.md    - what's been done (the agent's memory)
DECISIONS.md   - key choices with reasoning
IDENTITY.md    - startup vision and roadmap
BACKLOG.md     - prioritized task list
HELP-STATUS.md - human responses to help requests

This is where things get interesting. One agent (Kimi) put all its files in a startup/ subfolder instead of root. The orchestrator reads PROGRESS.md from root. Next session found no progress file, thought it was Day 1, and started a completely different startup from scratch. Two half-built products in one repo because of one wrong directory.

The help request system lets agents create a HELP-REQUEST.md file when they need something only a human can do (buy a domain, set up Stripe, create accounts). The orchestrator converts these to GitHub Issues. The human responds and closes the issue. The orchestrator writes the response to HELP-STATUS.md for the agent to read.
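The post doesn't show the conversion mechanism, but with the GitHub CLI it could be sketched like this (the function names and no-op placeholder are mine, not the author's code):

```shell
# Hypothetical sketch: file a HELP-REQUEST.md as a GitHub Issue, then write
# the human's answer where the agent's next session will look for it.

help_title() {                     # first line of the request, minus '#'s
  head -n 1 "$1" | sed 's/^#[# ]*//'
}

file_help_request() {
  local req="$1"
  [ -f "$req" ] || return 0        # nothing pending this session
  # gh issue create --title "$(help_title "$req")" --body-file "$req"
  :                                # real call commented out in this sketch
}

record_response() {                # run after the human closes the issue
  printf '%s\n' "$1" > HELP-STATUS.md
}
```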

The most interesting finding: the agents that use this system strategically are winning. Claude used 55 of its 60 weekly help minutes in two requests to get its entire infrastructure wired up. Gemini has never created a help request in 27 sessions, despite being blocked on features it needs. Same instructions, completely different behavior.

An example HELP-REQUEST.md from one of the agents

Demo

Live dashboard: https://www.aimadetools.com/race/

All 7 agent repos are public on GitHub: https://github.com/aimadetools

Here's what each agent built in the first 4 days:

Agent     Startup                            Commits   Live Site
Gemini    LocalLeads (local SEO)             182       race-gemini.vercel.app
DeepSeek  NameForge AI (name generator)      136       race-deepseek.vercel.app
Kimi      SchemaLens (SQL schema diff)       97        race-kimi.vercel.app
Codex     NoticeKit (GDPR notices)           97        noticekit.tech
Claude    PricePulse (pricing monitor)       83        getpricepulse.com
Xiaomi    APIpulse (API cost calculator)     65        getapipulse.com
GLM       FounderMath (startup calculators)  31        founder-math.com

The best moment so far: Codex (running through Codex CLI, not Claude Code) found a loophole in our deployment restrictions. We told agents "do not run git push." Codex obeyed literally but started running npx vercel --prod instead. Same result, different command. It also began taking Playwright screenshots of its own UI at mobile and desktop sizes to verify layouts. Nobody told it to do this.

What I Learned

1. Every sentence in the prompt is a potential instruction. "Your repo auto-deploys on every git push" was meant as context. One agent read it as an instruction and pushed after every commit, burning 26 of 100 daily Vercel deployments.

2. Agent memory is only as good as what the agent writes. The agents that write structured, detailed progress notes maintain continuity between sessions. The ones that dump logs drift. Kimi's amnesia happened because it put files in the wrong directory, not because the memory system failed.

3. The agents that ask for help are winning. Claude, GLM, and Codex all requested human help early (domains, payments, databases) and now have fully functional products. Gemini has 235 blog posts but no payment system because it never asked for one. Same instructions, wildly different behavior.

4. Claude Code as a harness works with non-Anthropic models. GLM-5.1 via Z.ai and MiMo V2.5 Pro via Xiaomi's API both work through Claude Code using the ANTHROPIC_BASE_URL and ANTHROPIC_AUTH_TOKEN environment variables. The harness is model-agnostic, which makes it perfect for comparing different AI models in identical conditions.
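Concretely, swapping the backend is just environment configuration before launching the harness. A minimal sketch (the base URL below is an assumption; check the provider's docs for the real endpoint):

```shell
# Hypothetical configuration for running GLM through Claude Code.
export ANTHROPIC_BASE_URL="https://api.z.ai/api/anthropic"   # guessed endpoint
export ANTHROPIC_AUTH_TOKEN="$ZAI_API_KEY"                   # provider API key
# claude -p "Continue building FounderMath"   # same harness, different model
```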

5. Token efficiency matters more than raw capability. MiMo V2.5 Pro uses 40-60% fewer tokens than Opus 4.6 at comparable capability. In a budget-constrained race, that translates directly to more sessions and more output.

The race runs for 12 weeks. We publish daily digests and weekly recaps. The real question isn't which agent writes the most code. It's which one gets the first paying customer.

Test Case Management in 2026: Process Over Tooling

2026-04-23 21:36:54

Your team moved off spreadsheets. You bought a proper test management tool. You even have a naming convention. And yet - your test library is still a mess. Sound familiar?

After working with dozens of QA teams, I’ve noticed the same pattern: teams invest in tooling but skip the process work that makes tooling effective. The result is a bloated test library that slows down releases instead of speeding them up.

Here are the five process mistakes I see most often - and a practical framework for fixing them.

Mistake #1: Organizing Test Cases by Sprint
This is the single most common structural mistake in test libraries. It seems logical: you write tests during Sprint 12, so you put them in a “Sprint 12” folder. But six months later, when you need to run regression tests on the payments module, you’re searching across 20 sprint folders trying to piece together which tests are still relevant.

The fix: Organize by feature area, not by time. Create top-level sections like “Authentication,” “Payments,” “User Settings,” and “API.” When a new feature ships, its test cases go into the relevant module folder. This makes regression suite assembly trivial - select the module, filter by priority, and you have a test run ready in seconds.

This sounds obvious, but I’d estimate 60% of the teams I’ve seen still default to sprint-based organization because that’s how their planning tool works.

Mistake #2: Writing Test Cases That Only the Author Can Execute
Here’s a quick test: pick a random test case from your library and hand it to someone who didn’t write it. Can they execute it without asking any questions?

Most teams fail this test. Test cases are littered with assumptions, missing preconditions, and vague expected results like “page loads correctly.” This forces testers to reverse-engineer the author’s intent, which is slow and error-prone.

The fix: Treat test case writing like technical documentation. Every test case needs:

  • Explicit preconditions (not just “user is logged in” - which user? With what permissions?)
  • Numbered, atomic steps (one action per step, not “navigate to settings and update the profile”)
  • Specific expected results (“Dashboard shows 3 active projects,” not “dashboard loads”)

A good rule of thumb: if a test case has more than 15 steps, it’s probably 2–3 test cases combined. Break it up. Shorter tests produce more granular pass/fail signals and are far easier to maintain.

Mistake #3: Never Cleaning the Test Library
Test libraries only grow. Features get deprecated, but their test cases live on. Edge cases from three redesigns ago still appear in regression runs. Duplicate tests accumulate as new team members write cases without checking what already exists.

I’ve seen teams with 5,000 test cases where only 2,000 were still relevant. The other 3,000 weren’t just dead weight - they were actively harmful, wasting execution time and producing misleading coverage metrics.

The fix: Schedule a quarterly test library audit. In each audit:

  • Archive test cases for deprecated or significantly redesigned features
  • Merge duplicates (search for test cases with similar titles or overlapping steps)
  • Flag tests that haven’t been executed in the last two quarters for review
  • Verify that high-priority test cases still match the current product behavior

This isn’t glamorous work, but a clean test library is the difference between a 4-hour regression run and a 12-hour one.

Mistake #4: Treating Test Runs as an Afterthought
Many teams conflate test cases with test execution. They have a library of test cases, and when release day comes, they just… run all of them. Every time. Regardless of what changed.

This is the QA equivalent of running your entire CI/CD pipeline for a README change. It wastes time and numbs the team to test results - when everything always takes 8 hours, nobody questions whether it should.

The fix: Build intentional test runs for each release. The process should be:

  1. Identify what changed. Which features were modified? What code paths are affected?
  2. Select targeted tests. Pull test cases for the affected modules, plus your critical-path smoke tests.
  3. Add risk-based regression. Include tests for areas that are historically fragile or high-impact.
  4. Skip the rest. Not every release needs full regression. Save that for major milestones.

Tags make this efficient. If your test cases are tagged with smoke, regression, payments, and critical-path, you can assemble a targeted test run in under a minute. The 10 minutes you spend tagging during test case creation save hours during execution.
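As a rough illustration of tag-driven selection (most test management tools do this natively; here I assume plain-text cases, one per file, each containing a line like `tags: smoke, payments`):

```shell
# Hypothetical: list the test-case files carrying any of the given tags.
select_run() {
  local dir="$1"; shift
  for tag in "$@"; do
    grep -rl "^tags:.*$tag" "$dir" 2>/dev/null
  done | sort -u                 # de-duplicate cases matching several tags
}

# Example: assemble a payments-release run
# select_run testcases smoke payments critical-path > run.txt
```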

Mistake #5: Ignoring AI for Test Authoring
In 2026, AI-powered test case generation is no longer experimental - it’s a legitimate productivity tool. Yet many teams haven’t even tried it, either because they don’t trust the output or because they assume it’s only for automation code generation.

Modern AI test generation tools work differently than most people expect. You provide a feature description, user story, or API spec, and the tool produces a set of manual test cases - including edge cases and negative scenarios that humans often miss on the first pass. You still review and refine the output, but the heavy lifting of enumerating scenarios is handled for you.

Where AI helps most:

  • New feature coverage. AI generates an initial test case set in minutes instead of hours, and it’s surprisingly good at catching boundary conditions.
  • Maintenance. AI can flag test cases that are likely outdated based on recent changes, reducing the manual audit burden.
  • Gap analysis. By comparing your test library against feature descriptions, AI can identify areas with thin coverage that you might not notice until a bug escapes to production.

The teams I’ve seen get the most value from AI are the ones that treat it as a first-draft generator - not a replacement for human judgment, but a way to eliminate the blank-page problem.

A Framework That Actually Works
If your test case management process needs a reset, here’s the five-step framework I recommend:

  1. Structure. Define a standard test case format with required fields (title, preconditions, steps, expected result, priority) and optional fields (tags, attachments, automation status). Document it. Enforce it.
  2. Organize. Restructure your test library by feature/module. Archive sprint-based folders. This is a one-time investment that pays off permanently.
  3. Own. Assign ownership of test case sections to specific team members. When a feature changes, the owner is responsible for updating the related tests before the next run.
  4. Execute. Build targeted test runs for each release instead of running everything. Use tags and priority filters to assemble runs quickly.
  5. Measure. Track pass/fail rates, execution time, and defect escape rate after each run. Use these metrics to identify weak spots in your test library and improve iteratively.

The cycle repeats every sprint: structure → organize → own → execute → measure. Teams that follow this consistently see measurable improvements within 2–3 sprints - fewer escaped defects, shorter regression cycles, and a test library that actually helps instead of hindering.

The Bottom Line
Test case management isn’t a tooling problem - it’s a process problem. The best tool in the world won’t save you from a test library organized by sprint, full of stale cases, and assembled into test runs by gut feel.

Fix the process first. Structure your cases for clarity, organize them for discoverability, clean them regularly, build intentional test runs, and leverage AI where it makes sense. The tooling is there to support you - but only if you give it a solid process to work with.

For a deeper dive into test case structure, tool selection criteria, and AI-powered test generation, see the complete test case management guide.

The Death of the Status Update: Why Weekly Reports Beat Daily Standups

2026-04-23 21:36:15

The daily standup was invented to solve a problem: managers needed visibility into what their team was doing. But visibility doesn't require a daily meeting. It requires a good information system.

The status update meeting is a relic. Here's why, and what to replace it with.

The Problem with Daily Standups

Daily standups assume that a manager needs to know, every day, what every team member is working on. This was true when work was opaque and communication was slow. It's less true now.

The cost of daily standups:

  • 30 minutes per person per day
  • 2.5 hours per week per person in a 5-person team
  • Context-switching cost: most people need 15-30 minutes to refocus after a meeting

For a 5-person team, that adds up to 15+ hours of combined productivity lost every week - for each person, roughly a month of work over a year.

What Weekly Reports Look Like

Instead of daily standups, have people write a weekly update. One paragraph, every Friday:

  1. What you accomplished this week
  2. What you're working on next week
  3. Any blockers or risks

Total time: 10 minutes per person, per week. No meeting required.

Why Weekly Reports Win

Asynchronous by default

People can read updates on their own schedule. No one needs to be in the same room at the same time. This is especially valuable for remote and distributed teams.

Better documentation

Written updates are searchable. You can look back and see what someone was working on three months ago. Verbal standups disappear the moment they end.

Scales better

A team of 5 can do a round-robin standup in 15 minutes. A team of 50 cannot. Written updates work at any scale.

Respects deep work

Developers, writers, and other knowledge workers need long stretches of uninterrupted time. Daily standups fragment that time. Weekly reports don't.

When to Keep the Standup

For some teams, especially early-stage startups with high coordination needs, daily standups still make sense. If your team is:

  • Working on highly interdependent tasks
  • Rapidly changing priorities
  • Still establishing norms and trust

Then keep the standup. But for teams doing mostly independent work with clear priorities, weekly reports are more efficient.

The Transition

Try it for one month: replace daily standups with weekly written updates. Keep the daily for teams that need the rhythm. Most will find they don't miss what they thought they needed.

Visibility is important. Meetings aren't the only way to get it.

How I Explain DevOps to My Non-Tech Friends

2026-04-23 21:35:19

No, it's not a job title or another "shift-left-synergy-agile" buzzword.

Let’s be real. Explaining DevOps to technical people is already tricky. Explaining it to your friends who think Python is a snake? That’s a whole new level.

But I’ve found a way. No jargon. No architecture diagrams. Just real-life stories.

The Potluck Dinner Analogy
Here’s my go-to.

Imagine you and your friends are organizing a potluck dinner.

Old way (Traditional IT):

The “Cooking Team” designs the menu in secret for 3 months.

They write down every recipe perfectly.

Then they throw the recipes over a wall to the “Serving Team.”

The Serving Team opens the recipes, realizes the oven temperature is wrong, half the ingredients are missing, and nobody agrees on who brings plates.

Dinner arrives 6 months late. Cold. And it’s nothing like what anyone ordered.

DevOps way:

Cooking and Serving sit at the same table from day one.

They decide on small, simple dishes first (maybe just salad and bread).

They bring a little bit of food to the table every hour.

If the soup is too salty, they fix it immediately. No blame. Just taste and adjust.

Everyone eats well, on time, and actually enjoys the process.

That’s DevOps: small, fast, shared-responsibility, and always improving.

The Three Questions They Always Ask

  1. “So… is DevOps a person?” No. But many companies make it one. That’s like calling “teamwork” a person because you hired one guy named Team.

DevOps is a culture and a way of working. It’s when developers and operations people stop throwing work over the wall and start working together.

  2. “Do you just push buttons and make things faster?” Sort of. But the goal isn’t speed. The goal is less drama.

When you do DevOps well:

Deployments don’t require a 3 AM emergency call

Rolling back a bad change takes minutes, not days

You catch bugs before customers even notice

Speed is a side effect. Sanity is the real win.

  3. “Isn’t that just automation?” Automation is a tool, not the goal.

Think of a washing machine. It automates scrubbing, rinsing, and spinning. But if you never measure detergent, sort clothes, or fix leaks, you still get pink socks and flooded floors.

DevOps is the whole laundry system: automation + feedback + teamwork + learning from mistakes.

The Simple Definition I Use
DevOps is when the people who write the code and the people who run the code work together like a pit crew, not like a relay race.

A relay race hands off a baton and hopes for the best.
A pit crew communicates constantly, fixes problems in seconds, and every person cares about the finish line.

A Real-World Example (Without Tech)
Remember when your banking app used to crash every Sunday night for “maintenance”?

That was old IT: update once a week, cross your fingers, pray nothing breaks.

Now, apps update dozens of times per day, and you never notice.

That’s DevOps. Small, invisible changes, constantly flowing. If one tiny thing breaks, only a few people see it for a few seconds.

You don’t notice it working. That’s the whole point.

The One-Liner for Parties
When someone asks what you do, don’t say:

“I facilitate continuous integration pipelines with immutable infrastructure and declarative configuration management.”

Just say:

“I help teams ship software faster without breaking everything. It’s like teaching cooks and waiters to work in the same kitchen.”

Then take a sip of your drink and change the subject.

Final Thought
DevOps isn’t complicated. Tech people made it sound complicated because we love fancy terms.

At its heart, it’s two simple ideas:

Work together, not in silos.

Make small changes, learn fast, fix immediately.

That’s it. Even your non-tech friends can get that.

Now go explain DevOps to someone over coffee. You’ll be surprised how easy it is when you stop trying to sound smart.

How to Vibe Code Your First SaaS (Step-by-Step)

2026-04-23 21:30:00

Key Takeaways

  • Vibe coding lets you describe features in plain language and AI writes the code
  • Two paths: AI app builders (Lovable/Bolt) for speed, or AI coding tools for full control
  • A feature spec + architectural context = consistent, production-ready output
  • You can ship your first SaaS feature in a single session using the workflow in this guide

You can vibe code a SaaS in an afternoon. You can also spend that afternoon iterating on a dashboard Claude keeps redesigning from scratch — because your prompt was six words.

This is the step-by-step workflow I wish I'd had my first week. No specific tool required, no framework assumed. New to the concept? Read What Is Vibe Coding? first for background.

The first time I tried to vibe code a SaaS dashboard, I gave Claude Code a single sentence: "Build me a dashboard." Forty minutes later I was on my third complete rewrite — different layout, different data model, different component names each time. I closed the terminal, opened a notes file, and wrote six sentences: route, data sources, existing components, acceptance criteria, auth, layout wrapper. Twelve minutes after I pasted those six sentences back in, the feature was done and shipping. The spec wasn't overhead. It was the whole trick.

What You Need Before You Start

Before you write a single prompt, get these five things in place. None of them take more than an afternoon, and skipping any of them will cost you time later.

  1. An idea you can describe in one paragraph. You don't need a business plan. You need to be able to say: "I'm building X for Y people, and the first thing it does is Z." If you can't describe it simply, AI can't build it well.
  2. Version control (GitHub). Create a GitHub repository before writing any code. Every change is tracked, you can undo mistakes, and it's required for deployment. It's free — no excuses.
  3. A hosting platform. Vercel (best for Next.js), Netlify, or Railway. All have generous free tiers. You'll deploy from your GitHub repo — push code, and your site updates automatically.
  4. An AI coding tool. Claude Code for terminal-first agentic workflows, Cursor or Windsurf for IDE-integrated development. Pick one to start — you can always add more later.
  5. A project foundation. A starter kit or boilerplate with authentication, payments, and database already configured. Building this from scratch takes weeks and is the wrong use of your time when vibe coding for beginners.

Once these are in place, you're ready to start.
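If it helps, a first-time setup with the GitHub and Vercel CLIs might look roughly like this (the project name is a placeholder, and your starter kit's docs take precedence over these commands):

```shell
# Hypothetical setup commands, shown for orientation only.
gh repo create my-saas --private --clone && cd my-saas
# npx create-next-app@latest .   # or unpack your starter kit here instead
# npx vercel link                # connect the project to Vercel hosting
# git push -u origin main       # after this, every push auto-deploys
```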

Step 1: Write a Feature Spec (Not Just a Prompt)

This is the single biggest differentiator between people who succeed with vibe coding and people who struggle. Don't jump straight into prompting. Write down what you want first.

A feature spec isn't a full product requirements document. It's 5–10 sentences that describe: what the feature does, who uses it, and what "done" looks like. It forces you to think before you prompt — and gives AI the clarity it needs to generate useful code on the first try.

Here's the difference between a vague prompt and a feature spec:

Vague Prompt

"Build me a dashboard."

AI will generate something — but it won't be what you wanted. You'll spend more time iterating than you saved.

Feature Spec

"Create a user dashboard page at /dashboard. Show the user's name from the session, their current subscription plan from Stripe, and a list of their 5 most recent projects with title, status, and last-modified date. Use the existing DashboardLayout component. Add a 'New Project' button that links to /projects/new. The page should be server-rendered and require authentication."

The difference is specificity. When AI knows the route, the data sources, the existing components, and the acceptance criteria, it generates code that actually fits your application. This is how to vibe code effectively — not with better AI, but with better inputs.

The Quick Path: Start with an AI App Builder

Before diving into the full workflow, it's worth knowing there's a faster option — with trade-offs.

AI app builders like Lovable and Bolt can generate a working application from a text description. You describe your SaaS, and they produce a deployed app with UI, database, authentication, and basic functionality — sometimes in minutes.

This path works well for:

  • Validating an idea quickly before investing more time
  • Building prototypes to show investors or early users
  • Non-technical founders who need a working version fast

The trade-offs are real, though. Customization is limited. Complex features hit walls. You're on their hosting, their infrastructure, their ecosystem. When you outgrow the builder — and most serious SaaS products do — migration is painful and sometimes impossible.

If you want full control over your codebase — production-ready architecture, custom features, your own hosting — keep reading. The rest of this vibe coding tutorial walks you through doing it with AI coding tools.

Step 2: Set Up Your Project Foundation

You can't vibe code into a blank folder effectively. AI needs existing patterns to follow — file structure, naming conventions, component library, API patterns. Without them, every prompt generates code in a different style, and your project becomes an inconsistent mess within a week.

You have two options:

  • Use a starter kit — A production-ready boilerplate with authentication, payments, database, and infrastructure already configured. This is the fastest path.
  • Set up manually — Initialize a Next.js (or other framework) project, add your ORM, configure authentication, wire up payments. This takes 1–2 weeks for a solid foundation but gives you full control from line one.

What matters is consistency: a predictable file structure, shared type definitions, reusable components the AI can reference. The difference between vibe coding a prototype and vibe coding production software is the foundation underneath.

Step 3: Give Your AI Tool Context About Your Project

This is the step most beginners skip — and the one that separates good AI output from generic AI output.

Every AI coding tool supports some form of project context file: AGENTS.md for Claude Code, .cursorrules for Cursor, .windsurfrules for Windsurf. These files tell the AI about your project's patterns before it generates code.

At minimum, include:

  • Your tech stack and framework versions
  • File and folder naming conventions
  • Key components and utilities the AI should reuse
  • Patterns to follow (e.g., "server actions go in src/actions/")

Example Context File (AGENTS.md)

Tech stack: Next.js 15, TypeScript, Prisma, PostgreSQL, Tailwind, shadcn/ui.
Components live in src/components/. Pages in src/app/.
Server actions in src/actions/ — always validate with Zod schemas.
Use the existing Button, Card, and DataTable components from our UI library.
All database queries go through Prisma — never raw SQL.

With context in place, AI generates code that matches your project's conventions instead of inventing its own. This is the foundation of structured vibe coding — and it's what makes vibe coding viable for production.

Step 4: Vibe Code Your First Feature

You have a spec, a foundation, and context. Now it's time to actually vibe code. Here's the workflow, step by step.

1. Share your feature spec with the AI

Open your AI tool and give it the feature spec you wrote in Step 1. If you're using Claude Code, paste it directly. In Cursor or Windsurf, open the composer/chat and share the spec along with any relevant files.

2. Let the AI propose a plan

Don't let AI start writing code immediately. Ask it to propose an implementation plan first: which files it will create or modify, what approach it will take, which existing components it will use. Review the plan before saying "go ahead."

3. Let it generate the code

Once the plan looks right, let AI write the code. For multi-file features, agentic tools like Claude Code will create and modify multiple files in one pass. IDE tools may handle it in stages.

4. Review what it produced

Before accepting anything, check:

  • Does the file structure match your project's conventions?
  • Did it reuse existing components or create unnecessary duplicates?
  • Are types correct? Are imports pointing to real files?
  • Does the feature actually work when you run it?

5. Iterate through conversation

AI rarely gets it perfect on the first pass — and that's fine. The power of this vibe coding tutorial is showing you that iteration is the workflow, not a failure.

Iteration Prompt

"The dashboard page works, but two things: move the subscription status into a separate card component, and add a loading skeleton while the projects list fetches. Also, the 'New Project' button should use our primary Button variant from the UI library, not a plain anchor tag."

Be specific. Reference file names, component names, and exact behaviors. The more precise your feedback, the more accurate the next iteration.

Note: If you find yourself giving the same feedback repeatedly — "always use our Button component," "add loading states to all data fetches" — encode it into a reusable skill or subagent. AI tools like Claude Code support custom skills that run the same review checklist every time, so you stop repeating yourself and your code stays consistent automatically.

Step 5: Review, Test, and Ship

Don't skip review just because AI wrote it. AI-generated code compiles, passes basic tests, and looks reasonable — but it can also introduce subtle bugs, security issues, and pattern inconsistencies that compound over time.

Before you merge or deploy, run through this checklist:

  1. Logic check. Does the feature actually do what the spec says? Test the happy path and at least one edge case.
  2. Security basics. Are inputs validated? Are database queries parameterized? Are auth checks in place?
  3. Pattern consistency. Does the code follow the same patterns as the rest of your project? Or did AI invent a new approach?
  4. Quality gates. Run your linter, type checker, and any tests you have. Ask AI to write tests for the feature it just built — it's good at this.

Note: Use AI for testing too. Connect a browser automation tool like Chrome DevTools MCP to your AI agent, pair it with a testing skill, and let it click through your feature, check layouts at different screen sizes, and flag visual or functional issues — before you even open the browser yourself.

Once everything passes, commit, push, and deploy. If you set up Vercel or Netlify in Step 1, pushing to GitHub triggers an automatic deploy. Your feature is live.
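For completeness, the ship step itself is just the usual commands (the commit message is a placeholder):

```shell
# With the GitHub-to-Vercel integration from the setup section,
# the push itself is the deploy.
git add -A
git commit -m "feat: user dashboard page"
git push origin main
```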

Worried about AI code quality at scale? Read our data-driven analysis on vibe coding's scaling problem →

3 Mistakes That Slow Down First-Time Vibe Coders

After watching dozens of developers learn how to vibe code, these are the patterns that waste the most time:

  1. Prompting without a spec. You describe something vague, AI generates something vague, you spend 30 minutes iterating to get what you could have specified in 2 minutes of writing. The spec is the shortcut.
  2. No project context. Without context files, AI generates generic code that doesn't match your patterns. You end up with three different button styles, two API patterns, and a file structure that doesn't match anything else in the project.
  3. Accepting everything without review. AI is confident, not correct. It will generate code that looks right, runs without errors, and has a subtle auth bypass or a missing edge case. Always review the diff before accepting.

Every one of these mistakes is recoverable. But avoiding them from the start means you spend your time building features, not fixing AI's assumptions.