RSS preview of the Blog of The Practical Developer

What PVS-Studio unicorn looked like in 2016

2026-01-26 20:42:32

Have you ever noticed the charming unicorn in PVS-Studio articles while scrolling through your feed? Perhaps some readers know that our mascot is named Cody. Just like any other developer, this guy often hangs out with Bug.

The entire internet now reminisces about 2016, so we also recalled what Cody looked like ten years ago.


Cody and one of the early versions of Bug in 2016.

Compared to his current look, the Cody from our 2010s articles appears quite minimalistic. But even then, our developers managed to weave our mascot into memes.

Cody appeared on covers not only with bugs but also with mascots of other projects, like Linux.

Linux Kernel, tested by the Linux-version of PVS-Studio

Back in 2016, our programmers were already weaving memes and trends into the design specs for article covers. Still, the mascot didn't appear on every illustration...

Over time, the team grew, the output increased, and popular memes became references for illustrations featuring Cody and Bug.

A clear nod to "Squid Game" in an article about the baseline mechanism

Cody's art style changed as well: the distinct, original stylization became a thing of the past. The mix of strict geometric and fluid elements, of equine and human anatomy, gave way to a more cartoonish look. The unicorn's anatomy became more detailed. And most importantly, the mascot became brighter!


PVS-Studio designers structured the mascot's artwork. You can read more about Cody's evolution and origin story here.

And yet, the 2016 Cody evokes a warm feeling of nostalgia for those cool times when everything was different.
What will Cody look like in 10 years? Instead of guessing, we suggest helping our mascot continue fighting bugs in code. We're offering a promo code that provides 30 days of free access to the analyzer: 'back_to_2016'.

Defeat bugs with PVS-Studio!

🔥 Contract Testing: the bug that passes CI and breaks production

2026-01-26 20:39:47

You deploy with confidence.
Tests pass.
The build is green. ✅

Minutes later…

🔥 Production is broken because an API field changed.

If this has already happened to you (or will), this post is for you.

Recently, we’ve started adopting contract testing at the company where I work, integrating it into the set of validations that already run in our CI pipeline. The company has a strong testing culture and well-established quality standards, and the goal here was to complement what was already in place by adding more confidence to service-to-service communication. In this post, I want to share what we’ve learned so far and my practical impressions of using contract testing in day-to-day development, keeping things straightforward and free from unnecessary theory.

❌ The problem traditional tests don’t solve

In modern architectures, this is common:

Unit tests pass

Integration tests pass

Swagger is up to date

But… the API consumer breaks at runtime

Why?

Because traditional tests validate implementations, not agreements between systems.

And that’s where the real problem lives.

🤝 What Contract Tests are (no academic definition)

Contract Testing is basically this:

A formal agreement between API consumers and providers.

It ensures both sides agree on:

Payload structure

Field types

Status codes

Implicit rules (required fields, formats, etc.)

If someone breaks that agreement…
🚫 the build fails before reaching production.

💥 A simple (and painful) example

The consumer expects this:

{
  "id": 1,
  "name": "Patrick"
}

The provider decides to “improve” the API:

{
  "id": 1,
  "fullName": "Patrick Bastos"
}

✔️ Backend works
✔️ Swagger updated
✔️ Tests pass

❌ Frontend breaks
❌ App crashes
❌ The user finds out first

This bug is not about code — it’s about communication.

🛡️ Where Contract Tests come in

With Contract Testing, the flow changes:

Consumer defines expectations
↓
Contract is generated
↓
Provider validates the contract
↓
Deploy only happens if the contract is respected

In other words:

Whoever changes the API without warning… breaks their own pipeline.

And that’s beautiful ❤️

🧰 The most used tool in .NET: Pact

In the .NET ecosystem, the most mature and widely adopted tool is Pact, used through its .NET library, PactNet.

Why Pact works so well

Consumer-Driven Contract Testing (CDC)

Tests written in C#

Versioned contracts

Automatic provider verification

Easy CI/CD integration

🧪 How this works in practice (very short version)
On the Consumer side

You write a test saying:

“When I call /customers/1…”

“I expect this response…”

That test generates a contract file.
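
To make that concrete, here's a minimal consumer-side sketch using pact-python. (We use PactNet in C#, but the shape of the test is the same; the service names, port, and endpoint below are made up for illustration.)

import atexit

import requests
from pact import Consumer, Provider

# Mock provider that stands in for the real CustomerService during the test
pact = Consumer("FrontendApp").has_pact_with(Provider("CustomerService"), port=1234)
pact.start_service()
atexit.register(pact.stop_service)

def test_get_customer():
    expected = {"id": 1, "name": "Patrick"}

    (pact
     .given("customer 1 exists")
     .upon_receiving("a request for customer 1")
     .with_request("GET", "/customers/1")
     .will_respond_with(200, body=expected))

    with pact:
        # The consumer hits the mock provider; the interaction is recorded into a
        # pact (contract) JSON file named after the consumer and provider
        response = requests.get("http://localhost:1234/customers/1")
        assert response.json() == expected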

On the Provider side

The backend runs a test validating:

“Does my API still respect this contract?”

If not:
❌ build fails
❌ deploy blocked

No production surprises.
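
On the provider side, the equivalent pact-python sketch looks roughly like this (the contract file name and base URL are placeholders):

from pact import Verifier

# Point the verifier at the real API, running locally or in the CI job
verifier = Verifier(
    provider="CustomerService",
    provider_base_url="http://localhost:8000",
)

# Replays every interaction recorded in the contract against the live provider
exit_code, _logs = verifier.verify_pacts("./pacts/frontendapp-customerservice.json")

assert exit_code == 0, "Contract broken: block the deploy"

Wire that assertion into CI and a breaking change never gets past the build.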

⚠️ Contract Testing is NOT a silver bullet

Important to be clear:

❌ It doesn’t replace integration tests
❌ It doesn’t test business rules
❌ It doesn’t guarantee bug-free code

✅ It guarantees communication stability
✅ It prevents breaking changes
✅ It reduces silent incidents

🟢 When it’s REALLY worth it

Microservices

Multiple teams

Public APIs

Independent deployments

Frequently evolving endpoints

🟡 Maybe not worth it (for now)

Simple monoliths

Small teams

Joint deployments

Low complexity

✅ Quick checklist (real-world lessons)

❌ Don’t rely only on Swagger

✅ Version your contracts

✅ Let the consumer define the contract

❌ Don’t couple contracts to implementation

✅ Run contract tests in CI

✅ Fail fast

🎯 Conclusion

Contract Tests don’t prevent bugs.

They prevent surprises.

And in production, surprise usually means incident.

If your API changes without fear,
your consumer suffers without warning.

💬 Let’s talk

Have you ever broken production because of a contract?

Have you used Pact or another tool?

Are you still relying only on Swagger?

Drop a comment 👇
These pains are collective 😄

Linux - The 'df -h' Command

2026-01-26 20:35:01

One of the first disk commands new Linux users learn is df -h. The 'df' stands for disk filesystem and the '-h' option modifies the df command so that disk sizes are displayed in a human-readable format, using KB, MB, or GB instead of raw blocks.

The 'df -h' command is simple, fast, and incredibly useful — but the output can be confusing at first glance. You’ll often see entries that don’t look like real disks at all. Beginners may be left wondering what’s actually using storage and what isn’t.

Here’s a real example of df -h output from a desktop Linux system:

Filesystem      Size  Used Avail Use% Mounted on
udev            5.7G     0  5.7G   0% /dev
tmpfs           1.2G  1.7M  1.2G   1% /run
/dev/nvme1n1p2  467G   63G  381G  15% /
tmpfs           5.8G  4.0K  5.8G   1% /dev/shm
tmpfs           5.0M  8.0K  5.0M   1% /run/lock
efivarfs        192K  147K   41K  79% /sys/firmware/efi/efivars
/dev/nvme1n1p1  256M   55M  202M  22% /boot/efi
tmpfs           1.2G  4.1M  1.2G   1% /run/user/1000

The most important thing to understand is that not everything shown here is a physical disk. Some entries represent memory-backed or firmware-backed filesystems that behave like disks but don’t consume SSD or HDD space.

The line mounted at / is the key one for most users. In this case, /dev/nvme1n1p2 is the main root filesystem. This is where the operating system, applications, and most user data live. The Size, Used, and Avail columns here reflect real, persistent storage on the NVMe drive.

Entries like tmpfs and udev are different. These are virtual filesystems backed by RAM, not disk. They exist to support running processes, system services, and inter-process communication. Their contents are temporary and are cleared on reboot. Seeing large sizes here does not mean your disk is being consumed.

/dev/shm, /run, and /run/user/1000 are all examples of RAM-based storage used for performance and cleanliness. Lock files, sockets, and session data live here so they don’t clutter the real filesystem or survive reboots unnecessarily.

The EFI-related entries are also special. efivarfs exposes UEFI firmware variables to the operating system, while /boot/efi is a small dedicated partition used by the firmware to start the system. These are intentionally small and normally show higher usage percentages without causing problems.

Once you recognize the difference between real storage and virtual filesystems, df -h becomes much easier to read. For disk space concerns, focus on the filesystem mounted at / and any separate /home or data partitions. The rest are part of Linux doing its job quietly in the background.
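
If you ever want the same numbers programmatically, Python's standard library can report usage for a single mount point. Here's a minimal sketch that approximates the df -h line for /:

import shutil

def human(n: float) -> str:
    # Scale bytes through K/M/G/T the way df -h does (powers of 1024)
    for unit in ("B", "K", "M", "G", "T"):
        if n < 1024 or unit == "T":
            return f"{n:.1f}{unit}"
        n /= 1024

total, used, free = shutil.disk_usage("/")
print(f"Size {human(total)}  Used {human(used)}  "
      f"Avail {human(free)}  Use% {used / (used + free):.0%}")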

Linux Learning Series - Ben Santora - January 2026

Build Production-ready AI Agents with AWS Bedrock & Agentcore

2026-01-26 20:31:03

So you've heard about AI agents, right? They're everywhere now… automating workflows, answering customer queries, and even planning product launches.

But here's the thing: building one that actually works in production is a whole different game compared to throwing together a ChatGPT wrapper.

I recently built an "AI-powered Product Hunt launch assistant" during the AWS AI Agent Hackathon at AWS Startup Loft, Tokyo. And honestly? It taught me a ton about what it takes to build production-ready AI agents on AWS.

In this article, I'll walk you through how to build on AWS, how to think about the architecture and the tools, and the lessons I learned, so you can build your own AI agent-based projects without the trial-and-error pain.

The hackathon crew at AWS AI Agent Hackathon

What We're Building: The Product Hunt Launch Assistant

Before diving into the tech, let me give you context. The Product Hunt Launch Assistant is an AI agent that helps entrepreneurs plan and execute their Product Hunt launches. It can:

  • Generate comprehensive launch timelines with task dependencies
  • Create marketing assets (taglines, tweets, descriptions)
  • Research successful launches in our category
  • Recommend hunters and outreach strategies
  • Remember our product context across sessions (yes, it has memory!)

The interface to prepare your project info to launch

The chat interface with real-time streaming responses

The cool part? All of this runs on AWS Bedrock using the Strands Agents SDK and AgentCore Memory. Let me break down how it all works.

If you wanna follow along or check out the code, head over to Github.

The AWS AI Agent Stack

The core components that comprise the stack are going to be:

  • AWS Bedrock: Managed AI service that gives us access to foundation models like Claude, Amazon Nova, Llama, etc.
  • Strands Agents SDK: AWS's open-source framework for building AI agents with Python
  • AgentCore Runtime: Serverless execution environment for our agents with session isolation
  • AgentCore Memory: Persistent memory system for maintaining context across sessions
  • AgentCore Gateway: Connects our agents to APIs, Lambda functions, and MCP servers

AI Agent Architecture

Let's look at how the Product Hunt Launch Assistant is structured:

Full system architecture

1. The Agent Layer (Strands SDK)

The heart of the application is the ProductHuntLaunchAgent class, where we create the agent.

from strands import Agent
from strands.models import BedrockModel

class ProductHuntLaunchAgent:
    def __init__(self, region_name: str = None, user_id: str = None, session_id: str = None):
        # Fall back to a default region if none was passed in
        self.region = region_name or "us-east-1"

        # Initialize the Bedrock model (Claude 3.5 Haiku)
        self.model = BedrockModel(
            model_id="anthropic.claude-3-5-haiku-20241022-v1:0",
            temperature=0.3,
            region_name=self.region,
            stream=True  # Enable streaming
        )

        # self.system_prompt (Product Hunt domain expertise) and self.memory_hooks
        # (AgentCore memory integration) are set up elsewhere in the class

        # Create the agent with Product Hunt tools and memory hooks
        self.agent = Agent(
            model=self.model,
            tools=[
                generate_launch_timeline,
                generate_marketing_assets,
                research_top_launches,
            ],
            system_prompt=self.system_prompt,
            hooks=[self.memory_hooks],
        )

What I love about Strands is how minimal the code is. You give it:

  • A model (Claude via Bedrock in this case)
  • A list of tools the agent can use
  • A system prompt with domain expertise
  • Optional hooks for things like memory

And that's it! The framework handles all the reasoning, tool selection, and response generation. No complex prompt chains or hardcoded workflows.

2. Custom Tools with the @tool Decorator

Tools are where our agent gets its superpowers. Strands makes it dead simple with the @tool decorator:

from typing import Any, Dict

from strands import tool

@tool
def generate_launch_timeline(
    product_name: str,
    product_type: str,
    launch_date: str,
    additional_notes: str = ""
) -> Dict[str, Any]:
    """
    Generate a comprehensive launch timeline and checklist for Product Hunt launch.

    Args:
        product_name: Name of the product to launch
        product_type: Type of product (SaaS, Mobile App, Chrome Extension, etc.)
        launch_date: Target launch date (e.g., "next Tuesday", "December 15, 2024")
        additional_notes: Any additional requirements or constraints
    """
    # Parse the launch date, build the timeline, and return structured data
    # (parse_launch_date, calculate_timeline_days, and friends are helpers defined elsewhere in the project)
    parsed_date = parse_launch_date(launch_date)
    days_until_launch = calculate_timeline_days(parsed_date)

    timeline = create_timeline(product_name, product_type, parsed_date, days_until_launch)

    return {
        "success": True,
        "timeline": timeline,
        "total_days": days_until_launch,
        "launch_date": format_date_for_display(parsed_date),
        "key_milestones": extract_milestones(timeline)
    }

The docstring is super important here! It tells the AI model what the tool does and when to use it. The model reads it and decides autonomously when to invoke each tool based on user queries.

AI-generated launch timeline with detailed task breakdown

3. The Memory System (AgentCore Memory)

This is where things get interesting. Most AI chatbots are stateless. They forget everything after each conversation. But for a SaaS product, we need persistence. We need the agent to remember:

  • What product the user is launching
  • Their preferences and communication style
  • Previous recommendations and decisions

AgentCore Memory solves this with two types of memory:

  • Short-term memory: Keeps track of the current conversation
  • Long-term memory: Stores key insights across multiple sessions

Here's how I implemented memory hooks with Strands:

from bedrock_agentcore.memory import MemoryClient
from strands.hooks import HookProvider, HookRegistry, MessageAddedEvent, AfterInvocationEvent

class ProductHuntMemoryHooks(HookProvider):
    def __init__(self, memory_id: str, client: MemoryClient, actor_id: str, session_id: str):
        self.memory_id = memory_id
        self.client = client
        self.actor_id = actor_id
        self.session_id = session_id
        # Namespace templates for the two memory strategies (illustrative values)
        self.namespaces = {
            "preferences": "/preferences/{actorId}",
            "semantic": "/semantic/{actorId}",
        }

    def register_hooks(self, registry: HookRegistry):
        # Wire the callbacks into the agent's lifecycle events
        registry.add_callback(MessageAddedEvent, self.retrieve_product_context)
        registry.add_callback(AfterInvocationEvent, self.save_launch_interaction)

    def retrieve_product_context(self, event: MessageAddedEvent):
        """Retrieve product and user context BEFORE processing the query."""
        # The last entry in the agent's conversation history is the message just added
        messages = event.agent.messages
        user_query = messages[-1]["content"][0]["text"]

        # Get relevant memories from both namespaces
        for context_type, namespace in self.namespaces.items():
            memories = self.client.retrieve_memories(
                memory_id=self.memory_id,
                namespace=namespace.format(actorId=self.actor_id),
                query=user_query,
                top_k=3,
            )
            # Inject context into the user's message
            # ...

    def save_launch_interaction(self, event: AfterInvocationEvent):
        """Save the interaction AFTER the agent responds."""
        # user_query and agent_response are extracted from the conversation history (omitted here)
        self.client.create_event(
            memory_id=self.memory_id,
            actor_id=self.actor_id,
            session_id=self.session_id,
            messages=[
                (user_query, "USER"),
                (agent_response, "ASSISTANT"),
            ],
        )

The key insight here is the hook system. Before each message is processed, we retrieve relevant memories and inject them as context. After the agent responds, we save the interaction for future reference.

The agent remembering product context across sessions

I set up two memory strategies:

  • USER_PREFERENCE: Stores user preferences, communication style, strategic approaches, etc.
  • SEMANTIC: Stores factual information about products, launch strategies, recommendations, etc.

The memories expire after 90 days (configurable), and the memory ID is stored in AWS SSM Parameter Store for persistence.
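
Persisting that memory ID with boto3 takes only a few lines. A sketch, with a made-up parameter name:

import boto3

ssm = boto3.client("ssm")
PARAM_NAME = "/product-hunt-assistant/memory-id"  # hypothetical parameter name

def save_memory_id(memory_id: str) -> None:
    # Store the AgentCore memory ID so later sessions can reuse it
    ssm.put_parameter(Name=PARAM_NAME, Value=memory_id, Type="String", Overwrite=True)

def load_memory_id() -> str | None:
    # Return the stored ID, or None if it hasn't been created yet
    try:
        return ssm.get_parameter(Name=PARAM_NAME)["Parameter"]["Value"]
    except ssm.exceptions.ParameterNotFound:
        return None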

Building the API Layer

For the web interface, I used FastAPI with Server-Sent Events (SSE) for streaming responses:

import json

from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from pydantic import BaseModel

app = FastAPI()

class ChatRequest(BaseModel):
    # Request body for the chat endpoint
    message: str
    user_id: str
    session_id: str

@app.post("/api/chat-stream")
async def chat_stream(request: ChatRequest):
    async def event_generator():
        # get_or_create_agent caches one ProductHuntLaunchAgent per user/session (defined elsewhere)
        agent = get_or_create_agent(request.user_id, request.session_id)

        # Forward each streamed chunk to the browser as a Server-Sent Event
        for chunk in agent.chat_stream(request.message):
            yield f"data: {json.dumps({'content': chunk})}\n\n"

        yield "data: [DONE]\n\n"

    return StreamingResponse(event_generator(), media_type="text/event-stream")

The streaming experience is crucial for UX. Nobody wants to stare at a loading spinner for 10 seconds.

And, that's actually it!

It's that simple and straightforward.

You can simply add a UI to the backend (or vibe code it), and you've built a fully functional, scalable, and production-ready SaaS for yourself.

But, to truly make it production-ready, there are a few things that I'd do differently.

Lessons Learned: What I'd Do Differently

1. Start with AgentCore Runtime for Production

When I built this, I ran the agent locally. For production, I'd use AgentCore Runtime from day one. It gives you:

  • Session isolation (no state leaking between users)
  • 8-hour execution windows (for long-running tasks)
  • Pay-per-use pricing
  • Built-in security with identity management

2. Use Infrastructure as Code

The hackathon version has no Terraform/CDK. Big mistake for production. I'd always go with IaC to keep things consistent and manageable:

  • Memory resources defined in CloudFormation
  • Lambda functions for tools (if needed)
  • Proper IAM roles and policies

You'll probably want to check out this official AWS article, which outlines building AI agents with CloudFormation.

3. Build Modular Tools

My tools are pretty monolithic. In hindsight, I'd break them into smaller, composable pieces. For example:

  • parse_date tool
  • calculate_timeline tool
  • format_output tool

This makes the agent more flexible and easier to test.
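
For instance, the timeline tool could be split into small @tool functions roughly like this (names and signatures are illustrative, not the actual project code):

from datetime import date, datetime

from strands import tool

@tool
def parse_date(raw: str) -> str:
    """Normalize a date string like '2026-02-15' to ISO format."""
    return datetime.strptime(raw, "%Y-%m-%d").date().isoformat()

@tool
def calculate_timeline(launch_date: str) -> int:
    """Return how many days remain until the launch date."""
    return (date.fromisoformat(launch_date) - date.today()).days

@tool
def format_output(days_until_launch: int, launch_date: str) -> str:
    """Produce the summary line shown to the user."""
    return f"{days_until_launch} days until the launch on {launch_date}."

Each piece can be unit-tested on its own, and the agent can recombine them for requests the monolithic tool never anticipated.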

4. Plan for Multi-Agent Systems

The Product Hunt assistant is a single agent. But as your SaaS grows, you'll want multiple agents working together:

  • A research agent that finds competitor data
  • A content agent that writes marketing copy
  • A scheduling agent that optimizes launch timing
  • An orchestrator agent that coordinates everything
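
With Strands, one way to wire that up is to wrap each specialized agent in a @tool and hand those tools to an orchestrator. A rough sketch (agent names and prompts are made up, and the default Bedrock model is assumed):

from strands import Agent, tool

research_agent = Agent(system_prompt="You research competitor Product Hunt launches.")
content_agent = Agent(system_prompt="You write concise marketing copy.")

@tool
def research(topic: str) -> str:
    """Gather competitor and launch data for the given topic."""
    return str(research_agent(topic))

@tool
def write_copy(brief: str) -> str:
    """Draft marketing copy from a short brief."""
    return str(content_agent(brief))

# The orchestrator decides when to delegate to each specialist
orchestrator = Agent(
    system_prompt="Coordinate the launch plan; delegate research and copywriting to your tools.",
    tools=[research, write_copy],
)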

Also, apart from these, some best practices that I'd put into place would be to:

  • Collect ground truth data: building a dataset of user queries and expected responses for testing
  • Use Bedrock Guardrails: adding safety rails to prevent harmful outputs
  • Monitor with AgentCore Observability: integrating with CloudWatch, Datadog, or LangSmith for clear insights and observability
  • Test tool selection: making sure the agent picks the right tool for each query

Alright, that's it! If you've made it this far, you now know more about building AI agents on AWS than most developers out there. The stack is still evolving fast, but the fundamentals of tools, memory, and agents aren't going anywhere.

If you wanna try it out yourself, find the code for the assistant on Github.

So go ahead, clone the repo, break things, and build something cool. And hey, if you end up launching on Product Hunt using this assistant, let me know, I'd love to see what you ship!

Planning Is the Real Superpower of Agentic Coding

2026-01-26 20:24:31

I see this pattern constantly: someone gives an LLM a task, it starts executing immediately, and halfway through you realize it's building the wrong thing. Or it gets stuck in a loop. Or it produces something that technically works but doesn't fit the existing codebase at all.

The instinct is to write better prompts. More detail. More constraints. More examples.

The actual fix is simpler: make it plan before it executes.

Research shows that separating planning from execution dramatically improves task success rates—by as much as 33% in complex scenarios.

In earlier articles, I wrote about why LLMs struggle with first attempts and why overloading AGENTS.md is often a symptom of that misunderstanding. This article focuses on what actually fixes that.

Why "Just Execute" Fails

This took me longer to figure out than I'd like to admit. When you ask an LLM to directly implement something, you're asking it to:

  1. Understand the requirements
  2. Analyze the existing codebase
  3. Design an approach
  4. Evaluate trade-offs
  5. Decompose into steps
  6. Execute each step
  7. Verify results

All in one shot. With one context. Using the same cognitive load throughout.

Even powerful LLMs struggle with this. Not because they lack capability, but because long-horizon planning is fundamentally hard in a step-by-step mode.

The Plan-Execute Architecture

Research on LLM agents has consistently shown that separating planning and execution yields better results.

The reasons:

  • Explicit long-term planning: Even strong LLMs struggle with multi-step reasoning when taking actions one at a time. Explicit planning forces consideration of the full path.
  • Model flexibility: You can use a powerful model for planning and a lighter model for execution, or even different specialized models per phase.
  • Efficiency: Each execution step doesn't need to reason through the entire conversation history. It just needs to execute against the plan.

What matters here: the plan becomes an artifact, and the execution becomes verification against that artifact.

If you've read about why LLMs are better at verification than first-shot generation, this should sound familiar. Creating a plan first converts the execution task from "generate good code" to "implement according to this plan"—a much clearer, more verifiable objective.
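
As a mental model, the whole pattern fits in a few lines. The sketch below is tool-agnostic: call_llm stands in for whatever client you use, and the prompts are only illustrations.

def call_llm(prompt: str) -> str:
    # Placeholder: plug in your model client (Bedrock, OpenAI, a local model, ...)
    raise NotImplementedError

def plan_then_execute(task: str) -> str:
    # Phase 1: produce the plan as an explicit, reviewable artifact
    plan = call_llm(
        "Before writing any code, produce a numbered step-by-step plan "
        f"with completion criteria for this task:\n{task}"
    )

    # Review happens here: read, edit, or reject the plan before execution

    # Phase 2: execute one step at a time, verifying each result against the plan
    results = []
    for step in [line for line in plan.splitlines() if line.strip()]:
        results.append(call_llm(
            f"Task: {task}\nPlan:\n{plan}\n\n"
            f"Implement only this step and check it against the plan:\n{step}"
        ))
    return "\n\n".join(results)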

The Full Workflow

The complete picture:

Step 1: Preparation
    │
    ▼
Step 2: Design (Agree on Direction)
    │
    ▼
Step 3: Work Planning  ← The Most Important Step
    │
    ▼
Step 4: Execution
    │
    ▼
Step 5: Verification & Feedback

I'll walk through each step, but Step 3 is where the magic happens.

Step 1: Preparation

Goal: Clarify what you want to achieve, not how.

  • Create a ticket, issue, or todo document stating the goal in plain language
  • Point the LLM to AGENTS.md (or CLAUDE.md, depending on your tool) and relevant context files
  • Don't jump into implementation details yet

This is about setting the stage, not solving the problem.

Step 2: Design (Agree on Direction)

Goal: Align on the approach before any code gets written.

Don't Let It Start Coding Immediately

Instead of "implement this feature," say:

"Before implementing, present a step-by-step plan for how you would approach this."

Review the Plan

Look for:

  • Contradictions with existing architecture
  • Simpler alternatives the LLM missed
  • Misunderstandings of the requirements

At this stage, you're agreeing on what to build and why this approach. The how and in what order come in Step 3.

Step 3: Work Planning (The Most Important Step)

This section is dense. But the payoff is proportional—the more carefully you plan, the smoother execution becomes.

For small tasks, you don't need all of this. See "Scaling to Task Size" at the end.

Goal: Convert the design into executable work units with clear completion criteria.

Why This Step Matters Most

Research shows that decomposing complex tasks into subtasks significantly improves LLM success rates. Step-by-step decomposition produces more accurate results than direct generation.

But there's another reason: the work plan is an artifact.

When the plan exists, the execution task transforms:

  • Before: "Build this feature" (generation)
  • After: "Implement according to this plan" (verification)

This is the same principle from Article 1. Creating a plan first means execution becomes verification—and LLMs are better at verification.

What Work Planning Includes

  1. Task decomposition: Break the design into executable units
  2. Dependency mapping: Define order and dependencies between tasks
  3. Completion criteria: What does "done" mean for each task?
  4. Checkpoint design: When do we get external feedback?

Perspectives to Consider

I'll be honest: I learned most of these the hard way. Plans would fall apart mid-implementation, and only later did I realize I'd skipped something obvious in hindsight.

These aren't meant to be followed rigidly for every task. Think of them as a mental checklist. You don't need to get all of these right—if even one of these perspectives changes your plan, it's doing its job.

Perspective 1: Current State Analysis

Understand what exists before planning changes.

  • What is this code's actual responsibility?
  • Which parts are essential business logic vs. technical constraints?
  • What benefits and limitations does the current design provide?
  • What implicit dependencies or assumptions aren't obvious from the code?

Skipping this leads to plans that don't fit the existing codebase.

Perspective 2: Strategy Selection

Consider how to approach the transition from current to desired state.

Research options:

  • Look for similar patterns in your tech stack
  • Check how comparable projects solved this
  • Review OSS implementations, articles, documentation

Common strategy patterns:

  • Strangler Pattern: Gradual replacement, incremental migration
  • Facade Pattern: Hide complexity behind unified interface
  • Feature-Driven: Vertical slices, user-value first
  • Foundation-Driven: Build stable base first, then features on top

The key isn't applying patterns dogmatically—it's consciously choosing an approach instead of stumbling into one.

Perspective 3: Risk Assessment

Evaluate what could go wrong with your chosen strategy.

  • Technical: Impact on existing systems, data integrity, performance degradation
  • Operational: Service availability, deployment downtime, rollback procedures
  • Project: Schedule delays, learning curve, team coordination

Skipping risk assessment leads to expensive surprises mid-implementation.

Perspective 4: Constraints

Identify hard limits before committing to a strategy.

  • Technical: Library compatibility, resource capacity, performance requirements
  • Timeline: Deadlines, milestones, external dependencies
  • Resources: Team availability, skill gaps, budget
  • Business: Time-to-market, customer impact, regulations

A strategy that ignores constraints isn't executable.

Perspective 5: Completion Levels

Define what "done" means for each task—this is critical.

  • L1 (Functional verification): Works as a user-facing feature, e.g., search actually returns results
  • L2 (Test verification): New tests added and passing, e.g., type definition tests pass
  • L3 (Build verification): No compilation errors, e.g., interface definition complete

Priority: L1 > L2 > L3. Whenever possible, verify at L1 (actually works in practice).

This directly maps to "external feedback" from the previous articles. Defining completion levels upfront ensures you get external verification at each checkpoint.

Perspective 6: Integration Points

Define when to verify things work together.

  • Feature-driven: When users can actually use the feature
  • Foundation-driven: When all layers are complete and E2E tests pass
  • Strangler pattern: At each old-to-new system cutover

Without defined integration points, you end up with "it all works individually but doesn't work together."

Task Decomposition Principles

After considering the perspectives, break down into concrete tasks:

Executable granularity:

  • Each task = one meaningful commit
  • Clear completion criteria
  • Explicit dependencies

Minimize dependencies:

  • Maximum 2 levels deep (A→B→C is okay, A→B→C→D needs redesign)
  • Tasks with 3+ chained dependencies should be split
  • Each task should ideally provide independent value

Build quality in:

  • Don't make "write tests" a separate task—include testing in the implementation task
  • Tag each task with its completion level (L1/L2/L3, though in practice L1 is almost always what you want)

Work Planning Anti-Patterns

  • Skip current-state analysis: Plan doesn't fit the codebase
  • Ignore risks: Expensive surprises mid-implementation
  • Ignore constraints: Plan isn't executable
  • Over-detail: Lose flexibility, waste planning time
  • Undefined completion criteria: "Done" is ambiguous, verification impossible

Scaling to Task Size

Not every task needs full work planning.

  • Small (1-2 hours): Verbal/mental notes or a simple TODO list
  • Medium (1 day to 1 week): Written work plan, but abbreviated
  • Large (1+ weeks): Full work plan covering all perspectives

For a typo fix, you don't need a work plan. For a multi-week refactor, you absolutely do.

Step 4: Execution

Goal: Implement according to the work plan.

Work in Small Steps

Follow the plan. One task at a time. One file, one function at a time where appropriate.

Types-First

When adding new functionality, define interfaces and types before implementing logic. Type definitions become guardrails that help both you and the LLM stay on track.
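
A small Python illustration of what that means (the names are hypothetical): define the data shapes and the interface first, and let the implementation come later.

from dataclasses import dataclass
from typing import Protocol

@dataclass
class SearchResult:
    title: str
    url: str
    score: float

class SearchService(Protocol):
    def search(self, query: str, limit: int = 10) -> list[SearchResult]:
        """Return the top matches for the query, best first."""
        ...

# The implementation comes later; until then, these types are the contract that
# both the LLM's output and your tests can be checked against.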

Why This Changes Everything

With a work plan in place, execution becomes verification. The LLM isn't guessing what to build—it's checking whether the implementation matches the plan.

If you need to deviate from the plan, update the plan first, then continue implementation. Don't let plan and implementation drift apart.

Step 5: Verification & Feedback

Goal: Verify results and externalize learnings.

Feedback Format

When something goes wrong, don't just paste an error. Include the intent:

❌ Just the error
[error log]

✅ Intent + error
Goal: Redirect to dashboard after authentication
Issue: Following error occurs
[error log]

Without intent, the LLM optimizes for "remove the error." With intent, it optimizes for "achieve the goal."

Externalize Learnings

If you find yourself explaining the same thing twice, it's time to write it down.

I covered this in detail in the previous article—where to put rules, what to write, and how to verify they work. The short version: write root causes, not specific incidents, and put them where they'll actually be read.

Referencing Skills and Rules

One common failure mode: you reference a skill or rule file, but the LLM just reads it and moves on without actually applying it.

The Problem

  • Write "see AGENTS.md": It's already loaded, so the redundant reference adds noise
  • @file.md only: The LLM reads it, then continues. Reading ≠ applying
  • "Please reference X": References it minimally, doesn't apply the content

The Solution: Blocking References

Make the reference a task with verification:

## Required Rules [MANDATORY - MUST BE ACTIVE]

**LOADING PROTOCOL:**
- STEP 1: CHECK if `.agents/skills/coding-rules/SKILL.md` is active
- STEP 2: If NOT active → Execute BLOCKING READ
- STEP 3: CONFIRM skill active before proceeding

Why This Works

  • Action verbs: "CHECK", "READ", "CONFIRM", not just "reference"
  • STEP numbers: Forces sequence, can't skip
  • "Before proceeding": Blocking, must complete before continuing
  • "If NOT active": Conditional, skips if already loaded (efficiency)

This maps to the task clarity principle: "check if loaded → load if needed → confirm → proceed" is far clearer than "please reference this file."

How This Connects to the Theory

  • Step 1 (Preparation): Task clarification
  • Step 2 (Design): Artifact-first (the design doc is an artifact)
  • Step 3 (Work Planning): Artifact-first (the plan is an artifact) + external feedback design
  • Step 4 (Execution): Transform "generation" into "verification against plan"
  • Step 5 (Verification): Obtain external feedback + externalize learnings

The work plan created in Step 3 converts Step 4 from "generate from scratch" to "verify against specification." This is the key mechanism for improving accuracy.

The Research

The practices in this article aren't just workflow opinions—they're backed by research on how LLM agents perform.

ADaPT (Prasad et al., NAACL 2024): Separating planning and execution, with dynamic subtask decomposition when needed, achieved up to 33% higher success rates than baselines (28.3% on ALFWorld, 27% on WebShop, 33% on TextCraft).

Plan-and-Execute (LangChain): Explicit long-term planning enables handling complex tasks that even powerful LLMs struggle with in step-by-step mode.

Multi-Layer Task Decomposition (PMC, 2024): Step-by-step models generate more accurate results than direct generation—task decomposition directly improves output quality.

Task Decomposition (Amazon Science, 2025): With proper task decomposition, smaller specialized models can match the performance of larger general models.

Key Takeaways

  1. Don't let it execute immediately. Ask for a plan first. Even just "present your approach step-by-step before implementing" makes a significant difference.
  2. Work Planning is the superpower. A plan is an artifact. Having it converts execution from generation to verification—and LLMs are better at verification.
  3. Define completion criteria. L1 (works as feature) > L2 (tests pass) > L3 (builds). Know what "done" means before starting.
  4. Scale to task size. Small task = mental note. Large task = full work plan. Don't over-plan trivial work, don't under-plan complex work.
  5. Update plan before deviating. If implementation needs to differ from the plan, update the plan first. Drift kills the verification benefit.
  6. Include intent with errors. "Goal + error" beats "just error." The LLM should know what you're trying to achieve, not just what went wrong.

References

  • Prasad, A., et al. (2024). "ADaPT: As-Needed Decomposition and Planning with Language Models." NAACL 2024 Findings. arXiv:2311.05772
  • Wang, L., et al. (2023). "Plan-and-Solve Prompting: Improving Zero-Shot Chain-of-Thought Reasoning by Large Language Models." ACL 2023.
  • LangChain. "Plan-and-Execute Agents." https://blog.langchain.com/planning-agents/

[SUI] List

2026-01-26 20:20:46

List is a container that presents rows of data stacked in a single column, optionally providing the ability to select one or more members.

The items in a list are automatically separated by a line. In addition, List includes functionality for selecting, adding, or removing content.

  • init(_:rowContent:): data is the collection of values used to create the rows; its elements must be Identifiable. rowContent is a closure that defines the views used to create the rows.
  • init(_:id:rowContent:): id is the KeyPath to the identifier of each value within data.
  • init(_:selection:rowContent:): selection is a Binding that stores the identifier, or the set of identifiers, of the selected rows.
struct ContentView: View {
  var body: some View {
    List {
      Text("A List Item")
      Text("A Second List Item")
      Text("A Third List Item")
    }
  }
}