The Practical Developer

A constructive and inclusive social network for software developers.

Building a Markdown editor (Markflow)

2026-04-24 16:16:35

I’ve been working with Markdown editors for a long time as a user. At some point I wanted to better understand how they actually behave under the hood, especially when documents get more complex (code blocks, math, diagrams, etc.).

As a small exploration, I built a minimal editor to experiment with these ideas:
https://github.com/IgorKha/markflow

You can try it here:
https://igorkha.github.io/markflow/

This post is not an announcement, but a summary of a few implementation decisions that might be useful if you’re building something similar.

Starting point

The initial goal was simple:

  • keep Markdown as the source of truth
  • support common extensions (code, math, diagrams)
  • avoid introducing a separate document format

From there, most of the work ended up around how editing is handled, not rendering.

Using Monaco as the editor layer

The editor is built on top of Monaco.

This gives:

  • a mature text editing model
  • good performance characteristics
  • predictable behavior for selections, undo/redo, etc.

At the same time, Monaco operates on plain text, while Markdown has an implicit structure. Bridging that gap becomes the central problem.

Working with Markdown as a structure

Instead of treating Markdown purely as a string, the implementation keeps track of its structure (via parsing into an AST).

This allows reasoning in terms of:

  • blocks (paragraphs, headings, lists)
  • fenced regions (code, math, diagrams)
  • document hierarchy

Even partial structural awareness helps avoid some classes of issues when editing mixed content.

Synchronizing structure and editor state

One of the core pieces is keeping two representations in sync:

  • the text inside Monaco
  • the parsed Markdown structure

This synchronization is used to:

  • detect which block the cursor is currently in
  • apply transformations without breaking surrounding content
  • keep rendering consistent with the editor state

This part is still evolving, but it defines most of the internal complexity.
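The structural awareness described above can be sketched in a few lines. The scanner below is a deliberately minimal stand-in for a real Markdown parser (it is not Markflow's actual implementation): the point is that every block carries a line range, which is exactly what cursor-to-block mapping needs.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Block:
    kind: str   # "heading", "fence", or "paragraph"
    start: int  # first line, 0-based, inclusive
    end: int    # last line, inclusive

def segment(text: str) -> list:
    """Tiny Markdown block scanner: headings, ``` fences, paragraphs.
    Real parsers (markdown-it, remark) do far more; the useful part here
    is the line range attached to every block."""
    lines = text.splitlines()
    blocks, i = [], 0
    while i < len(lines):
        line = lines[i]
        if not line.strip():
            i += 1
        elif line.startswith("```"):
            start = i
            i += 1
            while i < len(lines) and not lines[i].startswith("```"):
                i += 1
            blocks.append(Block("fence", start, min(i, len(lines) - 1)))
            i += 1
        elif line.startswith("#"):
            blocks.append(Block("heading", i, i))
            i += 1
        else:
            start = i
            while i < len(lines) and lines[i].strip() and not lines[i].startswith(("#", "```")):
                i += 1
            blocks.append(Block("paragraph", start, i - 1))
    return blocks

def block_at(blocks, cursor_line: int) -> Optional[Block]:
    """Map an editor cursor line to the enclosing block."""
    for b in blocks:
        if b.start <= cursor_line <= b.end:
            return b
    return None
```

With this, a cursor inside a fenced region resolves to the fence block, so an editing command can skip Markdown transformations there instead of corrupting the code.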

Rendering pipeline

Rendering is based on standard tools from the Markdown ecosystem:

  • syntax highlighting via highlight.js
  • math rendering via KaTeX
  • diagrams via Mermaid

These are applied on top of the parsed Markdown rather than directly on raw text, which keeps responsibilities separated:

  • parsing → structure
  • rendering → visual output

Mobile behavior

The editor is designed to work in a browser without assuming a desktop environment.

Some adjustments were made so that:

  • layout adapts to smaller screens
  • scrolling and input remain usable
  • documents can be viewed and edited on mobile devices

This is not a separate mobile version — just the same editor adapted to different screen sizes.

Sharing without a backend

There is no backend in this project.

To make sharing possible, the document state can be encoded into a URL. Opening that link reconstructs the document in the editor.

This approach:

  • does not require storage
  • does not persist data server-side
  • works for quick sharing or examples

It’s intentionally simple and limited by URL size, but sufficient for lightweight use cases.
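A common way to implement backend-less sharing (the exact scheme Markflow uses may differ) is to deflate the document and embed it in the URL fragment as URL-safe base64:

```python
import base64
import zlib

def encode_doc(markdown: str) -> str:
    """Compress the document and make it URL-safe (e.g. for a #fragment)."""
    raw = zlib.compress(markdown.encode("utf-8"), 9)
    return base64.urlsafe_b64encode(raw).decode("ascii")

def decode_doc(fragment: str) -> str:
    """Inverse: reconstruct the document from a shared link."""
    raw = base64.urlsafe_b64decode(fragment.encode("ascii"))
    return zlib.decompress(raw).decode("utf-8")
```

Fragments (`#...`) never reach a server, and compression stretches the practical budget, but browsers still cap URL length, which matches the "limited by URL size" caveat above.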

Closing

This project is mainly a technical exploration of how Markdown editing can be structured internally while still using a standard text editor as a base.

If you’re working on something similar, feedback or discussion would be useful.

Sustainability Isn't an Afterthought — It's an Architectural Choice

2026-04-24 16:14:33

Every Earth Day I see the same posts — reusable cups, bike commutes, paperless offices. All fine. But working on enterprise architecture, I keep circling back to a quieter question:

How much does our platform choice actually matter for sustainability?

Turns out, quite a lot. IBM's net-zero 2030 commitment isn't just policy — it's baked into how IBM Z and IBM LinuxONE are designed. A few things stand out to me as an architect:

🔹 Efficiency by design

Both platforms enable large-scale workload consolidation — fewer cores, less energy, lower CO₂e compared to sprawling distributed x86 environments.

🔹 Sustainable scale

Replacing thousands of x86 cores with a single highly utilized system cuts power, cooling, and data-center footprint. No resilience trade-off. No performance trade-off.

🔹 AI without the energy penalty

On-chip AI acceleration delivers real-time inference at the core. As AI workloads grow, this matters more every quarter.

🔹 Built for ESG transparency

Integrated environmental monitoring gives real operational data for ESG reporting — not estimates, not vendor-supplied guesses.

🌱 The Earth Day takeaway

Sustainability isn't something you bolt on after the architecture decisions are made. It starts with the core.

IBM Z and IBM LinuxONE help turn net-zero commitments into measurable action.

If you're thinking about workload consolidation, net-zero commitments, or the hidden cost of your infrastructure footprint — worth a look:

#EarthDay #IBMZ #LinuxONE #SustainableIT

I Stopped Using Playwright. Here's What Replaced It.

2026-04-24 16:13:27

I stopped writing Playwright tests for integration flows. Not because they stopped working — they still work fine. But once I tried testing with Claude subagents and agent-browser, going back felt like writing jQuery after learning React.

Here's what changed.

What Playwright gets wrong

Playwright was designed for single-user, deterministic UI flows. You write selectors, set up auth fixtures, mock state, and run scripts that click through a fixed sequence. For a simple login-and-checkout flow, it's fine.

But most real apps have multiple roles interacting with shared state. A customer submits a request. An operator reviews it and assigns a specialist. The specialist does work. The customer pays. The operator ships. Each step depends on the previous one, and each actor is a different authenticated user.

In Playwright, this means:

  • Multiple auth state files (one per role)
  • Fixtures that seed the database before each test
  • Selectors that break every time the UI changes
  • Hundreds of lines of boilerplate before you've tested a single real interaction

You end up maintaining a parallel codebase just to describe what users already do naturally.

What agent-browser does instead

agent-browser is a CLI that lets AI agents control a browser via the accessibility tree. Instead of writing page.locator('[data-testid="submit"]').click(), you describe what you want in plain language and the agent figures out how to do it.

No selectors. No brittle CSS paths. If the button exists and has a label, the agent finds it. If the UI changes, the test doesn't break — the agent adapts.

For role isolation, you use Chrome profile directories. One directory per role, logged in once:

mkdir -p ~/.config/google-chrome/app-customer
mkdir -p ~/.config/google-chrome/app-operator
mkdir -p ~/.config/google-chrome/app-specialist
npx agent-browser \
  --profile ~/.config/google-chrome/app-operator \
  --headed \
  open https://yourapp.com/sign-in

The session persists. Every subsequent headless run using --profile picks it up automatically.

On magic links: use yopmail for test accounts — disposable inboxes, no registration, magic links work out of the box. If you hit email rate limits, generate the magic link URL directly via the admin API and navigate to it, no email sent:

curl -X POST https://<project>.supabase.co/auth/v1/admin/generate_link \
  -H "Authorization: Bearer $SERVICE_ROLE_KEY" \
  -H "Content-Type: application/json" \
  -d '{"type":"magiclink","email":"user@yopmail.com"}' \
  | python3 -c "import json,sys; print(json.load(sys.stdin)['action_link'])"

The orchestrator pattern

For multi-role golden path testing, one Claude session acts as an orchestrator. It spawns one subagent at a time, each operating as a specific role. State flows forward through a shared JSON file.

Orchestrator (main Claude session)
  ├── spawn Agent(customer)    → submits request  → writes request_id
  ├── spawn Agent(operator)    → assigns handler  → writes handler_id
  ├── spawn Agent(specialist)  → does work
  ├── spawn Agent(operator)    → reviews + approves
  ├── spawn Agent(customer)    → pays or confirms
  └── spawn Agent(customer)    → leaves feedback

The state file:

{
  "run_id": "run-2026-04-24",
  "current_request_id": null,
  "confirmation_token": null,
  "steps_completed": []
}

Each subagent gets a self-contained prompt with the current state injected. It reports back any new values — IDs, tokens visible in URLs — and the orchestrator writes them before spawning the next agent.
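The state handoff can be sketched as follows. `run_agent` here is a hypothetical stand-in for however a subagent is actually spawned; only the load → inject → merge → persist cycle is the point.

```python
import json
from pathlib import Path

STATE_FILE = Path("run-state.json")

def load_state() -> dict:
    """Read shared state, or start fresh on the first step."""
    if STATE_FILE.exists():
        return json.loads(STATE_FILE.read_text())
    return {"run_id": None, "current_request_id": None,
            "confirmation_token": None, "steps_completed": []}

def merge_step(state: dict, step: str, reported: dict) -> dict:
    """Fold a subagent's reported values (IDs, tokens) into shared state."""
    merged = {**state, **{k: v for k, v in reported.items() if v is not None}}
    merged["steps_completed"] = state["steps_completed"] + [step]
    return merged

def run_step(step: str, role: str, run_agent) -> None:
    """Orchestrator loop body: inject state, run one role, persist results."""
    state = load_state()
    reported = run_agent(role=role, step=step, state=state)  # hypothetical spawn
    STATE_FILE.write_text(json.dumps(merge_step(state, step, reported), indent=2))
```

The merge is non-destructive: `None` values from a subagent never overwrite IDs that an earlier role already wrote.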

Results are appended to a log file, log-and-continue, never stop on failure:

## [operator] Assign Specialist — PASS
## [specialist] Submit Work — PASS
## [operator] Approve Work — FAIL: approve button not found
## [customer] Confirm Receipt — PASS

The cost argument is dead

The main counterargument to AI-based testing has always been cost. Claude API calls aren't free, Playwright is.

But if you're running Claude Code on a subscription, that argument disappears. Subagents run against your subscription, not per-token billing. A full 10-step golden path run costs nothing extra.

The only remaining case for Playwright is raw speed — milliseconds per test vs minutes for an agent run. That matters if you need tests on every commit in a tight CI loop. For pre-deploy checks, QA runs, or anything not in a sub-second CI pipeline, there's no practical reason to choose Playwright.

What this means in practice

I haven't written a Playwright test in months. The agent tests cover more ground, break less often, and took a fraction of the time to set up. The only thing I gave up is being able to run them on every commit — which, for integration tests covering a 6-role flow, was never realistic anyway.

If you're still writing Playwright tests for multi-role integration flows, try this setup once. You probably won't go back.

LLM-Native APIs: How the Runtime behind REST Changed Fundamentally in 2026

2026-04-24 16:13:16

Introduction

For over a decade, the runtime behind a REST endpoint made a set of assumptions that were safe to make. A request maps to a single, predictable operation. The response shape is known before execution begins. Each request is self-contained, with no memory of what came before (stateless). Business logic is deterministic: same input, same output, every time.

These assumptions held because they matched the workload. CRUD operations, relational queries, rule-based decisions — all of these are stateless, deterministic, and fast. REST was designed around them and served them well. But non-determinism is not new to backend systems: recommender systems have been probabilistic for 15+ years, long before LLMs existed. None of this is novel territory.

What is new is the general-purpose reasoning black box sitting behind your endpoint: a system that interprets intent, invokes tools dynamically, and produces outputs that vary from run to run. The current challenge is variable latency, variable cost, unbounded tool use, and stateful multi-step execution, all behind an endpoint that looks exactly like a REST API to the client.

Traditional REST APIs before the LLM Era:

  • REST endpoints – Predefined endpoints responsible for specific operations like fetching, saving, and updating data
  • Deterministic behavior – The outcome, format, and response structure were known in advance
  • Strict schemas – Systems relied on predefined schemas and models
  • Stateless interactions – Each request was self-contained and independent
  • Rule-based business logic – long the backbone of backend systems, translating requirements into deterministic "if–then" decisions

"LLM workloads don't break REST. They break the runtime assumptions your backend was built on."

Coming to 2026 – LLM Era:

Applications are no longer asking for predictable responses. When a user asks:

"Analyse these 4 PDFs, compare insights, and tell me the risks."

The execution path is decided at runtime by a reasoning engine. The operation takes 20–30 seconds and may invoke a dozen tools along the way. The result is non-deterministic: run it twice, get two different outputs.

This isn't REST evolving. The protocol is the same. What's changed is the runtime behind the endpoint — and that runtime now needs to handle things that traditional backends were never designed for:

  • Reasoning engines that interpret intent rather than match routes
  • Stateful workflows that span multiple steps, tools, and model calls
  • Non-deterministic outputs that can't be regression-tested the same way
  • Agent coordinators – Orchestrate multiple specialized agents to complete complex tasks
  • Memory that persists context across requests and sessions

The shift is not about adopting new protocols. It's about recognising that the contract your endpoint exposes stays simple — while the system behind it becomes fundamentally more complex. This article breaks down what that runtime looks like, what it costs, and where it fails.

What Traditional REST Assumed

| Assumption | Reality with LLM workloads |
| --- | --- |
| Fixed response schema | Generative, variable output |
| Stateless per request | Multi-step, session-aware execution |
| Deterministic logic | Probabilistic reasoning engine |
| Millisecond latency | 10–30 s per complex request |
| Rule-based routing | Intent-driven dynamic task planning |
| Predictable cost | Variable ($0.01 to $1.00+ per request) |

The 3-Layer Architecture of LLM-Native APIs

1. The Orchestration Layer

(Reasoning + Tools + Workflow)

The orchestration layer in LLM-based REST APIs acts as the central control plane that transforms high-level user intent into coordinated, executable workflows. Unlike traditional backends, where requests map directly to a single service or endpoint, the orchestration layer:

  • Extracts intent — interprets what the user wants, not just what they typed
  • Plans execution — builds a task graph dynamically based on context
  • Routes and coordinates — dispatches to retrieval systems, tools, and external services
  • Manages state — maintains context across steps, handles retry, and feeds intermediate outputs into subsequent stages

This is what separates an LLM-native backend from simply wrapping a model call in a FastAPI route.

| Scenario | Stack | When to pick it |
| --- | --- | --- |
| Simple agent, workflows < 30 s | FastAPI + LangGraph + pgvector + Celery | Early stage, Postgres already in use, < 1M vectors |
| Long-running durable workflows > 5 min | FastAPI + Temporal + Pinecone + LangGraph | Workflows must survive crashes; partial state has value |
| Cost-sensitive, high Postgres investment | FastAPI + pgvector + Pydantic-AI + Inngest | Avoiding infra sprawl; < 5M vectors; moderate QPS |
| Maximum control, latency-critical | FastAPI + raw asyncio + Qdrant + custom retry | P95 < 100 ms target; team willing to own retry/backoff logic |

The MCP (Model Context Protocol) Tools:

Advanced API capabilities are exposed as MCP tools, which the model can discover and invoke to fetch the required data from external tools and data sources such as:

  • Databases, Data warehouses, Vector databases
  • File storage and document systems
  • Monitoring and analytics tools
  • Internal microservices

MCP introduces a schema-driven interface where tools are discoverable and callable by the model. MCP enables a declarative approach where tools are exposed as first-class, machine-readable entities, allowing LLMs to reason about when and how to use them.

In traditional API architectures, orchestration logic resides entirely within backend services, with developers explicitly defining control flow and integrations. MCP changes this paradigm by elevating the LLM into an active participant in system execution and decision-making. MCP also introduces a layer of governance and safety in LLM-driven systems: enforcing schemas, input validation, and access controls at the tool level ensures that model actions remain predictable and auditable.
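The flavor of a schema-driven tool interface can be shown with a toy registry. This is illustrative only — not the real MCP SDK or wire protocol — but it demonstrates tools as discoverable, validated, machine-readable entities:

```python
from typing import Any, Callable, Dict

# Global registry of discoverable tools: name -> {schema, fn}.
TOOLS: Dict[str, Dict[str, Any]] = {}

def tool(name: str, schema: Dict[str, type]):
    """Register a function as a tool with a declared input schema."""
    def register(fn: Callable[..., Any]):
        TOOLS[name] = {"schema": schema, "fn": fn}
        return fn
    return register

def call_tool(name: str, args: Dict[str, Any]) -> Any:
    """Validate args against the tool's schema, then invoke it.
    Validation failures are cheap and auditable; they never reach the tool."""
    spec = TOOLS[name]
    for field, expected in spec["schema"].items():
        if field not in args or not isinstance(args[field], expected):
            raise ValueError(f"{name}: bad or missing field {field!r}")
    return spec["fn"](**args)

@tool("query_orders", {"customer_id": str, "limit": int})
def query_orders(customer_id: str, limit: int):
    # Stand-in for a real database or microservice call.
    return [f"order-{i}-{customer_id}" for i in range(limit)]
```

A model-facing planner would read `TOOLS[...]["schema"]` to decide *when and how* to call a tool; the backend keeps the final say via validation and access control.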

2. The Memory Layer

(Short-Term + Long-Term + Semantic Memory)

Memory solves one problem: context doesn't survive across steps or sessions by default. Without it, every request starts blind: no knowledge of prior interactions, no intermediate state, no retrieved domain knowledge. That said, not everything worth computing is worth storing. Storing too much degrades retrieval quality: the more noise in your vector store, the more confidently wrong results you get back.

What should we store?

  • Document embeddings + chunk metadata
  • Final summarised outputs
  • Session context (within TTL)
  • User-level preferences

Memory Types and Their Limits

Short-term memory — Session-level context held in-memory or fast cache (Redis).

  • Expires with the session
  • Safe to use freely; cost is low and staleness isn't a risk

Long-term memory — Vector-based semantic storage (pgvector, Pinecone, LanceDB).

  • Survives across sessions; powers RAG retrieval
  • Risk: Gets stale. A document embedded 6 months ago may no longer reflect current reality. Without TTL policies, old context poisons new queries

Workflow memory — Intermediate execution state across steps.

  • Enables resumption after failure or cancellation
  • Risk: Partial state from a failed run can corrupt a retry if not versioned or cleared correctly
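Per-tier TTLs can be made explicit in code. A minimal sketch with an injectable clock; the TTL values below are placeholders, not recommendations:

```python
import time

# Placeholder TTLs per memory tier, in seconds -- tune per application.
TTL = {
    "short_term": 30 * 60,          # session cache
    "long_term": 180 * 24 * 3600,   # vector store entries
    "workflow": 24 * 3600,          # intermediate execution state
}

class TieredMemory:
    def __init__(self, clock=time.time):
        self._clock = clock   # injectable so expiry is testable
        self._store = {}      # (tier, key) -> (expires_at, value)

    def put(self, tier: str, key: str, value) -> None:
        self._store[(tier, key)] = (self._clock() + TTL[tier], value)

    def get(self, tier: str, key: str):
        """Return the value, or None if missing or past its tier's TTL."""
        entry = self._store.get((tier, key))
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() > expires_at:
            del self._store[(tier, key)]  # stale context must not leak back
            return None
        return value
```

The important property is that expiry is a read-side guarantee: a caller can never retrieve context older than its tier allows, regardless of what was written.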

Where Memory Fails

Vector stores are lossy.

  • Embedding-based retrieval doesn't return the correct chunk — it returns the most similar chunk. On ambiguous or underspecified queries, that's often the wrong one. The model then reasons confidently on bad input. The output looks plausible. It isn't.

Embeddings drift across model versions.

  • If you upgrade your embedding model, every stored vector becomes semantically misaligned with new queries. Searches degrade silently — no errors, just worse results.
  • Always version-stamp embeddings and plan for periodic re-indexing when upgrading models to avoid these issues.
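Version-stamping costs one extra field at write time. A sketch with a dummy embedding function standing in for the real model:

```python
EMBED_MODEL_VERSION = "embed-v2"  # bump on every embedding-model upgrade

def fake_embed(text: str):
    # Stand-in for a real embedding model call.
    return [float(len(text)), float(sum(map(ord, text)) % 97)]

INDEX = []  # stand-in for a vector store

def store_chunk(chunk: str) -> None:
    INDEX.append({
        "text": chunk,
        "vector": fake_embed(chunk),
        "model_version": EMBED_MODEL_VERSION,  # the stamp
    })

def query_candidates():
    """Only compare vectors produced by the current model version.
    Anything else is flagged for re-indexing instead of silently searched."""
    current = [e for e in INDEX if e["model_version"] == EMBED_MODEL_VERSION]
    stale = len(INDEX) - len(current)
    if stale:
        print(f"{stale} chunk(s) need re-indexing under {EMBED_MODEL_VERSION}")
    return current
```

Without the stamp, an upgrade degrades search silently; with it, the mismatch is at least detectable and re-indexing becomes a planned migration rather than a mystery.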

Stale memory hurts reasoning.

  • A chunk retrieved from a session 3 months ago may contradict the current document set. Without TTL policies per memory type, the system treats outdated context as ground truth. Define explicit expiry for each memory tier.

Retrieval confidence is not retrieval accuracy.

  • The model has no way to know that a retrieved chunk is wrong — it treats retrieved content as authoritative. There is no built-in scepticism. This means garbage in, confident garbage out. Never treat retrieved chunks as ground truth — surface retrieval confidence in traces.

3. The Interaction Layer

(API Gateway + Protocols)

The interaction layer in LLM-based REST APIs serves as the primary touchpoint between clients and the underlying intelligence of the system, translating human intent into structured requests and delivering responses in a consumable form.

Unlike traditional APIs that expose rigid, operation-specific endpoints, the interaction layer is designed around intent-driven communication, where a single endpoint can handle a wide range of tasks expressed in natural language. It is responsible for:

  • Request validation
  • Authentication
  • Context injection
  • Input transformation (e.g., Pydantic schemas)

On the response side, it standardizes outputs—whether textual insights, structured data, or progressive updates (streams of data).

Typical endpoint shapes include:

  • Chat-style endpoints
    • /chat – conversational
    • /agent – tool-driven workflow executor
    • /reason – produce structured reasoning
  • Function-calling endpoints
    • /function-call – structured tool calls

In certain cases, the interaction layer can leverage Server-Sent Events (SSE) to provide a streaming interface for real-time feedback. For long running or multi-step tasks, SSE enables the server to push incremental updates, such as:

  • Processing status
  • Partial summaries
  • Evolving insights

These updates are pushed directly to the client over a single HTTP connection, which significantly improves user experience by reducing perceived latency and increasing transparency into system behavior.

  • Streaming responses
    • /stream – stream tokens

However, SSE is used strictly as a delivery mechanism within the interaction layer and does not replace the underlying asynchronous execution systems. It allows LLM-based APIs to feel responsive and interactive while still relying on robust orchestration and processing layers behind the scenes.
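Because SSE is just framed text over a kept-open HTTP response, the delivery format is independent of any web framework. A minimal, framework-agnostic frame serializer:

```python
import json
from typing import Optional

def sse_event(data: dict, event: Optional[str] = None) -> str:
    """Serialize one Server-Sent Event frame: an optional `event:` name,
    a `data:` line, and the blank line that terminates the frame."""
    frame = f"event: {event}\n" if event else ""
    frame += f"data: {json.dumps(data)}\n\n"
    return frame

def progress_frames(total_docs: int):
    """What a long-running analysis might push, one frame at a time."""
    for i in range(1, total_docs + 1):
        yield sse_event({"stage": "summarize", "done": i, "of": total_docs},
                        event="progress")
    yield sse_event({"stage": "complete"}, event="done")
```

In FastAPI or any ASGI framework, such a generator would be wrapped in a streaming response with content type `text/event-stream`; the orchestration behind it stays unchanged.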

End-to-End LLM Request Lifecycle

User asks: "Analyse these 4 PDFs, compare insights, and tell me the risks."

Step 1 — Request Ingestion *(Interaction Layer)*

  • Validate input schema, auth, document URLs
  • Fail fast: Return 422 before any LLM call if validation fails — saves cost

Step 2 — Interaction Mode Setup *(Interaction Layer)*

  • Decide: sync response or SSE streaming
  • Issue a job ID immediately — acts as resumption token if SSE connection drops

Step 3 — Intent Parsing & Task Decomposition *(Orchestration Layer)*

  • LLM breaks prompt into task graph: Ingest → Extract → Summarize → Compare → Risk
  • Guard: If parsed plan looks incomplete or ambiguous, surface a clarification prompt — don't proceed into an expensive workflow on a flawed plan

Step 4 — Document Ingestion *(Memory Layer)*

| Scenario | Recovery |
| --- | --- |
| Scanned PDF (no text layer) | Trigger OCR fallback |
| Password-protected | Flag, skip, notify user |
| Corrupted / unreachable | Retry × 3 with backoff, then skip |

Rule: One bad document should never abort the entire workflow. Continue with remaining documents.
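That rule is mostly try/except discipline. A sketch with a pluggable `fetch` callable (fetching, OCR, and password handling are stand-ins here):

```python
def ingest_documents(urls, fetch):
    """Ingest every document we can; record per-document failures
    instead of letting one bad file abort the workflow."""
    ok, failed = [], []
    for url in urls:
        try:
            ok.append({"url": url, "text": fetch(url)})
        except Exception as exc:  # corrupt, protected, unreachable...
            failed.append({"url": url, "reason": str(exc)})
    return ok, failed
```

The caller proceeds with `ok` and surfaces `failed` in the response metadata, which is the "partial success with honest metadata" principle from the failure table below applied at the first step it matters.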

Step 5 — Text Extraction & Chunking *(Memory Layer)*

  • Extract text; split into chunks
  • Filter low-confidence OCR output — don't embed junk
  • Chunk size must be calibrated against model context window limits

Step 6 — Embedding & Vector Storage *(Memory Layer)*

  • Convert chunks → embeddings → vector DB
  • Rate limits: Retry with exponential backoff, not hard failure
  • Version drift: Embeddings are model-version specific — version-stamp everything; plan for re-indexing on model upgrades
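The retry discipline is generic. A retry-with-backoff wrapper, with `sleep` injectable so the logic stays testable (the delay and retry counts are placeholder values):

```python
import random
import time

def with_backoff(fn, retries=3, base_delay=1.0, sleep=time.sleep):
    """Call fn; on failure wait base_delay * 2**attempt (with jitter),
    then retry, up to `retries` extra attempts before re-raising."""
    for attempt in range(retries + 1):
        try:
            return fn()
        except Exception:
            if attempt == retries:
                raise  # budget exhausted: surface the real failure
            sleep(base_delay * (2 ** attempt) * (0.5 + random.random()))
```

The jitter term matters at scale: without it, every worker retries on the same schedule and the rate limiter gets hit by synchronized waves.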

Step 7 — Parallel Document Processing *(Orchestration Layer)*

  • Summarize all 4 documents concurrently
  • Partial failure: If 3 of 4 succeed, proceed — don't abort for one timeout
  • Set per-document timeouts, not a single global one

Step 8 — Cross-Document Reasoning *(Orchestration Layer)*

  • Compare summaries; identify overlaps, conflicts; generate risks
  • Context overflow: Combined summaries may exceed context window — use map-reduce (reason over pairs, then synthesize). Never silently truncate
  • Reasoning loops: Cap tool invocations (e.g. max 20 steps) with a hard circuit breaker
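The circuit breaker is a plain counter around the tool loop. A sketch (the `max_steps=20` default mirrors the cap mentioned above; `next_action` and `execute` stand in for the model and tool layers):

```python
class StepBudgetExceeded(RuntimeError):
    pass

def run_reasoning_loop(next_action, execute, max_steps=20):
    """Drive an agent loop, but refuse to run forever: every tool
    invocation spends one unit of a hard step budget."""
    history = []
    for _ in range(max_steps):
        action = next_action(history)
        if action is None:  # the model declares itself done
            return history
        history.append(execute(action))
    raise StepBudgetExceeded(f"exceeded {max_steps} tool invocations")
```

Raising (rather than silently truncating) is deliberate: a tripped breaker is a signal that the plan was flawed, and it should reach the aggregation step as an explicit partial failure.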

Step 9 — Response Aggregation *(Orchestration Layer)*

  • Combine insights + comparisons + risks
  • Partial failure: If one component (e.g. risk analysis) fails, return what succeeded with clear metadata — never return a generic error

Step 10 — Cancellation & Timeout Handling *(Orchestration + Interaction Layers)*

  • Propagate cancellation signal down to async tasks when user aborts
  • Persist any intermediate results produced so far
  • Without this: backend keeps running, burning LLM credits, after the user has left

Step 11 — Response Delivery *(Interaction Layer)*

  • Stream via SSE or return full response
  • Run basic schema validation on LLM output before delivery — especially if downstream systems consume it programmatically

Step 12 — Memory Persistence *(Memory Layer)*

  • Store embeddings, summaries, final output
  • Set TTL policies — stale memory retrieved months later can hurt reasoning
  • Check memory before re-running on retry — enables idempotency

Failure Surface Summary

| Step | Failure mode | Recovery |
| --- | --- | --- |
| Request ingestion | Bad schema / unreachable URL | 422 before LLM call |
| Interaction setup | SSE drops | Resumption via job ID |
| Intent parsing | Hallucinated / incomplete plan | Confidence gate → clarify |
| Document ingestion | Scanned / corrupt / protected | Per-doc fallback; partial proceed |
| Extraction | OCR noise / garbled text | Quality filter; tag low confidence |
| Embedding | Rate limit / model drift | Backoff retries; version-stamp |
| Parallel processing | Partial LLM timeout | Min success threshold |
| Reasoning | Context overflow / loops | Map-reduce; step budget cap |
| Aggregation | Component failure | Partial result with metadata |
| Cancellation | Mid-workflow abort | Propagate signal; persist partial state |
| Delivery | Malformed output | Pre-delivery schema check |
| Persistence | Stale context / duplicate run | TTL policy; idempotency check |

Design principle: Partial success with honest metadata beats a hard failure every time. Build for the broken path — the happy path takes care of itself.

Final Thoughts

With LLMs in the picture, APIs are no longer just interfaces—they're becoming part of systems that can interpret intent, reason through tasks, and coordinate execution dynamically.

At its core, this article highlights a shift in how we design backends:

  • From deterministic endpoints → intent-driven systems
  • From static workflows → dynamic orchestration
  • From stateless APIs → memory-aware architectures
  • From hardcoded logic → model-assisted decision making

"REST isn't evolving. The runtime behind your endpoint is being replaced."

Most will feel this shift not as a clean architectural migration, but as accumulated pressure: timeouts that don't make sense, costs that don't map to load, failures that don't reproduce.

The harder question is: does your current backend infrastructure support what you're asking it to do? Not the endpoint. Not the framework. The runtime — the orchestration, the memory, the failure recovery, the cost model.

If the answer is uncertain, that uncertainty is the signal. Start there.

Setting Up Docker on My Hosting Server (selfmade.lab)

2026-04-24 16:09:00

Today I started setting up Docker on my hosting server as part of my project. My goal is to run PostgreSQL and backend services in containers and manage everything cleanly.

This post is a simple log of what I did today — step by step

Goal for Today

  • Set up Docker on my hosting server
  • Prepare environment for database and backend
  • Begin container-based development
  • Lab name: selfmade.lab

Step 1: Connected to My Server

First, I connected to my hosting server using SSH:

ssh root@your_server_ip

After login, I confirmed I’m inside the server.

Step 2: Installed Docker

Then I installed Docker using basic commands:

apt update
apt install docker.io -y

After installation, I started Docker:

systemctl start docker
systemctl enable docker

To check if Docker is working:

docker --version

Step 3: Tested Docker

I ran a simple test container:

docker run hello-world

This confirmed Docker is installed and running correctly.

Step 4: Started PostgreSQL Container

Next, I started my database container:

docker run -d \
--name selfmade-postgres \
-e POSTGRES_PASSWORD=1234 \
-p 5432:5432 \
postgres

Now PostgreSQL is running inside Docker.

Step 5: Opened Database Port

To allow external connection:

ufw allow 5432

Step 6: Connected Using pgAdmin

From my local system, I connected using:

  • Host: your_server_ip
  • Port: 5432
  • Username: postgres
  • Password: 1234

The connection was successful.
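The same check can be run from Python instead of pgAdmin. This sketch assumes `psycopg2-binary` is installed locally; the host is a placeholder, just like in the SSH example:

```python
# Connection settings matching the container started above.
CONN = {
    "host": "your_server_ip",  # replace with the real server IP
    "port": 5432,
    "user": "postgres",
    "password": "1234",
    "dbname": "postgres",      # default database in the official image
}

def dsn(settings: dict) -> str:
    """Build a libpq-style connection string from the settings."""
    return " ".join(f"{k}={v}" for k, v in settings.items())

if __name__ == "__main__":
    import psycopg2  # pip install psycopg2-binary
    with psycopg2.connect(**CONN) as conn:
        with conn.cursor() as cur:
            cur.execute("SELECT version();")
            print(cur.fetchone()[0])
```

If this prints the PostgreSQL version string, the container, the port, and the firewall rule are all working together.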

Step 7: Planning Next Steps

Today I only completed the base setup. Next, I plan to:

  • Use docker-compose for better management
  • Add a FastAPI backend
  • Secure the database using .env
  • Set up a domain for selfmade.lab
  • Add Nginx as a reverse proxy

Challenges I Faced

  • Initial confusion with Docker setup
  • Understanding server vs local environment
  • Port access configuration

But step by step, everything started working.

What I Learned Today

  • Docker can be installed easily on a server
  • Containers simplify backend setup
  • PostgreSQL runs smoothly inside Docker
  • Remote connection using pgAdmin is very useful

Final Thoughts

Today was a strong start for my project infrastructure. Setting up Docker on my hosting server gave me more confidence to move forward with deployment.

More updates coming soon as I build selfmade.lab.

Multi-Agent AI Systems: How Multiple AI Agents Work Together to Automate Complex Workflows

2026-04-24 15:54:20

Most businesses today don’t struggle with a lack of tools; they struggle with coordination. One system handles customer data, another manages operations, and yet another processes analytics. The real bottleneck isn’t capability, it’s orchestration.

That’s where multi-agent AI systems come in.

Instead of relying on a single AI model to handle everything, multi-agent systems use multiple specialized AI agents that collaborate, communicate, and divide tasks, much like a high-performing team. Platforms like Rohirrim are exploring how this model can transform fragmented workflows into intelligent, autonomous systems that actually get work done.

What Are Multi-Agent AI Systems?
A multi-agent AI system is a network of independent AI agents, each designed for a specific task, that work together toward a shared goal.
Think of it like a digital organization:

  • One agent gathers data
  • Another analyzes it
  • A third makes decisions
  • A fourth executes actions

Instead of a single overloaded AI trying to do everything, each agent focuses on what it does best. This modular structure is what allows businesses to automate complex, multi-step workflows that were previously impossible, or that required heavy human involvement.

Why Single AI Agents Fall Short

Single-agent systems are powerful but limited. They struggle when:

  • Tasks require multi-step reasoning
  • Different tools or APIs need to be used
  • Decisions depend on dynamic, real-time inputs
  • Workflows involve dependencies between tasks
For example, automating a sales pipeline isn’t just one task. It includes:

  • Lead identification

  • Data enrichment

  • Qualification

  • Outreach

  • Follow-ups

  • CRM updates
A single AI agent can’t efficiently manage all of this without becoming slow, error-prone, or rigid. Multi-agent systems solve this by distributing the workload.

How Multi-Agent Systems Work (Step-by-Step)

Let’s break down how these systems actually operate in real-world scenarios.

1. Task Decomposition

The system first breaks a complex workflow into smaller, manageable tasks. For example, “Automate customer onboarding” becomes:

  • Collect user data
  • Verify documents
  • Create account
  • Send onboarding emails
  • Update internal systems

Each of these becomes a responsibility for a different agent.

2. Agent Specialization

Each agent is assigned a clear role. For example:

  • Data Agent → Collects and validates inputs
  • Decision Agent → Applies logic or rules
  • Execution Agent → Performs actions (emails, updates, API calls)
  • Monitoring Agent → Tracks outcomes and errors

This specialization improves both accuracy and speed.

3. Communication Between Agents

Agents don’t work in isolation; they constantly exchange information.

  • One agent passes structured data to another
  • Another agent triggers the next step
  • Some systems use shared memory or messaging queues

This coordination is what turns individual actions into a seamless workflow.
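The queue-based hand-off can be sketched with ordinary in-process queues; each agent is just a function that consumes from its inbox and feeds the next stage (the roles are illustrative, and a real system would run stages concurrently):

```python
from queue import Queue

def run_pipeline(items, stages):
    """Chain agents: each stage drains its inbox and fills the next one.
    `stages` is an ordered list of (name, fn) transforming one item."""
    inbox = Queue()
    for item in items:
        inbox.put(item)
    for name, fn in stages:
        outbox = Queue()
        while not inbox.empty():
            outbox.put(fn(inbox.get()))
        inbox = outbox  # this stage's output is the next agent's inbox
    return [inbox.get() for _ in range(inbox.qsize())]

# Illustrative roles from the article:
stages = [
    ("data_agent", lambda lead: {"lead": lead, "valid": bool(lead)}),
    ("decision_agent", lambda rec: {**rec, "qualified": rec["valid"]}),
    ("execution_agent",
     lambda rec: f"emailed {rec['lead']}" if rec["qualified"] else "skipped"),
]
```

Structured data flows forward stage by stage, which is exactly the "one agent passes structured data to another" pattern described above.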

4. Feedback and Iteration

Advanced systems include feedback loops:

  • Agents learn from outcomes
  • Errors trigger corrections
  • Decisions improve over time

This makes the system adaptive, not just automated.

Real-World Use Cases

Multi-agent AI isn’t theoretical; it’s already being applied across industries.

1. Customer Support Automation

Instead of a single chatbot:

  • One agent understands intent
  • Another retrieves knowledge base data
  • A third drafts responses
  • A fourth escalates complex cases

Result: faster, more accurate support without overwhelming human teams.

2. Marketing Campaign Execution

A multi-agent setup can:

  • Analyze audience data
  • Generate campaign ideas
  • Create content
  • Schedule posts
  • Track performance

All automatically.

3. Financial Operations

Agents can collaborate to:

  • Process invoices
  • Detect fraud patterns
  • Reconcile accounts
  • Generate reports

This reduces manual errors and speeds up operations significantly.

4. Software Development Workflows

In development environments:

  • One agent writes code
  • Another reviews it
  • Another tests it
  • Another deploys it

This is already being explored in AI-powered DevOps pipelines.

Real Data: Why This Matters Now

The shift toward multi-agent systems is backed by real trends:

  • According to industry reports, over 60% of enterprises are experimenting with AI agents in workflows by 2026
  • Companies using AI-driven automation report a 30–50% reduction in operational costs
  • Multi-agent architectures improve task completion rates by up to 40% compared to single-agent systems in complex workflows

The takeaway: businesses aren’t just adopting AI; they’re evolving toward collaborative AI systems.

Key Benefits of Multi-Agent AI Systems


1. Scalability

You can add more agents as workflows grow, with no need to redesign the entire system.

2. Flexibility

Agents can be updated, replaced, or improved independently.

3. Efficiency

Parallel processing allows multiple tasks to run simultaneously.

4. Resilience

If one agent fails, others can continue functioning, reducing system-wide failure risks.

Challenges You Should Know

Multi-agent systems aren’t magic—they come with complexity.

Coordination Overhead

Managing communication between agents can become complicated.

Error Propagation

If one agent makes a mistake, it can affect downstream tasks.

System Design

Designing efficient agent roles and workflows requires planning. This is why structured frameworks and platforms are becoming essential.

Where This Fits in Your AI Journey

If you’re new to AI automation, jumping directly into multi-agent systems might be overwhelming. A smarter approach: start with single-agent workflows, then scale. If you haven’t already, check out How to Build AI Agents That Automate Business Workflows; it lays the foundation for understanding how individual agents work before combining them into more advanced systems.

The Future: Autonomous Business Operations

Multi-agent AI systems are a stepping stone toward fully autonomous operations. In the near future, businesses won’t just use AI tools; they’ll deploy AI teams that:

  • Make decisions
  • Execute tasks
  • Optimize processes in real time

This isn’t about replacing humans. It’s about removing repetitive work so humans can focus on strategy, creativity, and growth.

Final Thoughts

Multi-agent AI systems represent a shift from isolated automation to collaborative intelligence. Instead of one AI trying to do everything, multiple agents work together, each focused, efficient, and coordinated.
That’s how complex workflows become manageable.
That’s how automation becomes scalable.
And that’s how businesses move from doing work to orchestrating outcomes.
If single-agent AI was the first step, multi-agent systems are where things start getting truly transformative.