2026-04-24 16:16:35
I’ve been working with Markdown editors for quite a while as a user. At some point I wanted to better understand how they actually behave under the hood, especially when documents get more complex (code blocks, math, diagrams, etc.).
As a small exploration, I built a minimal editor to experiment with these ideas:
https://github.com/IgorKha/markflow
You can try it here:
https://igorkha.github.io/markflow/
This post is not an announcement, but a summary of a few implementation decisions that might be useful if you’re building something similar.
The initial goal was simple:
From there, most of the work ended up around how editing is handled, not rendering.
The editor is built on top of Monaco.
This gives:
At the same time, Monaco operates on plain text, while Markdown has an implicit structure. Bridging that gap becomes the central problem.
Instead of treating Markdown purely as a string, the implementation keeps track of its structure (via parsing into an AST).
This allows reasoning in terms of:
Even partial structural awareness helps avoid some classes of issues when editing mixed content.
One of the core pieces is keeping two representations in sync:
This synchronization is used to:
This part is still evolving, but it defines most of the internal complexity.
Rendering is based on standard tools from the Markdown ecosystem:
highlight.js
KaTeX
Mermaid
These are applied on top of the parsed Markdown rather than directly on raw text, which keeps responsibilities separated:
The editor is designed to work in a browser without assuming a desktop environment.
Some adjustments were made so that:
This is not a separate mobile version — just the same editor adapted to different screen sizes.
There is no backend in this project.
To make sharing possible, the document state can be encoded into a URL. Opening that link reconstructs the document in the editor.
This approach:
It’s intentionally simple and limited by URL size, but sufficient for lightweight use cases.
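One plausible way to implement this kind of URL sharing (an assumed scheme, not necessarily the one markflow uses) is to compress the document and encode it URL-safely:

```python
# Sketch of encoding a document into a shareable URL fragment:
# compress, then base64url-encode without padding.
import base64
import zlib

def encode_doc(text: str) -> str:
    compressed = zlib.compress(text.encode("utf-8"), level=9)
    return base64.urlsafe_b64encode(compressed).decode("ascii").rstrip("=")

def decode_doc(fragment: str) -> str:
    padded = fragment + "=" * (-len(fragment) % 4)  # restore base64 padding
    return zlib.decompress(base64.urlsafe_b64decode(padded)).decode("utf-8")
```

Putting the payload in the URL fragment (`#...`) keeps it out of server logs, and practical URL length limits in browsers are what bound the document size.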
This project is mainly a technical exploration of how Markdown editing can be structured internally while still using a standard text editor as a base.
If you’re working on something similar, feedback or discussion would be useful.
2026-04-24 16:14:33
Every Earth Day I see the same posts — reusable cups, bike commutes, paperless offices. All fine. But working on enterprise architecture, I keep circling back to a quieter question:
How much does our platform choice actually matter for sustainability?
Turns out, quite a lot. IBM's net-zero 2030 commitment isn't just policy — it's baked into how IBM Z and IBM LinuxONE are designed. A few things stand out to me as an architect:
Both platforms enable large-scale workload consolidation — fewer cores, less energy, lower CO₂e compared to sprawling distributed x86 environments.
Replacing thousands of x86 cores with a single highly utilized system cuts power, cooling, and data-center footprint. No resilience trade-off. No performance trade-off.
On-chip AI acceleration delivers real-time inference at the core. As AI workloads grow, this matters more every quarter.
Integrated environmental monitoring gives real operational data for ESG reporting — not estimates, not vendor-supplied guesses.
Sustainability isn't something you bolt on after the architecture decisions are made. It starts with the core.
IBM Z and IBM LinuxONE help turn net-zero commitments into measurable action.
If you're thinking about workload consolidation, net-zero commitments, or the hidden cost of your infrastructure footprint — worth a look:
2026-04-24 16:13:27
I stopped writing Playwright tests for integration flows. Not because they stopped working — they still work fine. But once I tried testing with Claude subagents and agent-browser, going back felt like writing jQuery after learning React.
Here's what changed.
Playwright was designed for single-user, deterministic UI flows. You write selectors, set up auth fixtures, mock state, and run scripts that click through a fixed sequence. For a simple login-and-checkout flow, it's fine.
But most real apps have multiple roles interacting with shared state. A customer submits a request. An operator reviews it and assigns a specialist. The specialist does work. The customer pays. The operator ships. Each step depends on the previous one, and each actor is a different authenticated user.
In Playwright, this means:
You end up maintaining a parallel codebase just to describe what users already do naturally.
agent-browser is a CLI that lets AI agents control a browser via the accessibility tree. Instead of writing `page.locator('[data-testid="submit"]').click()`, you describe what you want in plain language and the agent figures out how to do it.
No selectors. No brittle CSS paths. If the button exists and has a label, the agent finds it. If the UI changes, the test doesn't break — the agent adapts.
For role isolation, you use Chrome profile directories. One directory per role, logged in once:
```shell
mkdir -p ~/.config/google-chrome/app-customer
mkdir -p ~/.config/google-chrome/app-operator
mkdir -p ~/.config/google-chrome/app-specialist

npx agent-browser \
  --profile ~/.config/google-chrome/app-operator \
  --headed \
  open https://yourapp.com/sign-in
```
The session persists. Every subsequent headless run using `--profile` picks it up automatically.
On magic links: use yopmail for test accounts — disposable inboxes, no registration, magic links work out of the box. If you hit email rate limits, generate the magic link URL directly via the admin API and navigate to it, no email sent:
```shell
curl -X POST https://<project>.supabase.co/auth/v1/admin/generate_link \
  -H "Authorization: Bearer $SERVICE_ROLE_KEY" \
  -H "Content-Type: application/json" \
  -d '{"type":"magiclink","email":"[email protected]"}' \
  | python3 -c "import json,sys; print(json.load(sys.stdin)['action_link'])"
```
For multi-role golden path testing, one Claude session acts as an orchestrator. It spawns one subagent at a time, each operating as a specific role. State flows forward through a shared JSON file.
```
Orchestrator (main Claude session)
├── spawn Agent(customer)   → submits request    → writes request_id
├── spawn Agent(operator)   → assigns handler    → writes handler_id
├── spawn Agent(specialist) → does work
├── spawn Agent(operator)   → reviews + approves
├── spawn Agent(customer)   → pays or confirms
└── spawn Agent(customer)   → leaves feedback
```
The state file:
```json
{
  "run_id": "run-2026-04-24",
  "current_request_id": null,
  "confirmation_token": null,
  "steps_completed": []
}
```
Each subagent gets a self-contained prompt with the current state injected. It reports back any new values — IDs, tokens visible in URLs — and the orchestrator writes them before spawning the next agent.
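The orchestrator's side of that loop can be sketched like this (a hedged illustration using the field names from the example state file; the file name `run-state.json` and helper names are mine):

```python
# Sketch of the orchestrator's state handling: load the shared JSON
# file, inject it into a subagent prompt, merge reported values back.
import json
from pathlib import Path

STATE_FILE = Path("run-state.json")  # hypothetical location

def load_state() -> dict:
    return json.loads(STATE_FILE.read_text())

def build_prompt(role: str, task: str, state: dict) -> str:
    """Self-contained prompt for one subagent, with state injected."""
    return (
        f"You are the {role}. Task: {task}\n"
        f"Current run state:\n{json.dumps(state, indent=2)}\n"
        "Report any new IDs or tokens you observe as JSON."
    )

def merge_report(state: dict, report: dict, step: str) -> dict:
    """Write a subagent's reported values before spawning the next one."""
    state.update(report)  # e.g. {"current_request_id": "REQ-1"}
    state["steps_completed"].append(step)
    STATE_FILE.write_text(json.dumps(state, indent=2))
    return state
```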
Results are appended to a log file. The policy is log-and-continue: never stop on failure.

```
## [operator] Assign Specialist — PASS
## [specialist] Submit Work — PASS
## [operator] Approve Work — FAIL: approve button not found
## [customer] Confirm Receipt — PASS
```
The main counterargument to AI-based testing has always been cost. Claude API calls aren't free, Playwright is.
But if you're running Claude Code on a subscription, that argument disappears. Subagents run against your subscription, not per-token billing. A full 10-step golden path run costs nothing extra.
The only remaining case for Playwright is raw speed — milliseconds per test vs minutes for an agent run. That matters if you need tests on every commit in a tight CI loop. For pre-deploy checks, QA runs, or anything not in a sub-second CI pipeline, there's no practical reason to choose Playwright.
I haven't written a Playwright test in months. The agent tests cover more ground, break less often, and took a fraction of the time to set up. The only thing I gave up is being able to run them on every commit — which, for integration tests covering a 6-role flow, was never realistic anyway.
If you're still writing Playwright tests for multi-role integration flows, try this setup once. You probably won't go back.
2026-04-24 16:13:16
For over a decade, the runtime behind a REST endpoint made a set of assumptions that were safe to make. A request maps to a single, predictable operation. The response shape is known before execution begins. Each request is self-contained, with no memory of what came before (stateless). And business logic is deterministic: same input, same output, every time.
These assumptions held because they matched the workload. CRUD operations, relational queries, rule-based decisions — all of these are stateless, deterministic, and fast. REST was designed around them and served them well. But non-determinism is not new to backend systems. Recommender systems have been probabilistic for 15+ years, long before LLMs existed. None of this is novel territory.
What is new is the general-purpose reasoning black box sitting behind your endpoint — a system that interprets intent, invokes tools dynamically, and produces outputs that can differ from run to run. The current challenge is variable latency, variable cost, unbounded tool use, and stateful multi-step execution — all behind an endpoint that looks exactly like a REST API to the client.
Traditional REST APIs before the LLM Era:
"LLM workloads don't break REST. They break the runtime assumptions your backend was built on."
Now, in 2026 – the LLM era:
Applications are no longer asking for predictable responses. When a user asks:
"Analyse these 4 PDFs, compare insights, and tell me the risks."
The execution path is decided at runtime by a reasoning engine. The operation takes 20–30 seconds and may invoke a dozen tools along the way. The result is non-deterministic: run it twice, get two different outputs.
This isn't REST evolving. The protocol is the same. What's changed is the runtime behind the endpoint — and that runtime now needs to handle things that traditional backends were never designed for:
The shift is not about adopting new protocols. It's about recognising that the contract your endpoint exposes stays simple — while the system behind it becomes fundamentally more complex. This article breaks down what that runtime looks like, what it costs, and where it fails.
| Assumption | Reality with LLM Workloads |
|---|---|
| Fixed response schema | Generative, variable output |
| Stateless per request | Multi-step, session-aware execution |
| Deterministic logic | Probabilistic reasoning engine |
| Millisecond latency | 10–30s per complex request |
| Rule-based routing | Intent-driven dynamic task planning |
| Predictable cost | Variable — $0.01 to $1.00+ per request |
(Reasoning + Tools + Workflow)
The orchestration layer in LLM-based REST APIs acts as the central control plane that transforms high-level user intent into coordinated, executable workflows. Unlike traditional backends, where requests map directly to a single service or endpoint, the orchestration layer:
This is what separates an LLM-native backend from simply wrapping a model call in a FastAPI route.
| Scenario | Stack | When to pick it |
|---|---|---|
| Simple agent, workflows < 30s | FastAPI + LangGraph + pgvector + Celery | Early stage, Postgres already in use, < 1M vectors |
| Long-running durable workflows > 5 min | FastAPI + Temporal + Pinecone + LangGraph | Workflows must survive crashes; partial state has value |
| Cost-sensitive, high Postgres investment | FastAPI + pgvector + Pydantic-AI + Inngest | Avoiding infra sprawl; < 5M vectors; moderate QPS |
| Maximum control, latency-critical | FastAPI + raw asyncio + Qdrant + custom retry | P95 < 100ms target; team willing to own retry/backoff logic |
Advanced API capabilities are exposed as MCP tools, which are created and invoked to get the required data from external tools/data sources like:
MCP introduces a schema-driven interface where tools are discoverable and callable by the model. MCP enables a declarative approach where tools are exposed as first-class, machine-readable entities, allowing LLMs to reason about when and how to use them.
In traditional API architectures, orchestration logic resides entirely within backend services, with developers explicitly defining control flow and integrations. MCP fundamentally changes this paradigm by elevating the LLM into an active participant in system execution and decision-making. MCP also introduces a layer of governance and safety in LLM-driven systems: enforcing schemas, input validation, and access controls at the tool level ensures that model actions remain predictable and auditable.
(Short-Term + Long-Term + Semantic Memory)
Memory solves one problem: context doesn't survive across steps or sessions by default. Without it, every request starts blind — no knowledge of prior interactions, no intermediate state, no retrieved domain knowledge. Though not everything worth computing is worth storing. Storing too much degrades retrieval quality. The more noise in your vector store, the more confidently wrong results you get back.
What should we store?
Short-term memory — Session-level context held in-memory or fast cache (Redis).
Long-term memory — Vector-based semantic storage (pgvector, Pinecone, LanceDB).
Workflow memory — Intermediate execution state across steps.
Vector stores are lossy.
Embeddings drift across model versions.
Stale memory hurts reasoning.
Retrieval confidence is not retrieval accuracy.
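The short-term tier is the simplest to picture. Here is an illustrative, in-process stand-in for a Redis-style session cache with TTL expiry (a sketch, not a production cache):

```python
# Session-scoped short-term memory with time-to-live expiry.
# In production this role is typically played by Redis.
import time

class SessionMemory:
    def __init__(self, ttl_seconds: float = 900.0):
        self.ttl = ttl_seconds
        self._store: dict[str, tuple[float, object]] = {}

    def put(self, session_id: str, value: object) -> None:
        self._store[session_id] = (time.monotonic(), value)

    def get(self, session_id: str):
        entry = self._store.get(session_id)
        if entry is None:
            return None
        stored_at, value = entry
        if time.monotonic() - stored_at > self.ttl:  # expired: evict
            del self._store[session_id]
            return None
        return value
```

Expiry is what keeps the "stale memory hurts reasoning" problem in check for session context; the long-term vector tier needs its own TTL and versioning policies.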
(API Gateway + Protocols)
The interaction layer in LLM-based REST APIs serves as the primary touchpoint between clients and the underlying intelligence of the system, translating human intent into structured requests and delivering responses in a consumable form.
Unlike traditional APIs that expose rigid, operation-specific endpoints, the interaction layer is designed around intent-driven communication, where a single endpoint can handle a wide range of tasks expressed in natural language. It is responsible for:
On the response side, it standardizes outputs—whether textual insights, structured data, or progressive updates (streams of data).
Here are a few examples:
In certain cases, the interaction layer can leverage Server-Sent Events (SSE) to provide a streaming interface for real-time feedback. For long-running or multi-step tasks, SSE enables the server to push incremental updates, such as:
This significantly improves user experience by reducing latency and increasing transparency into system behavior.
However, SSE is used strictly as a delivery mechanism within the interaction layer and does not replace the underlying asynchronous execution systems. It allows LLM-based APIs to feel responsive and interactive while still relying on robust orchestration and processing layers behind the scenes.
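The SSE wire format itself is simple enough to show directly. This framework-agnostic helper (illustrative; names are mine) emits the kind of incremental progress frames described above, which any async framework can stream over a `text/event-stream` response:

```python
# Minimal SSE frame formatter: each event is "event:" and "data:"
# lines terminated by a blank line, per the text/event-stream format.
import json

def sse_event(event: str, data: dict) -> str:
    payload = json.dumps(data)
    return f"event: {event}\ndata: {payload}\n\n"

def progress_stream(steps):
    """Yield one SSE frame per completed workflow step, then a terminator."""
    for i, step in enumerate(steps, start=1):
        yield sse_event("progress", {"step": i, "status": step})
    yield sse_event("done", {"total": len(steps)})
```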
User asks: "Analyse these 4 PDFs, compare insights, and tell me the risks."
| Scenario | Recovery |
|---|---|
| Scanned PDF (no text layer) | Trigger OCR fallback |
| Password-protected | Flag, skip, notify user |
| Corrupted / unreachable | Retry × 3 with backoff, then skip |
Rule: One bad document should never abort the entire workflow. Continue with remaining documents.
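The per-document policy from the table above can be sketched as follows (an illustration; `fetch` stands in for whatever ingestion call your pipeline uses):

```python
# Per-document recovery: retry transient failures with exponential
# backoff, then skip that document only. One bad PDF never aborts
# the whole workflow; the caller gets results plus a skip list.
import time

def process_documents(urls, fetch, retries=3, base_delay=1.0):
    results, skipped = [], []
    for url in urls:
        for attempt in range(retries):
            try:
                results.append(fetch(url))
                break
            except Exception:
                if attempt == retries - 1:
                    skipped.append(url)  # give up on this doc only
                else:
                    time.sleep(base_delay * 2 ** attempt)  # backoff
    return results, skipped
```

Returning the skip list alongside the results is what makes "partial success with honest metadata" possible downstream.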
| Step | Failure Mode | Recovery |
|---|---|---|
| Request Ingestion | Bad schema / unreachable URL | 422 before LLM call |
| Interaction Setup | SSE drops | Resumption via job ID |
| Intent Parsing | Hallucinated / incomplete plan | Confidence gate → clarify |
| Document Ingestion | Scanned / corrupt / protected | Per-doc fallback; partial proceed |
| Extraction | OCR noise / garbled text | Quality filter; tag low confidence |
| Embedding | Rate limit / model drift | Backoff retries; version-stamp |
| Parallel Processing | Partial LLM timeout | Min success threshold |
| Reasoning | Context overflow / loops | Map-reduce; step budget cap |
| Aggregation | Component failure | Partial result with metadata |
| Cancellation | Mid-workflow abort | Propagate signal; persist partial state |
| Delivery | Malformed output | Pre-delivery schema check |
| Persistence | Stale context / duplicate run | TTL policy; idempotency check |
Design principle: Partial success with honest metadata beats a hard failure every time. Build for the broken path — the happy path takes care of itself.
With LLMs in the picture, APIs are no longer just interfaces—they're becoming part of systems that can interpret intent, reason through tasks, and coordinate execution dynamically.
At its core, this article highlights a shift in how we design backends:
"REST isn't evolving. The runtime behind your endpoint is being replaced."
Most will feel this shift not as a clean architectural migration, but as accumulated pressure: timeouts that don't make sense, costs that don't map to load, failures that don't reproduce.
The harder question is: does your current backend infrastructure support what you're asking it to do? Not the endpoint. Not the framework. The runtime — the orchestration, the memory, the failure recovery, the cost model.
If the answer is uncertain, that uncertainty is the signal. Start there.
2026-04-24 16:09:00
Today I started setting up Docker on my hosting server as part of my project. My goal is to run PostgreSQL and backend services in containers and manage everything cleanly.
This post is a simple log of what I did today — step by step
First, I connected to my hosting server using SSH:
```shell
ssh root@your_server_ip
```
After login, I confirmed I’m inside the server.
Then I installed Docker using basic commands:
```shell
apt update
apt install docker.io -y
```
After installation, I started Docker:
```shell
systemctl start docker
systemctl enable docker
```
To check if Docker is working:
```shell
docker --version
```
I ran a simple test container:
```shell
docker run hello-world
```
This confirmed Docker is installed and running correctly.
Next, I started my database container:
```shell
docker run -d \
  --name selfmade-postgres \
  -e POSTGRES_PASSWORD=1234 \
  -p 5432:5432 \
  postgres
```
Now PostgreSQL is running inside Docker.
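As a possible next step, the same container can be described declaratively. This is a hypothetical `docker-compose.yml` equivalent of the `docker run` command above (service and volume names are mine; the password is the demo value from this post and should be replaced):

```yaml
# Hypothetical compose file matching the docker run command above.
services:
  selfmade-postgres:
    image: postgres:16            # pin a major version instead of latest
    environment:
      POSTGRES_PASSWORD: "1234"   # demo value only; use a strong secret
    ports:
      - "5432:5432"
    volumes:
      - pgdata:/var/lib/postgresql/data  # data survives container recreation
volumes:
  pgdata:
```

The named volume matters: without one, dropping the container also drops the database.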
To allow external connection:
```shell
ufw allow 5432
```
From my local system, I connected using:

Host: your_server_ip
Port: 5432
User: postgres
Password: 1234

Connection was successful.
Today I only completed the base setup. Next, I plan to:
`.env`
But step by step, everything started working.
Today was a strong start for my project infrastructure. Setting up Docker on my hosting server gave me more confidence to move forward with deployment.
More updates coming soon as I build selfmade.lab
2026-04-24 15:54:20
Most businesses today don’t struggle with lack of tools—they struggle with coordination. One system handles customer data, another manages operations, and yet another processes analytics. The real bottleneck isn’t capability—it’s orchestration.
That’s where multi-agent AI systems come in.
Instead of relying on a single AI model to handle everything, multi-agent systems use multiple specialized AI agents that collaborate, communicate, and divide tasks—much like a high-performing team. Platforms like Rohirrim are exploring how this model can transform fragmented workflows into intelligent, autonomous systems that actually get work done.
A multi-agent AI system is a network of independent AI agents, each designed for a specific task, that work together toward a shared goal.
Think of it like a digital organization:
Single-agent systems are powerful but limited. They struggle when:
Workflows involve dependencies between tasks
For example, automating a sales pipeline isn’t just one task. It includes:
Lead identification
Data enrichment
Qualification
Outreach
Follow-ups
CRM updates
A single AI agent can’t efficiently manage all of this without becoming slow, error-prone, or rigid.
Multi-agent systems solve this by distributing the workload.
Let’s break down how these systems actually operate in real-world scenarios.
The system first breaks a complex workflow into smaller, manageable tasks.
For example:
“Automate customer onboarding” becomes:
Each agent is assigned a clear role. For example:
Agents don’t work in isolation—they constantly exchange information.
Advanced systems include feedback loops:
Multi-agent AI isn’t theoretical—it’s already being applied across industries.
Instead of a single chatbot:
A multi-agent setup can:
Agents can collaborate to:
**
In development environments:
The shift toward multi-agent systems is backed by real trends:
You can add more agents as workflows grow—no need to redesign the entire system.
Agents can be updated, replaced, or improved independently.
Parallel processing allows multiple tasks to run simultaneously.
If one agent fails, others can continue functioning—reducing system-wide failure risks.
## Challenges You Should Know
Multi-agent systems aren’t magic—they come with complexity.
Managing communication between agents can become complicated.
If one agent makes a mistake, it can affect downstream tasks.
Designing efficient agent roles and workflows requires planning.
This is why structured frameworks and platforms are becoming essential.
If you’re new to AI automation, jumping directly into multi-agent systems might be overwhelming.
A smarter approach:
Start with single-agent workflows, then scale.
If you haven’t already, check out How to Build AI Agents That Automate Business Workflows — it lays the foundation for understanding how individual agents work before combining them into more advanced systems.
Multi-agent AI systems are a stepping stone toward fully autonomous operations.
In the near future, businesses won’t just use AI tools—they’ll deploy AI teams that:
Multi-agent AI systems represent a shift from isolated automation to collaborative intelligence. Instead of one AI trying to do everything, multiple agents work together—each focused, efficient, and coordinated.
That’s how complex workflows become manageable.
That’s how automation becomes scalable.
And that’s how businesses move from doing work to orchestrating outcomes.
If single-agent AI was the first step, multi-agent systems are where things start getting truly transformative.