2026-03-03 07:13:19
In Part 1, I introduced the Viable System Model (VSM) and how it maps to multi-agent AI systems. The response was great — but the most common question was: "OK, the theory makes sense. But how is this actually different from what CrewAI/LangGraph/AutoGen already do?"
Fair question. Let me answer it properly.
Every multi-agent framework gives you System 1 — Operations. The agents that do actual work. Define a role, give it tools, let it run. CrewAI calls them "agents." LangGraph calls them "nodes." AutoGen calls them "agents" too. This part works.
The problem is that operations is only one of six necessary control functions. The other five — coordination, optimization, audit, intelligence, and identity — are either missing entirely or left as an exercise for the developer.
Here's what that looks like:
| | S1 Ops | S2 Coord | S3 Optim | S3* Audit | S4 Intel | S5 Ident |
|---|---|---|---|---|---|---|
| CrewAI | ✅ | ❌ | ⚠️ | ❌ | ❌ | ❌ |
| LangGraph | ✅ | ❌ | ⚠️ | ❌ | ❌ | ❌ |
| OpenAI Agents | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ |
| AutoGen | ⚠️ | ⚠️ | ❌ | ❌ | ❌ | ❌ |
| ViableOS | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
This isn't a knock on those frameworks. They're excellent infrastructure — they give you the building blocks to run agents. But infrastructure isn't organization. It's like having Kubernetes without knowing what services to deploy, how they should communicate, and who watches them.
Stafford Beer's Viable System Model isn't a framework you bolt onto agents. It's a structural theory about what ANY viable system needs to survive — whether it's a cell, a company, or a swarm of AI agents. He published it in 1972. It's been validated on governments, corporations, and cooperatives. And it maps 1:1 to multi-agent AI.
The key insight: viability requires specific communication channels, not just capable components.
In a flat multi-agent system with 5 agents, you potentially need 20 direct communication channels. Every agent might talk to every other agent. That's n×(n-1) complexity. It doesn't scale. More importantly, it doesn't differentiate — a resource conflict looks the same as a strategic concern looks the same as an emergency.
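The n×(n-1) growth is easy to see in a few lines (a trivial illustration, not part of any framework):

```python
# Directed channels in a flat topology: every agent may talk to every
# other agent, so channels grow quadratically with the agent count.
def flat_channels(n: int) -> int:
    """Number of directed peer-to-peer channels among n agents."""
    return n * (n - 1)

for n in (3, 5, 10, 20):
    print(n, flat_channels(n))
# 5 agents -> 20 channels; 20 agents -> 380 channels
```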
The VSM replaces this with structured channels, each with a specific purpose:
```
            S5 (Identity/Policy)
                 ↕ balance channel
S4 (Intelligence)        S3 (Optimization)
  ↕ strategy bridge        ↕ command channel
            S2 (Coordination)
                 ↕ coordination rules
      S1a ←→ S1b ←→ S1c (Operations)
                 ↕
S3* (Audit — independent, different provider)
```
S2 coordination rules prevent conflicts between S1 units. Not by managing them, but by establishing traffic rules. "If you deploy, notify ops. If you claim a feature, verify with dev first."
S3 command channel gives optimization authority over operations. "Shift 20% of your token budget to the high-priority task." This is top-down resource allocation with teeth.
S3* audit bypass goes directly from the auditor into S1 operations — read-only, independent, different LLM provider. "I checked the last 5 commits. Tests didn't actually pass." (More on why "different provider" matters below.)
S4→S3 strategy bridge injects external intelligence into operational planning. "Competitor just launched feature X. Here's a briefing."
S5 balance channel ensures the system doesn't drift too far toward internal optimization (S3) or external scanning (S4). Too much S3 = navel-gazing. Too much S4 = strategy tourism.
Algedonic channel — the emergency bypass. Any agent can signal existential issues directly to S5 and the human, skipping the entire hierarchy. Named after the Greek words for pain (algos) and pleasure (hedone). This is your system's fire alarm.
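A minimal sketch of what "differentiated channels" means in code. The names and routing table here are illustrative, not the ViableOS API — the point is that each message carries a channel type, so an emergency is structurally distinguishable from a status update:

```python
# Hypothetical sketch: typed channels instead of undifferentiated messages.
from dataclasses import dataclass
from enum import Enum, auto

class Channel(Enum):
    COORDINATION = auto()   # S2: traffic rules between S1 units
    COMMAND = auto()        # S3 -> S1: resource allocation with teeth
    AUDIT = auto()          # S3* -> S1: read-only, independent checks
    STRATEGY = auto()       # S4 -> S3: external intelligence briefings
    ALGEDONIC = auto()      # any agent -> S5 + human: emergency bypass

@dataclass
class Message:
    channel: Channel
    sender: str
    body: str

def route(msg: Message) -> list[str]:
    """Return recipients by channel type, not by sender position."""
    if msg.channel is Channel.ALGEDONIC:
        return ["s5-policy", "human"]       # skip the entire hierarchy
    if msg.channel is Channel.STRATEGY:
        return ["s3-optimization"]          # S4's briefing lands in S3
    if msg.channel in (Channel.COMMAND, Channel.AUDIT):
        return ["s1-operations"]
    return ["s1-operations"]                # coordination broadcast

print(route(Message(Channel.ALGEDONIC, "s1a", "prod data corrupted")))
```

The fire-alarm property falls out of the routing: an algedonic message reaches policy and the human regardless of who sent it.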
These channels aren't nice-to-haves. Each one prevents a specific failure mode:
| Without this channel... | You get... |
|---|---|
| S2 coordination | Agents contradicting each other |
| S3 command | No resource control, token budgets explode |
| S3* audit | Hallucinations go undetected |
| S4→S3 bridge | System optimizes for yesterday's world |
| S5 balance | Either navel-gazing or strategy tourism |
| Algedonic | Critical issues buried in status reports |
That's the difference between "a list of agents with a router" and "a viable system." The agents are the same. The organization makes them work.
Let me zoom in on one problem that every multi-agent system has but almost nobody talks about: context window amnesia.
LLMs don't have persistent memory. Everything lives in the context window — a buffer of recent messages that eventually overflows. When S3 (Optimization) sends a directive to an S1 worker — say, "switch to a cheaper model for routine tasks to stay within budget" — that directive enters the context window. For maybe 20-40 turns, the agent remembers. Then newer messages push it out.
The agent doesn't refuse the directive. It doesn't disagree. It simply forgets it existed.
In a human organization, this is the memo that nobody read. The policy that got announced but never enforced. The quarterly goal that was abandoned by February. Stafford Beer saw this problem 50 years ago and his solution had a name: Vollzug.
Vollzug is German for the confirmed execution of a directive. Not "I heard you" — but "I heard you, I did it, and here's proof." Beer was a British cyberneticist, but he borrowed the German term because English doesn't have a single word for this concept. Three steps, each with a hard timeout:
```yaml
vollzug_protocol:
  enabled: true
  timeout_quittung: 30min   # Must acknowledge within 30 min
  timeout_vollzug: 48h      # Must execute within 48 hours
  on_timeout: escalate      # Auto-escalate if missed
```
Step 1 — Quittung (Acknowledgment). The receiving agent has 30 minutes to confirm receipt. No confirmation → auto-escalate. This catches the case where a directive is sent but never enters the agent's active context.
Step 2 — Vollzug (Execution). The agent has 48 hours to carry out the directive. The timeout scales with team size — a 2-person org gets 12 hours, a 10-person org gets a full week.
Step 3 — Report. Confirm completion with evidence. Not "done" — but "done, and here's what changed."
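The three steps form a small state machine. Here's a sketch of that lifecycle — timeout values come from the config above, but the class and method names are mine, not the ViableOS runtime (which, as noted below, doesn't exist yet):

```python
# Sketch of the Vollzug lifecycle: Quittung -> Vollzug -> Report,
# with hard timeouts that trigger escalation instead of silent loss.
from datetime import datetime, timedelta

QUITTUNG_TIMEOUT = timedelta(minutes=30)   # must acknowledge
VOLLZUG_TIMEOUT = timedelta(hours=48)      # must execute

class Directive:
    def __init__(self, text: str, sent_at: datetime):
        self.text = text
        self.sent_at = sent_at
        self.acknowledged_at = None          # step 1: Quittung
        self.completed_at = None             # step 2: Vollzug
        self.evidence = None                 # step 3: Report

    def status(self, now: datetime) -> str:
        if self.completed_at and self.evidence:
            return "reported"                    # done, with proof
        if self.completed_at:
            return "executed-unreported"         # "done" without evidence
        if self.acknowledged_at:
            if now - self.sent_at > VOLLZUG_TIMEOUT:
                return "escalate"                # execution timed out
            return "acknowledged"
        if now - self.sent_at > QUITTUNG_TIMEOUT:
            return "escalate"                    # never entered active context
        return "pending"
```

Note that "executed-unreported" is a distinct state: a bare "done" is not Vollzug until evidence arrives.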
If any step times out, the system escalates automatically. But not everything goes through the same path:
```yaml
escalation_chains:
  operational:
    path: [s2-coordination, s3-optimization, human]
    timeout_per_step: 2h
  quality:
    path: [s3-optimization, human]
    timeout_per_step: 2h
  strategic:
    path: [s4-intelligence, s5-policy, human]
    timeout_per_step: 4h
  algedonic:
    path: [s5-policy, human]
    timeout_per_step: 15min
```
An operational timeout goes through coordination first. A quality issue goes straight to optimization. A strategic concern routes through intelligence and policy. And an existential threat — the algedonic channel — reaches the human in 15 minutes, no matter what.
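The routing logic is simple enough to sketch as a lookup mirroring the YAML (illustrative code, not the actual runtime):

```python
# Escalation chains as data: each issue type has an ordered path and a
# per-step timeout. The human is always the last stop, so an issue can
# time out repeatedly but never be dropped.
ESCALATION_CHAINS = {
    "operational": (["s2-coordination", "s3-optimization", "human"], "2h"),
    "quality":     (["s3-optimization", "human"], "2h"),
    "strategic":   (["s4-intelligence", "s5-policy", "human"], "4h"),
    "algedonic":   (["s5-policy", "human"], "15min"),
}

def next_hop(issue_type: str, failed_steps: int) -> str:
    """Who hears about the issue after `failed_steps` timeouts."""
    path, _timeout = ESCALATION_CHAINS[issue_type]
    return path[min(failed_steps, len(path) - 1)]  # clamp at the human

print(next_hop("algedonic", 0))    # straight to s5-policy
print(next_hop("operational", 5))  # exhausted chain -> human
```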
This is what "from topology to behavior" means. It's not enough to define which agents exist. You need to define how they behave when things go wrong. When context is lost. When directives are ignored. When the whole system is on fire. That's the gap between a diagram and an operating system.
And here's why this matters specifically for LLM-based agents: LLMs are optimized to produce coherent, confident outputs. An agent that hallucinated a completion reports "task completed" in exactly the same confident tone as an agent that actually completed the task. Without Vollzug, without S3* audit, without escalation chains — you have no way to tell the difference.
ViableOS takes all of this and turns it into working software. You describe your organization — or let an AI-powered assessment interview figure it out — and it generates the full VSM package: every agent, every channel, every behavioral spec.
What works today:
What's auto-derived, not hand-configured:
Small team (1-2 people) → shorter timeouts, more human approval, daily reporting. Large team (10+) → more agent autonomy, longer execution windows, weekly reporting. Regulatory external forces → monthly premise checks, elevated mode triggers. You can override everything, but the defaults are designed to be sensible based on 50 years of organizational theory.
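That "derived, not configured" idea can be sketched as a pure function of team size. The thresholds below are illustrative, loosely matching the examples in the text (2 people → 12 hours, 10+ → a full week), not the actual ViableOS derivation logic:

```python
# Hypothetical sketch: defaults derived from organization size instead
# of hand-configured. Override anything, but start sensible.
def vollzug_timeout_hours(team_size: int) -> int:
    """Execution window for a directive, scaled to org size."""
    if team_size <= 2:
        return 12      # small org: tight loop, more human oversight
    if team_size >= 10:
        return 168     # large org: a full week of agent autonomy
    return 48          # the default shown in the config above
```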
What we haven't built yet:
The runtime engine. ViableOS currently generates the configuration for a viable agent organization. It doesn't yet execute it. There's no live enforcement of vollzug timeouts, no real-time escalation routing, no Operations Room. That's v0.3 — and it's where I need help.
Theory is worth nothing without practice. So the first real test of ViableOS will be my own company — a small medical care software firm in Germany.
It's a good test case for three reasons:
The domain is regulated. GDPR, healthcare data laws, documentation requirements. This forces the system to take identity and values seriously — "patient privacy above everything" isn't a nice-to-have, it's legally required. S5 (Identity) earns its keep here. And S3* audit with a different LLM provider isn't theoretical elegance — it's practical necessity when agents touch patient-adjacent workflows.
The stakes are real. When agents handle scheduling, documentation, or billing, hallucinations aren't just annoying — they're potentially harmful. The Vollzug Protocol isn't academic neatness. It's "did you actually update that patient record, or did you just tell me you did?"
It's small enough to be honest about. Solo founder, small team. If ViableOS generates reasonable defaults for an organization this size, and if those defaults actually change agent behavior in practice, that's validation. If they don't — that's equally valuable information. I'll document the entire process publicly.
But one test case doesn't validate a theory. If you're running multi-agent systems — on CrewAI, LangGraph, AutoGen, or your own framework — I'd genuinely love for you to try ViableOS on your setup and tell us how it goes:
```bash
pip install -e ".[dev]"
viableos api
# Open http://localhost:5173
```
GitHub: github.com/philipp-lm/ViableOS
Open an issue, start a discussion, or drop a comment here. Every behavioral spec in ViableOS started from theory — now we need practice to validate it. The more diverse the test cases, the better the system gets.
This is part 2 of a series on applying organizational design to AI agent systems. Part 1: Your AI Agents Need an Org Chart. Next: building the runtime that actually enforces these specs — the Operations Room.
Philipp Enderle — Engineer (KIT, TU Munich, UC Berkeley). 9 years strategy consulting at Deloitte and Berylls by AlixPartners, designing org transformations for DAX automotive companies. Now applying the same organizational theory to AI agent teams.
2026-03-03 07:00:00
Finding influencers is easy. Getting their contact information and actually reaching out to them is the hard part.
Most marketers spend hours manually scrolling through Instagram and TikTok bios, looking for an email address, copying it into a spreadsheet, and sending generic templates.
As developers, we can automate this entire workflow.
In this tutorial, I'll show you how to build a Node.js pipeline that:
```bash
npm init -y
npm install axios dotenv json2csv
```
Create your .env file:
```
SOCIAVAULT_API_KEY=your_api_key_here
```
We'll use SociaVault to search for recent posts under a specific hashtag (e.g., #skincareroutine), and then extract the profile information of the creators.
```javascript
require('dotenv').config();
const axios = require('axios');
const { Parser } = require('json2csv');
const fs = require('fs');

const API_KEY = process.env.SOCIAVAULT_API_KEY;
const BASE_URL = 'https://api.sociavault.com/v1';

async function getCreatorsByHashtag(hashtag, limit = 50) {
  console.log(`Fetching posts for #${hashtag}...`);
  try {
    const response = await axios.get(`${BASE_URL}/instagram/hashtag/posts`, {
      headers: { 'Authorization': `Bearer ${API_KEY}` },
      params: { hashtag, limit }
    });

    // Extract unique creators from the posts
    const creators = new Map();
    response.data.data.forEach(post => {
      const owner = post.owner;
      if (!creators.has(owner.username)) {
        creators.set(owner.username, owner);
      }
    });

    return Array.from(creators.values());
  } catch (error) {
    console.error('Error fetching creators:', error.message);
    return [];
  }
}
```
Many creators put their email in their bio. We need a robust regex to find it. We also want to filter out emails that belong to talent agencies (e.g., management@..., talent@...).
```javascript
function extractEmail(text) {
  if (!text) return null;

  // Standard email regex
  const emailRegex = /([a-zA-Z0-9._-]+@[a-zA-Z0-9._-]+\.[a-zA-Z0-9_-]+)/gi;
  const matches = text.match(emailRegex);
  if (!matches) return null;

  const email = matches[0].toLowerCase();

  // Filter out common agency/management emails
  const agencyKeywords = ['management', 'agency', 'talent', 'collab', 'pr@'];
  const isAgency = agencyKeywords.some(keyword => email.includes(keyword));
  if (isAgency) {
    return null; // Skip agencies, we want direct contact
  }

  return email;
}
```
The hashtag search gives us basic profile info. To get the full bio (where the email lives), we need to fetch the full profile details.
```javascript
async function enrichCreatorProfile(username) {
  try {
    const response = await axios.get(`${BASE_URL}/instagram/profile`, {
      headers: { 'Authorization': `Bearer ${API_KEY}` },
      params: { username }
    });

    const profile = response.data.data;
    const email = extractEmail(profile.biography) || profile.public_email;

    return {
      username: profile.username,
      full_name: profile.full_name,
      followers: profile.followers_count,
      engagement_rate: profile.engagement_rate,
      email: email,
      // Clean up newlines; guard against profiles with no bio at all
      bio: (profile.biography || '').replace(/\n/g, ' ')
    };
  } catch (error) {
    return null;
  }
}
```
Now we orchestrate the pipeline: Fetch posts -> Get unique creators -> Enrich profiles -> Extract emails -> Save to CSV.
```javascript
async function runPipeline(hashtag) {
  const creators = await getCreatorsByHashtag(hashtag, 50);
  console.log(`Found ${creators.length} unique creators. Enriching profiles...`);

  const leads = [];
  for (const creator of creators) {
    // Add a small delay to be polite to the API
    await new Promise(resolve => setTimeout(resolve, 500));

    const enriched = await enrichCreatorProfile(creator.username);

    // Only keep creators who have an email AND have over 10k followers
    if (enriched && enriched.email && enriched.followers > 10000) {
      console.log(`✅ Found lead: ${enriched.username} (${enriched.email})`);
      leads.push(enriched);
    }
  }

  console.log(`\nPipeline complete! Found ${leads.length} qualified leads with emails.`);

  if (leads.length > 0) {
    // Export to CSV
    const parser = new Parser();
    const csv = parser.parse(leads);
    fs.writeFileSync(`${hashtag}_leads.csv`, csv);
    console.log(`Saved to ${hashtag}_leads.csv`);
  }
}

// Run it!
runPipeline('skincareroutine');
```
When you run this script, it will output a clean CSV file containing:
You can upload this CSV directly into your cold email software. Because you have their full_name and username, you can write highly personalized email templates:
"Hey {{full_name}}, loved your recent post on your {{username}} account about #skincareroutine..."
If you try to build this using Puppeteer or Playwright, Instagram will block your IP address within 10 minutes.
By using SociaVault, you bypass login walls, CAPTCHAs, and IP bans. You just make a REST API call and get clean JSON data back.
Get your free API key at SociaVault.com and automate your influencer outreach today.
2026-03-03 06:41:51
I turned 40-something today. I'm not going to tell you the exact number because it doesn't matter and also because I'm a little bit in denial. What I will tell you is that I've been carrying a notebook — sometimes physical, sometimes digital, always somewhere — full of ideas for the better part of 20 years.
App concepts. Business plans. Side projects. Systems I wanted to build. Problems I knew how to solve but never had the runway to sit down and solve them.
The ideas were never the problem.
If you've worked in tech long enough, you know the gap. It's the space between knowing what to build and actually building it. Not because you're lazy — because the activation energy is enormous.
You want to build an iOS app? Great. Learn Swift. Learn SwiftUI. Learn Xcode's opinions about how your project should be structured. Set up certificates. Figure out StoreKit. That's before you write a single line of business logic.
You want to self-host something? Cool. Spin up a container. Configure a reverse proxy. Set up DNS. Wire up authentication. Debug the one nginx directive that's silently eating your auth headers. That's before anyone can actually use the thing.
Every idea came with a tax — hours of scaffolding, configuration, and yak-shaving before you got to the part that mattered. And when you're working a full-time job, raising a family, and trying to maintain some semblance of a life outside of a terminal, that tax is a dealbreaker.
So the notebook grew. And the projects didn't.
Three things converged in the last year, and I don't think any one of them works without the others.
I've always had a home machine. But the M4 Pro Mac Mini hit a sweet spot I hadn't seen before: enough power to run Docker stacks, enough storage to host media, enough headroom to experiment with local AI models — and it runs 24/7 at barely a whisper on the power bill.
It's not a server rack. It's a $1,600 box on a shelf that runs my entire infrastructure. Plex, Radarr, Sonarr, a reverse proxy, SSO, a dashboard, a self-assessment platform, a portfolio site, and a kids' reading app — all on one machine, all containerized, all accessible from anywhere through a Cloudflare Tunnel.
The barrier to deploying something went from "figure out hosting" to "write a compose file and add a proxy host." That matters more than it sounds.
I'd tried every note-taking system. Notion. OneNote. Apple Notes. Google Docs. Plain text files in a folder called "notes" that I'd forget about in six months.
Obsidian stuck because it works the way my brain works: everything is a file, everything links to everything else, and the structure emerges from the connections rather than being imposed upfront. My vault isn't a notebook — it's a knowledge graph. Projects link to research. Research links to ideas. Ideas link to technical specs. Specs link to build logs.
When I sit down to work on something, the context is already there. I'm not starting from scratch — I'm continuing a thread.
This is the one that closed the gap.
I'm not talking about asking ChatGPT to write a README. I'm talking about AI as a development partner — something that can hold context across an entire project, scaffold code in languages I'm still learning, debug infrastructure issues by reading logs and configs, and maintain documentation as we go.
The workflow looks like this: I describe what I want to build. We plan it together — phased, with verification steps, with a tech spec that lives in the vault. Then we build it, phase by phase. When a phase is done, we test it, document it, and checkpoint it to Git.
I built a complete iOS app — from concept to TestFlight — in a few weeks. Not because AI wrote it for me, but because AI handled the parts that used to stop me cold: the Swift syntax I didn't know yet, the StoreKit integration I'd never done, the Xcode build errors that would have cost me a weekend of Stack Overflow.
The ideas in my notebook finally had a way to become real things. Not prototypes. Not half-finished repos. Deployed, running, usable things.
In the last month alone:
None of these are finished. All of them are real. That's the difference.
If you're curious about the actual tooling:
| Layer | What | Why |
|---|---|---|
| Hardware | Mac Mini M4 Pro, 24GB, 1TB NVMe | Silent, powerful, always on |
| Notes | Obsidian vault (iCloud-synced) | Knowledge graph, not a notebook |
| Containers | Docker Desktop + Portainer | Everything runs in containers |
| Routing | Cloudflare Tunnel + Nginx Proxy Manager | Zero open ports, public access |
| Auth | Authentik | Self-hosted SSO, Google OAuth |
| AI | Claude Code + Gemini | Planning, building, documenting |
| Version control | Git + `/checkpoint` | Every session ends with a commit |
| Sites | Astro (static) | Fast, content-driven, simple |
The key insight isn't any single tool. It's that the entire pipeline — from idea to deployed product — now fits on one machine and moves fast enough to keep up with the rate I generate ideas.
Twenty years of carrying ideas around taught me something I didn't expect: the ideas don't expire. The iOS app I'm building now started as a scribble in a notebook years ago. The self-assessment platform came from a conversation about certification prep that I'd been thinking about for months.
What expires is motivation. And motivation dies when the distance between "I have an idea" and "I have a working thing" is measured in months instead of days.
The gap was never talent or imagination or time management. The gap was tooling. The cost of turning a thought into a running application was simply too high for someone with a day job and a life.
That cost just dropped by an order of magnitude.
The notebook isn't empty yet. There are still ideas in there — some good, some terrible, some I won't know until I build them. But the backlog is moving now, and the system that moves it is documented, repeatable, and improving.
I'm not writing this to sell you on AI or home labs or Obsidian. I'm writing it because if you're someone who's been carrying ideas around for years — if you've got a graveyard of half-started repos and abandoned side projects — the execution gap is smaller than it's ever been.
The tools exist. The hardware is affordable. The AI is good enough to be a real partner, not just a fancy autocomplete.
The only thing left is to start.
Originally published at charlieseay.com
2026-03-03 06:35:34
The original article nailed the diagnosis. Here's the architecture I use to prevent the quiet disappearance it describes.
The [original article](https://dev.to/the_nortern_dev/the-hardest-part-of-being-a-developer-isnt-coding-its-disappearing-quietly-52l) makes a point most developers recognize instantly: you don't burn out from code, you burn out from erasure. From becoming the quiet, reliable node that everyone depends on but nobody actually sees. From being the person who answers the late-night Slack messages, unblocks the pipeline, patches the brittle system—and then disappears again until the next emergency.
Developers don't quit because the work is hard. Developers quit because the work slowly dissolves the parts of them that aren't work.
That's why I insist on keeping my competitive ballroom weekends intact—Saturdays to dance, Sundays to teach. Not because it's a "hobby," but because it's the only system I've found that reliably prevents the quiet disappearance the article describes.
Most engineering environments reward the same traits that make people vanish—high reliability, low emotional footprint, asynchronous communication, deep focus, quiet competence. These traits make you effective, but they also make you invisible. You become the person who "just gets things done," which is corporate shorthand for "we don't have to think about you."
Over time, that invisibility becomes internal. You stop thinking about you, too.
I'm naturally reserved and introspective, so the social silence of remote work doesn't bother me the way it bothers some people. But the disciplined side of me—the part that thinks five to twenty years down the line—insists on maintaining a channel that forces presence, even if it's strict and structured rather than social in the conventional sense. The goal isn't to become an extrovert. The goal is to maintain a practice that keeps me from dissolving into my own usefulness.
Competitive ballroom is the one place where the developer failure mode simply cannot operate.
In a partnered dance, your partner feels everything—your hesitation, your confidence, your frame, your breath. There is no background mode. There is no "quietly competent" role where you deliver results without being perceived. The floor demands eye contact, presence, timing, projection, and shared risk—all at once, all in real time. You are seen whether you want to be or not. And that forced visibility is precisely the point.
Software work is disembodied. You live in your head for days at a stretch—reading code, reviewing abstractions, communicating through text that flattens every nuance into a thread. Ballroom is the structural inverse—physical, rhythmic, expressive, immediate. It forces you back into your body after a week of living behind a screen.
And there's a distinction that matters more than people realize: in engineering, excellence often means being mined for answers. Someone breaks production, someone pings you at 10 PM, someone needs you to "quickly review this PR." Your expertise becomes something others extract from you. In ballroom, excellence is something you inhabit. You show up, you dance, you sweat, you improve. No one is extracting anything from you. The excellence stays yours.
Teaching competitive students isn't "extra work." It's structural protection.
In tech, leadership often means being the person who absorbs ambiguity and shields everyone else—draining work disguised as authority. In ballroom, leadership means shaping technique, shaping confidence, shaping discipline. It's generative rather than extractive. You build capacity in someone else without losing your own.
Teaching also gives you something engineering rarely does—immediate, visible impact. Engineering impact is often invisible or delayed by quarters, buried in metrics dashboards no one reads. Teaching gives you instant feedback and tangible improvement. You watch someone execute a technique they couldn't do last week. That matters in a way that closing a Jira ticket never will.
Most importantly, teaching anchors your identity outside of work. Developers who disappear quietly usually have one thing in common—their entire identity is tied to being useful inside a system that doesn't see them. Teaching creates a second domain of competence, one that isn't tied to sprint velocity or on-call rotations or the quiet dread of another Sunday night deployment.
The hardest part of being a developer isn't coding. It's staying human in a system that rewards you for becoming a ghost.
Ballroom is my anti-ghost architecture. It's the place where I am visible—not because I demand attention, but because the discipline makes hiding physically impossible. It's the place where I am embodied rather than abstracted, where I am not extractable, where I am not "the reliable one," where I am not disappearing quietly.
Every developer needs something like that—a domain that structurally prevents the fade. It doesn't have to be dance. It has to be something where you cannot operate on autopilot, where your presence is required in a way that code reviews and async threads will never require it, where excellence is something you carry in your body rather than something that gets pulled out of you in a meeting.
Mine just happens to involve rhinestones, frame, and a competitive floor.
This is a response to @the_nortern_dev's piece on disappearing quietly. If you've found your own anti-ghost architecture, I'd like to hear what it is.
2026-03-03 06:30:50
I've been running multiple AI coding agents in parallel — five, six, sometimes eight workspaces at once, each tackling a different feature or fix on the same codebase. It's productive in bursts. You feel like you've hired a small team. Then you stop and look at what you've actually produced, and things get weird.
One agent added dynamic model discovery. Another agent, solving a different problem in a different workspace, also added dynamic model discovery — a slightly different version with a different class name. A third agent needed model listing as part of its feature, saw neither of the other two, and inlined its own implementation. I now had three versions of the same concept across three branches, none of which knew about the others.
This is what I'm calling agentic drift: the gradual, invisible divergence that happens when parallel autonomous agents work on related parts of a codebase without coordination. It's not a merge conflict in the git sense — your files might merge cleanly. It's a semantic conflict. The code compiles, the tests pass, but you've built the same thing three times and each version encodes slightly different assumptions about how it should work.
The workflow that creates this is seductive because the beginning feels so good. You identify six things that need doing. You spin up six agents. Each gets a workspace — a clean branch, a focused task, full autonomy. You check in an hour later and each one has made real progress. Pull requests start appearing. You feel like a CTO.
The problem starts when the tasks aren't truly independent. And they almost never are. Software is a graph, not a list. Feature A needs a utility. Feature B needs a similar utility. Feature C refactors the module where that utility should live. None of these agents talk to each other. They each make locally reasonable decisions that are globally incoherent.
What you get looks like this:
The longer you wait to integrate, the worse it gets. Each workspace drifts further from the others. The merge at the end isn't additive — it's archaeological. You're reconstructing intent from divergent timelines.
I just went through this on Glue, a terminal-based coding agent I've been building. After a stretch of parallel work using Conductor (which makes spinning up parallel agents dangerously easy), I had:
Figuring out what to merge, in what order, and how to reconcile the contradictions took longer than building any individual feature. This is the integration tax. It's the cost you pay for the parallelism, and it's nonlinear — two parallel agents are maybe 1.5x the integration work; eight are closer to 5x.
The nasty part is that each individual PR looks fine. It has tests. It has a clear description. The code is clean. It's only when you lay them all out and trace the shared surfaces that you see the mess. Feature B assumes feature A was never built. Feature D removes something feature E extends. The model registry was refactored by one agent and kept intact by three others.
Separately from the drift problem, I've been experimenting with a prompting technique for code improvement that I think might help with the integration step. The technique is simple:
> Look at this code. Now imagine it was actually excellent — well-structured, handles edge cases elegantly, has clean data flow, clear abstractions. Describe that imaginary version in detail. Then compare it to what we actually have.
I'm calling this idealized diffing. Instead of asking "what's wrong with this code" (which tends to produce surface-level nitpicks) or "refactor this" (which tends to produce incremental changes), you ask the model to construct a complete mental image of the ideal version first, then use the gap between ideal and actual as a structured improvement plan.
The hypothesis: when you give the model a concrete codebase as reference, the "imagined better version" stays grounded. It can see the actual constraints — this is a TUI that needs to handle pasting, that's a session store with backward compatibility requirements. The idealized version respects those constraints while improving the architecture. Without a codebase as reference, the model hallucinates details or produces something generic.
Early results are promising. When I apply this to a module after merging conflicting branches, it tends to surface the right questions: "these two implementations serve the same purpose but encode different assumptions about X — here's how they should be unified." It's essentially using imagination as a form of code review, but one that produces a target state rather than a list of complaints.
The technique works as pre-work for refactoring. You don't execute the idealized version directly — it's a north star that helps you figure out what the merged code should look like before you start editing. Think of it as the architectural equivalent of writing tests before code: you define the desired shape before you start cutting.
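Mechanically, idealized diffing is just two model calls in sequence. A minimal sketch — the prompt wording paraphrases the technique above, and `ask_model` is a stand-in for whatever LLM client you use, not a real API:

```python
# Sketch of idealized diffing as a two-turn prompt sequence:
# first construct the ideal version, then diff ideal against actual.
IDEALIZE = (
    "Look at this code. Now imagine it was actually excellent: "
    "well-structured, handles edge cases elegantly, clean data flow, "
    "clear abstractions. Describe that imaginary version in detail.\n\n{code}"
)
DIFF = (
    "Compare the idealized version you just described to the actual code. "
    "List the concrete gaps as a prioritized refactoring plan."
)

def idealized_diff(code: str, ask_model) -> str:
    """Run the two-step sequence; returns a target-state plan, not an edit."""
    ideal = ask_model(IDEALIZE.format(code=code))     # turn 1: the ideal
    return ask_model(DIFF, history=[ideal])           # turn 2: the gap
```

The ordering is the whole trick: the model commits to a target state before it starts criticizing, which is what pushes it past surface-level nitpicks.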
I'm not the only one running into this; the problem is emerging wherever people scale up parallel agent work.
One example is MCP Agent Mail, which gives agents identities, inboxes, and file reservation leases — essentially Gmail for coding agents, backed by Git and SQLite. Agents can claim exclusive locks on files before editing and send messages to coordinate. On paper it solves the coordination problem. In practice, it feels like ceremony — another system to set up, another protocol for agents to follow, another thing that can break. I haven't used it extensively enough to say it's not worth it, but my instinct says the overhead of teaching every agent to check its mail before writing code might eat the gains from the coordination it provides. Similar vibes to Beads — thoughtful design, but the setup cost might exceed the problem cost for most workflows.
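The lease idea itself is easy to illustrate in isolation. This is not MCP Agent Mail's actual API — the class and method names are mine — it just shows the claim/expire/release mechanics that make a reservation safe even if an agent crashes mid-edit:

```python
# Minimal sketch of a file-reservation lease (names are assumptions, not
# MCP Agent Mail's real interface). An agent must hold the lease on a path
# before editing it; leases expire so a dead agent can't block forever.
import time

class LeaseTable:
    def __init__(self, ttl_seconds: float = 300.0):
        self.ttl = ttl_seconds
        # path -> (holding agent, expiry timestamp)
        self._leases: dict[str, tuple[str, float]] = {}

    def claim(self, path: str, agent: str) -> bool:
        """Claim an exclusive lease; fails if another agent holds a live one."""
        now = time.monotonic()
        holder = self._leases.get(path)
        if holder and holder[0] != agent and holder[1] > now:
            return False
        self._leases[path] = (agent, now + self.ttl)  # claim or renew
        return True

    def release(self, path: str, agent: str) -> None:
        """Release only if this agent actually holds the lease."""
        if self._leases.get(path, (None, 0.0))[0] == agent:
            del self._leases[path]
```

The TTL is the interesting design choice: it trades strictness for liveness, which matters when agents die without cleaning up.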
The tooling is catching up. But right now, the coordination problem is mostly unsolved — the tools detect conflicts earlier or add coordination protocols, but don't prevent the semantic drift that causes them.
Agentic drift probably can't be eliminated. Parallelism is too useful, and the cost of full coordination between agents would eat the productivity gains. But it can be managed:
Shorter integration cycles. The single biggest lever. Merge early, merge often. Don't let five branches run for a day — integrate every few hours. The integration tax compounds.
Shared context files. Give all agents a living document that describes the current architecture, recent decisions, and in-progress work. Something like an AGENTS.md or CLAUDE.md that every workspace reads. This doesn't prevent drift, but it reduces the blast radius.
Early conflict detection. Tools like Clash can hook into your agent workflow and warn before a write that would conflict with another worktree. This doesn't solve drift, but it catches the mechanical conflicts early enough to redirect.
Trunk-based development with agents. Instead of long-lived feature branches, have agents work in short-lived branches that merge to main quickly. One feature per branch, one branch per hour. This conflicts with the "spin up six agents" workflow but it might be net positive.
Post-merge idealized diffing. After merging a batch of branches, run the idealization prompt on each module that was touched by multiple branches. Let the model identify where the merged code has contradictions or redundancies, then clean up deliberately.
Architectural boundaries. The less shared surface area between tasks, the less drift. If agent A works on the CLI entry point and agent B works on observability, they mostly won't step on each other. If they both touch app.dart — and they will, because god classes are drift magnets — you have a problem.
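The early-detection idea above can be approximated with plain git: list the files each branch has touched relative to main and flag the overlaps. This is a rough heuristic of my own, not how Clash actually works, and the branch names in the usage are placeholders:

```python
# Rough overlap detector for parallel agent branches. touched_files shells
# out to git (requires a checkout); overlaps is pure so it's easy to test.
import subprocess

def touched_files(branch: str, base: str = "main") -> set[str]:
    """Files a branch changed relative to base (three-dot diff vs merge base)."""
    out = subprocess.run(
        ["git", "diff", "--name-only", f"{base}...{branch}"],
        capture_output=True, text=True, check=True,
    )
    return set(out.stdout.split())

def overlaps(files_by_branch: dict[str, set[str]]) -> dict[str, list[str]]:
    """Map each file to the branches touching it, keeping only multi-branch hits."""
    by_file: dict[str, list[str]] = {}
    for branch, files in files_by_branch.items():
        for f in sorted(files):
            by_file.setdefault(f, []).append(branch)
    return {f: bs for f, bs in by_file.items() if len(bs) > 1}
```

Run `overlaps({b: touched_files(b) for b in branches})` before merging; any file listed under two branches is a candidate for the god-class problem described above. It only catches mechanical overlap, not semantic drift, but it tells you where to look first.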
I don't want to be too down on parallel agents. The throughput is real. Features that would take a week of focused solo work can ship in a day. The quality is often surprisingly good — each individual agent does careful, tested work. The problem is purely at the integration layer.
It's the same tradeoff that real engineering teams face, just compressed into hours instead of sprints. Brooks's Law says adding people to a late project makes it later. The agentic version might be: adding agents to a coupled codebase makes the merge harder. The agents are fast, but the merge is still manual, still requires understanding the full picture, and still falls on you.
The answer isn't fewer agents. It's better integration discipline, better shared context, and maybe — if the idealized diffing technique holds up — better tools for reasoning about what the combined output should look like before you start stitching it together.
There's a possibility I keep circling back to: maybe the entire worktree-per-agent model is wrong, and the answer is just... don't isolate them.
If all agents work in the same directory on the same branch, there's no merge step. Agent A writes a utility, agent B sees it immediately, agent C builds on it. No divergence, no phantom dependencies, no archaeological merge at the end. The drift problem disappears because there's only one reality.
I've done this too, and it works — sort of. The agents step on each other less than you'd expect. They can commit their own changes in logical chunks. There's no integration tax because there's nothing to integrate.
But you lose things. For compiled languages, you get half-built broken states while agents are mid-feature. If two agents touch the same screen or module, one of them is working against a moving target. You can't preview agent A's work without also seeing agent B's half-finished changes. And the commit history becomes a mess — interleaved changes from different features, hard to revert cleanly if one feature turns out wrong.
The worktree model gives you clean isolation and clean commits at the cost of drift. The shared model gives you coherence at the cost of messy intermediate states and tangled history. Neither is obviously better. It might depend on the language (interpreted vs compiled), the codebase size, and how much the tasks overlap.
I suspect the real answer is somewhere in between — maybe two or three agents sharing one workspace, with a fourth working in isolation on something truly independent. But I haven't found that sweet spot yet. If you have, I'd like to hear about it.
For now, I'm going back to merging eight branches that all modified the same file.
2026-03-03 06:29:14
Every AI session starts cold.
You open Claude, ChatGPT, or Gemini and immediately start re-explaining the same things:
Every. Single. Session.
So I built recall.
pip install recall
recall remember "I prefer Python over JavaScript"
recall remember "Always use type hints"
recall remember "JWT for auth in synaptiq, not sessions"
recall inject # → clipboard, paste into any AI chat
recall inject --target claude # → ~/.recall/injected.md for Claude Code
Store: recall remember "text" appends to ~/.recall/memories.jsonl. Plain JSON lines. Human-readable. No database.
Rank: recall inject reads your current directory name and recent git commits to understand context. If ANTHROPIC_API_KEY is set, it sends your memories + context to claude-haiku and gets back the 8 most relevant. No key? It injects all of them.
Inject: The output is a clean Markdown block:
## My preferences and decisions
- I prefer Python over JavaScript
- Always use type hints
- JWT for auth in synaptiq, not sessions
- Short commit messages, imperative mood
Clipboard for ChatGPT, Gemini, or any AI chat. Or written to ~/.recall/injected.md for Claude Code.
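For the curious, the store and inject steps are simple enough to sketch. This is an approximation of the behavior described above, not recall's actual source; `DEFAULT_STORE` and the function names are assumptions:

```python
# Approximate sketch of recall's store + inject steps. The JSONL fields
# mirror the format recall uses; everything else here is illustrative.
import json
import time
from pathlib import Path

DEFAULT_STORE = Path.home() / ".recall" / "memories.jsonl"

def remember(text: str, store: Path = DEFAULT_STORE) -> None:
    """Append one memory as a JSON line."""
    store.parent.mkdir(parents=True, exist_ok=True)
    count = sum(1 for _ in store.open()) if store.exists() else 0
    entry = {
        "id": count + 1,
        "text": text,
        "created_at": time.strftime("%Y-%m-%dT%H:%M:%S+00:00", time.gmtime()),
        "tags": [],
    }
    with store.open("a") as f:
        f.write(json.dumps(entry) + "\n")

def inject(store: Path = DEFAULT_STORE) -> str:
    """Render all memories as the Markdown block pasted into chats."""
    bullets = [f"- {json.loads(line)['text']}" for line in store.open()]
    return "\n".join(["## My preferences and decisions", *bullets])
```

Append-only JSON lines means no migrations, no locking drama, and `forget` can be implemented as rewrite-without-one-line.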
Add one line to your global ~/.claude/CLAUDE.md:
See: ~/.recall/injected.md
Run recall inject --target claude once. Now every Claude Code session opens with your preferences already loaded. No paste. No re-explaining.
Without a key, all memories are injected. That's fine for 10-20 memories.
With a key, Claude Haiku reads your current directory and recent git commits, then picks the 8 most relevant memories for right now. Working on a Next.js project? It surfaces the frontend preferences. Debugging auth? It surfaces the JWT decision.
export ANTHROPIC_API_KEY=sk-ant-...
recall inject # → 8 most relevant memories, ranked by Haiku
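The ranking step splits into a pure part (building the prompt, parsing the reply) and the model call itself, which I omit here. This is a sketch of how it might work, not recall's actual implementation; every name below is an assumption:

```python
# Hypothetical shape of the ranking step. The Haiku call is left out; these
# are just the prompt-construction and reply-parsing halves around it.

def rank_prompt(memories: list[str], cwd: str, commits: list[str], k: int = 8) -> str:
    """Pack context + numbered memories into a single ranking request."""
    numbered = "\n".join(f"{i}. {m}" for i, m in enumerate(memories, 1))
    return (
        f"Current directory: {cwd}\n"
        "Recent commits:\n" + "\n".join(commits) + "\n\n"
        f"Memories:\n{numbered}\n\n"
        f"Return the numbers of the {k} most relevant memories, comma-separated."
    )

def parse_ranking(reply: str, memories: list[str]) -> list[str]:
    """Turn a comma-separated reply like '3, 1, 7' back into memory texts."""
    ids = [int(tok) for tok in reply.replace(",", " ").split() if tok.isdigit()]
    return [memories[i - 1] for i in ids if 1 <= i <= len(memories)]
```

Numbering the memories and asking for numbers back keeps the model's output tiny and makes parsing robust: even a chatty reply still yields usable indices.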
~/.recall/memories.jsonl
{"id": 1, "text": "I prefer Python over JavaScript", "created_at": "2026-03-02T10:00:00+00:00", "tags": []}
{"id": 2, "text": "Always use type hints", "created_at": "2026-03-02T10:01:00+00:00", "tags": []}
Plain text. Trivially portable. Back it up with one cp command.
recall remember "text" # store a memory
recall list # show all memories with IDs
recall search "python" # search by keyword
recall forget 3 # delete by ID
recall inject # → clipboard
recall inject --target claude # → ~/.recall/injected.md
OpenAI has memory in ChatGPT. It's cloud-based, opaque, and only works in ChatGPT.
recall is local, transparent, and portable: ~/.recall/ is yours, always.

pip install recall
recall remember "I prefer Python over JavaScript"
recall remember "Always use type hints"
recall inject
Source: github.com/LakshmiSravyaVedantham/recall
PRs welcome — especially for tag-based filtering and integrations with other AI tools.
What preferences do you re-explain to AI every session? Drop them in the comments — I'll add them to my own recall list.