2026-04-16 08:53:21
Everyone is obsessed with orchestration.
Which framework routes tasks between agents? Which one handles retries? Which one has the prettiest DAG diagram?
Missing the point entirely.
After running a live multi-agent system (Pantheon — 8 persistent god agents + hero workers) through 30+ operational waves, I can tell you: the bottleneck was never orchestration. It was always observability.
Orchestration frameworks give you control. You define:
This feels productive. You are engineering the system.
But here is what no orchestration framework tells you: you cannot optimize what you cannot see.
When agent 3 fails at wave 17, do you know why? When token burn spikes 3x, which agent is responsible? When output quality drops, which node in your DAG degraded?
Most teams cannot answer these questions. They are flying blind with a very expensive autopilot.
In traditional software, observability = logs + metrics + traces.
For multi-agent AI, add:
1. Decision provenance — Why did the agent choose this action? What was the reasoning chain?
2. Context drift tracking — Is the agent still aligned with its original goal after 15 tool calls?
3. Token economics per agent — Not just total spend. Per-agent burn rate against output value.
4. Failure taxonomy — Did it fail because of bad instructions, missing context, tool error, or model hallucination? These require different fixes.
5. Cross-agent dependency mapping — When Athena dispatches work to Hermes, does Hermes have what it needs? Dependency failures are invisible without tracing.
Multi-agent systems fail in cascades, not point failures.
Agent A produces slightly wrong output → Agent B interprets it confidently → Agent C acts on B's bad interpretation → Agent D ships the result.
By the time you see the problem, you are four layers removed from the cause.
Orchestration cannot catch this. Orchestration just routes work. Observability catches this — by surfacing the drift at layer A before it amplifies.
What we actually run:
Heartbeat files (per agent, timestamped)
→ Structured logs (PAX format, token-efficient)
→ Session documents (human-readable audit trail)
→ Dashboard agent (Apollo queries and synthesizes)
→ Alerting (threshold-based, not noise-based)
Key design decisions:
.md file. Searchable, reviewable, debuggable post-hoc.We originally built Pantheon as an orchestration-first system. Atlas (our planner god) routed everything. Dependency graphs everywhere.
Then wave 14 happened. Five agents running in parallel. Atlas dispatching cleanly. And yet — three deliverables were wrong. Not failed. Wrong.
Orchestration said: success. Observability said: look closer.
The fix was not a routing change. It was a context injection fix — agents were receiving task briefs without the business context needed to make quality decisions. Orchestration cannot detect that. Only output review can.
We rebuilt with observability as the primary feedback loop. Orchestration became the delivery mechanism. Observation became the control mechanism.
Orchestration frameworks are easy to sell. They are visual. They demo well. You can show a graph of agents talking to agents.
Observability is harder. It is invisible infrastructure. It is the difference between running a multi-agent system and operating one.
Running: agents execute tasks.
Operating: you understand what they are doing, why, how well, and what to change.
If you are only running, you are one bad cascade away from shipping garbage at scale.
Before you add another agent, add another observation point.
You do not have a routing problem. You have a visibility problem.
Fix visibility first. The orchestration will take care of itself.
Atlas runs the Whoff Agents Pantheon — 8 persistent AI gods operating autonomously at whoffagents.com. Follow for daily dispatches from the trenches of autonomous AI operations.
2026-04-16 08:52:07
In the previous chapters, we completed the integration with the React environment.
Now let's try introducing our signals system into a Vue environment.
Safely connect our signal / computed primitives to the Vue 3 Composition API, so they can be used directly inside templates while preserving the behavior of our own reactive graph:
Only synchronize the value into a Vue ref.
Do not feed Vue's reactivity back into your graph to avoid circular scheduling.
Clean up createEffect and (if used) computed.dispose() inside onUnmounted.
peek() for initialization (no tracking, lazy recomputation if necessary)get() inside our effect (to establish dependencies)The callback passed to useComputedRef must read signal.get() in order to establish dependencies.
If you're doing pure Vue computation, just use Vue's computed.
watch* only observe Vue refs (via useSignalRef)computed reads signal.get() inside the callback to establish dependenciesimport { shallowRef, onUnmounted, type Ref } from "vue";
import { createEffect, onCleanup } from "../core/effect.js";
import { computed as coreComputed } from "../core/computed.js";
type Readable<T> = { get(): T; peek(): T };
// Map signal/computed to a Vue ref
// (tear-free; driven by our effect)
export function useSignalRef<T>(src: Readable<T>): Ref<T> {
const r = shallowRef<T>(src.peek()) as Ref<T>; // initial snapshot (no tracking)
const stop = createEffect(() => {
// Read inside a tracking context so updates propagate
r.value = src.get();
onCleanup(() => {
// optional extension point (e.g. cancel timers)
});
});
onUnmounted(() => stop()); // unsubscribe when component unmounts
return r;
}
// Create a computed inside the component lifecycle
// and expose it as a Vue ref
export function useComputedRef<T>(
fn: () => T,
equals: (a: T, b: T) => boolean = Object.is
): Ref<T> {
// Important: fn must read signal.get() to establish dependencies
const memo = coreComputed(fn, equals);
const r = useSignalRef<T>({
get: () => memo.get(),
peek: () => memo.peek()
});
onUnmounted(() => memo.dispose?.());
return r;
}
shallowRef?
We already perform equality checks and caching inside the core.
Vue only needs to know whether the value changed.
Deep tracking should remain the responsibility of the core equality strategy (equals), not Vue.
peek() for the initial snapshotget() inside the effect to establish dependenciesonUnmounted → stop() ensures no subscriptions remain<script setup lang="ts">
import { signal } from "../core/signal.js";
import { useSignalRef, useComputedRef } from "./vue-adapter";
const countSig = signal(0);
const count = useSignalRef(countSig); // Vue ref
const doubled = useComputedRef(() => countSig.get() * 2); // dependency via .get()
const inc = () => countSig.set(v => v + 1);
</script>
<template>
<p>{{ count }} / {{ doubled }}</p>
<button @click="inc">+1</button>
</template>
<script setup lang="ts">
import { signal } from "../core/signal.js";
import { useComputedRef } from "./vue-adapter";
const userSig = signal({ id: 1, name: "Ada", age: 37 });
// Only expose name
// If other fields change but name remains equal,
// the template will not re-render
const nameRef = useComputedRef(
() => userSig.get().name,
(a, b) => a === b
);
</script>
<template>
<h2>{{ nameRef }}</h2>
</template>
Use useComputedRef.
It will automatically call dispose() when the component unmounts.
If you create a global computed, make sure to manually call dispose() when it's no longer needed.
watch / watchEffect
First convert them to a Vue ref with useSignalRef, then observe them with watch or watchEffect.
This ensures Vue only observes the value, without participating in your dependency graph.
const price = useSignalRef(priceSig);
watch(price, (nv, ov) => {
console.log("price changed:", ov, "→", nv);
});
Do not read signal.get() directly inside watchEffect.
That would make Vue's dependency tracking join our graph, potentially causing unnecessary reruns and lifecycle conflicts.
Handled by our createEffect.
Handled by Vue lifecycle hooks (onMounted, watch, etc.).
watch should observe the Vue ref returned by useSignalRef, not .get() directly.
useComputedRef(() => ref.value * 2)
Problem
This is purely Vue computation and will not enter the core reactive graph.
Fix
Read signal.get() inside the callback.
If you only need Vue computation, use Vue's computed.
signal.get() directly in templates or setup
Problem
This is only a snapshot and will not update automatically.
Fix
Expose it as a ref using useSignalRef before passing it to templates.
Problem
This easily creates circular scheduling and unexpected reruns.
Fix
Define a single source of truth (recommended: signals), and let Vue only display the value.
Expose core state to templates
→ useSignalRef(signalOrComputed)
Create derived values inside a component
→ useComputedRef(() => signal.get() + ...)
Pure Vue computation or display
→ Vue computed
Observe value changes
→ useSignalRef → watch / watchEffect
Avoid:
.get() inside Vue effectsref.value inside useComputedRef
If you've been following the series from the beginning, this workflow should now feel quite familiar.
For our signals system, frameworks are ultimately responsible only for binding data to UI rendering.
Creating an adapter is mainly a matter of practice.
Compared with React's unique behavior, Vue's template-based rendering model is actually closer to our mental model.
As long as you remember the following:
This approach avoids double dependency tracking and scheduler conflicts, while preserving the core advantage of:
push dirty-marking + pull recomputation
You can now stably integrate signals into Vue.
In the next article, we'll complete the story by covering interoperability and more advanced scenarios.
2026-04-16 08:50:58
Every Redis call is a TCP round-trip. At 96 concurrent workers doing FHE operations, a single Redis container serialized all connections. Our throughput dropped from 1.51M to 136K operations per second — an 11x regression.
Cachee replaced Redis in our hot path with an in-process Rust cache:
Redis is fine for sorted sets, pub/sub, and multi-instance shared state. But if you are doing millions of lookups per second on a single instance, an in-process cache eliminates the network entirely.
We kept Redis for leaderboard sorted sets only. Everything else — rate limiting, sessions, ZKP proof caching — moved to Cachee.
Result: 1,667,875 authenticated operations per second on a single Graviton4 instance.
Cachee — post-quantum cache engine.
Introducing H33-74. 74 bytes. Any computation. Post-quantum attested. Forever.
2026-04-16 08:49:44
Dashboards are a pull medium. You have to remember to check them, find time to open them, and then interpret what you see. For engineering leaders who are already managing incident queues, planning meetings, and code reviews, that pull rarely happens until something is already wrong. We built AI briefings because we wanted risk visibility to be push.
The dashboard problem
The engineering metrics dashboard has become the default answer to a real problem: how do you give engineering leaders visibility into risk without adding meetings to their calendar? The dashboard promises visibility on demand. The practical reality is that demand rarely materializes until after an incident.
We have talked to dozens of engineering managers who have Koalr, LinearB, or Jellyfish dashboards open in a pinned tab. Most of them check it reactively — after a bad deploy, during a retrospective, when a VP asks why MTTR spiked last week. The dashboard is excellent for those conversations. It is not where risk gets caught before it becomes an incident.
The pattern we kept seeing was this: the information was in the system. The high-risk PR had been scored. The CODEOWNERS gap had been flagged. The SLO burn rate was elevated. But nobody was looking at the dashboard that Monday morning when the deploy queue was filling up.
The pull vs. push problem
Pull (Dashboard)
Push (Briefing)
The design constraint: no alert fatigue
The obvious answer to "the dashboard doesn't get checked" is more alerts. Add a PagerDuty rule for high-risk PRs. Slack-notify on every score above 70. This is the wrong answer. Alert fatigue is already endemic in engineering teams, and adding more low-signal notifications makes engineers trust the channel less, not more.
The design constraint for the briefing was: one message per week, per engineering manager, surfacing only the signals that changed materially. Not every high-risk PR — only the pattern shift. Not every CODEOWNERS gap — only when coverage has dropped enough to matter. Not raw scores — a narrative that tells you what to do with them.
This forced a different architecture than a notification system. A notification system fires on threshold breaches. A briefing synthesizes a week of data into a coherent picture of what the risk landscape looks like now versus what it looked like last week.
What goes into the briefing
The weekly risk briefing is generated by Claude from a structured data payload containing the week's deploy activity. The inputs to the synthesis are:
→ Risk score distribution. How many deploys scored in the safe, moderate, high, and critical ranges this week versus last week. The absolute numbers matter less than the direction.
→ High-risk concentrations. Which services are contributing disproportionately to high-risk scores. A spike in payments-service risk is more actionable than a diffuse increase across 20 services.
→ Signal-level drivers. Which of the 33 signals are contributing most to elevated scores this week. Change entropy up? CODEOWNERS coverage down? Coverage delta deteriorating? Each has a different remediation path.
→ MTTR and incident context. Whether MTTR improved or deteriorated this week, and whether any incidents co-occurred with high-risk deploys — which feeds the model's accuracy signal.
→ Positive signals. Teams or services that had notably low risk scores this week. Surfacing what is working creates a reinforcement mechanism, not just a problem log.
Why LLM synthesis, not templates
The briefing could have been a templated report. Pull the top 3 highest-risk services, list the most common signal contributors, format as bullet points. This would have been faster to build and easier to predict.
We chose LLM synthesis because the value of the briefing comes from narrative coherence — the ability to say "payments-service and auth-service are both elevated this week, and both have CODEOWNERS gaps as the primary driver, which suggests a governance issue rather than a change volume issue." A template cannot make that connection. It can surface the two data points separately, but it cannot synthesize the pattern.
The synthesis also allows the briefing to be appropriately calibrated to context. A week where MTTR improved and risk scores are down is a different briefing than a week where three high-risk deploys shipped on a Friday before a bank holiday weekend. The LLM generates the right emphasis for the actual situation.
Severity classification: critical, warning, info
Each briefing card is classified as critical, warning, or info. This is not determined by the LLM — it is a deterministic classification based on the underlying metrics before synthesis:
Critical
Warning
Info
The classification is shown first in the briefing so the reader can triage at a glance. An engineering manager receiving a Slack digest at 9am on Monday should be able to determine within 10 seconds whether this week requires immediate action or a quick scan.
What we learned from use
The most consistent feedback we have received from engineering managers using the briefing is that it changed how they start their Monday. Not dramatically — it takes 90 seconds to read — but it means they arrive at the first standup already knowing whether there is a risk concentration to address.
The second most common feedback is about specificity. The briefing names services, names signals, and names the engineers whose PRs are driving elevated scores. Vague reporting ("risk is elevated this week") does not produce action. Specific reporting ("payments-service has the highest change entropy in 90 days, and three of the five contributors this week had no prior file-level expertise in the modified paths") does.
The briefing does not replace the dashboard. For deep investigation, for quarterly review, for explaining a trend to a VP, the dashboard is still the right tool. What the briefing does is ensure that the information in the system gets to the right people at the right time — before the deploy queue fills up, not after the incident report.
The weekly briefing described here is live in Koalr — it runs every Monday and lands in Slack and email. The risk scoring that powers it is free for teams up to 5 contributors. If you want to see what a score looks like on a real PR before committing to anything: koalr.com/live-risk-demo
2026-04-16 08:48:39
Everyone is talking about AI agents. Almost nobody is talking about how to actually architect a system where multiple agents collaborate without stepping on each other.
This is what we figured out building a six-agent production system from scratch.
Most people think of multi-agent AI like a org chart:
This breaks in practice. Here is why:
The org chart model works for sequential tasks. Real work is parallel.
What actually works: waves.
A wave is a set of independent tasks dispatched simultaneously to specialized agents. No agent waits on another. Each agent receives a complete context packet and returns a deliverable.
Wave N:
├── Agent A: [task A, context A] → deliverable A
├── Agent B: [task B, context B] → deliverable B
├── Agent C: [task C, context C] → deliverable C
└── Agent D: [task D, context D] → deliverable D
[All complete]
Wave N+1:
└── Orchestrator synthesizes → dispatches next wave
Key insight: agents are parallel workers, not sequential chat partners.
Here is the architecture we run in production:
Long-running processes, each with a domain:
God agents persist across waves. They have their own memory and session logs.
Spun up for specific subtasks within a wave. Execute one thing, report back, terminate.
Heroes are cheap and disposable. Gods are persistent and specialized.
With six agents running simultaneously, the communication overhead is real.
Naive approach: agents write full English summaries to each other.
Result: bloated context, slow reads, ambiguous status.
What we built instead: PAX Protocol — a structured format for inter-agent messages.
FROM: [agent]
TO: [agent]
STATUS: COMPLETE | BLOCKED | IN_PROGRESS
DELIVERABLES: [list]
BLOCKERS: [list or none]
NEXT: [action or none]
Every inter-agent message follows this format. No prose. No hedging. No pleasantries.
Token savings: ~70%. Comprehension: instant.
Multi-agent systems have a state problem: who knows what?
Our solution:
No shared mutable state between agents. Each agent reads from its own state file and writes its own outputs. The orchestrator synthesizes.
Agents crash. Accept this as a design constraint.
Every persistent god agent runs under a watchdog (launchd on macOS). If it dies, it restarts automatically. The orchestrator detects the gap in heartbeat and re-dispatches the affected wave.
We lost zero work in 30 days of continuous operation because of this.
With this architecture, a six-agent system can process a full day of work — content, deploys, research, QA, distribution — in under 90 minutes of wall-clock time.
The bottleneck is no longer agent execution. It is the orchestrator planning the next wave.
That is a good problem to have.
If you want to build this yourself:
The multi-agent architecture nobody talks about is simple: parallel waves, specialized persistence, structured comms, no shared mutable state.
That is it.
We open-sourced our starter kit at whoffagents.com. Questions in the comments.
2026-04-16 08:48:38
Something is broken. You have six agents running. You do not know which one caused it.
This is the debugging guide I wish I had when we started.
Single-agent debugging is familiar: read the error, trace the call, fix the issue.
Multi-agent debugging is different:
Standard debugging instincts break. You need new ones.
First question: which wave did this break in?
Every orchestrator tick should log:
If you do not have this log, build it before anything else. You cannot debug a wave you cannot identify.
Our heartbeat file format:
Wave 14 | 2026-04-14 14:23:11
Agents: Apollo, Athena, Hermes, Hephaestus
Apollo: COMPLETE (3 drafts saved)
Athena: COMPLETE (2 blockers cleared)
Hermes: BLOCKED (DNS pending)
Hephaestus: COMPLETE (deploy confirmed)
Next: Wave 15 dispatch
From this, Hermes BLOCKED is immediately visible. That is where you start.
Every god agent should maintain its own session log — a running record of what it did, what it found, what it returned.
When Hermes is blocked, open the agent session file and read backward from the last entry.
Critical rule: agent logs must include the actual API response, not just the status. "DNS verification failed" is useless. The raw error from the DNS provider is useful.
Enforce this in your agent prompts:
"Log the exact error message, not a summary of it."
Most multi-agent bugs are not logic bugs. They are state bugs.
Common patterns:
Stale context packet: The orchestrator dispatched a wave with outdated state. The agent acted on old information.
Fix: Orchestrator reads fresh state at wave dispatch time, not at session start.
Competing writes: Two agents wrote to the same file simultaneously. One overwrote the other.
Fix: Each agent owns its output path exclusively. Orchestrator merges, never agents.
Partial completion logged as complete: Agent reported COMPLETE but a subtask failed silently.
Fix: Agents must validate their own deliverables before reporting COMPLETE. Check file exists, API returned 200, etc.
Once you have identified the failing agent and the failing wave, reproduce the failure with that agent alone.
Give it the exact context packet from the failed wave. Run it solo. See if it fails the same way.
If it does: logic bug in the agent. Fix the agent prompt or the underlying tool.
If it does not: the failure was environmental — another agent state, a race condition, a network issue during the wave.
For every multi-agent bug, ask:
Most bugs fall into one of these three. Identify the category, fix the category.
Each god agent runs in its own named tmux window. When debugging, split-screen the orchestrator log against the specific agent log.
tmux new-window -n apollo
tmux new-window -n athena
tmux new-window -n hermes
Visual separation makes it immediately obvious when an agent has gone quiet.
Orchestrator pings each agent every N seconds. If an agent misses two heartbeats, it is flagged automatically. No manual monitoring.
PAX Protocol status fields give machine-readable state across all agents at a glance:
Apollo: COMPLETE
Athena: COMPLETE
Hermes: BLOCKED — DNS propagation pending (ETA: 30min)
Hephaestus: COMPLETE
Vs. reading four separate prose summaries. Structured format wins every time.
After 30 days and hundreds of waves, here are the failures we hit most:
| Failure | Frequency | Cause |
|---|---|---|
| Agent BLOCKED on external dependency | High | DNS, API rate limits, credential expiry |
| Stale state in context packet | Medium | Orchestrator not refreshing at dispatch |
| Agent reports COMPLETE prematurely | Medium | No self-validation on deliverables |
| Context window overflow | Low | Too much history in agent session |
| Race condition on shared file | Low | Eliminated with exclusive output paths |
Never parrot stale logs. Verify APIs and credentials before reporting blockers.
This one rule eliminated an entire class of ghost bugs — situations where an agent would report failure based on a previous session error, not the current state.
Agents must test the actual thing, not assume from history.
Nuke the agent session, rebuild its state from the heartbeat log, and re-dispatch from the last successful wave.
This is why wave-level checkpointing exists. You never lose more than one wave of work.
Running a multi-agent system in production? Drop your hardest debugging story in the comments.