2026-04-27 08:39:00
In the previous articles, we demonstrated how to integrate our signal mechanism into two mainstream frameworks: React and Vue.
Starting from this article, we will return to the core signal kernel we designed and examine which parts can be improved further.
The goal is to merge “multi-step updates across await boundaries” into a single effect rerun, while keeping our existing lazy computed behavior and microtask-based scheduling model intact.
In this article, we only extend scheduler.ts and a few minimal integration points, without changing the public API.
Our current system is already stable:
Within the same call stack, multiple set() calls are merged by our scheduler — using Set + queueMicrotask — into the same microtask, so the effect only reruns once.
But this does not work across await.
Each await creates a new microtask. Without a transaction, effects will run once per async boundary.
// ❌ Without transaction: the effect runs twice
async function onClick() {
  a.set(1); // schedules the first flush
  await fetch("/api");
  b.set(2); // schedules the second flush
}
// ✅ With transaction: the effect runs once after the transaction completes
async function onClick() {
  await transaction(async () => {
    a.set(1);
    await fetch("/api");
    b.set(2);
  });
}
scheduler.ts
Reuse the existing batchDepth, so batch() and transaction() can be nested freely.
When batchDepth > 0, scheduleJob() only adds the job to the queue and does not schedule a microtask.
When the outermost transaction exits, we call flushJobs() once.
// scheduler.ts
export interface Schedulable { run(): void; disposed?: boolean }

const queue = new Set<Schedulable>();
let scheduled = false;
let batchDepth = 0;

export function scheduleJob(job: Schedulable) {
  if (job.disposed) return;
  queue.add(job);
  // Only schedule a microtask when we are not inside a batch/transaction
  if (!scheduled && batchDepth === 0) {
    scheduled = true;
    queueMicrotask(flushJobs);
  }
}

// Same as before: merge synchronous updates and flush once at the end
export function batch<T>(fn: () => T): T {
  batchDepth++;
  try {
    return fn();
  } finally {
    batchDepth--;
    if (batchDepth === 0) flushJobs();
  }
}

// Promise-like check
function isPromiseLike<T = unknown>(v: any): v is PromiseLike<T> {
  return v != null && typeof v.then === "function";
}

// New: async transaction support.
// Updates across await boundaries are merged and flushed once
// when the outermost transaction completes.
export function transaction<T>(fn: () => T): T;
export function transaction<T>(fn: () => Promise<T>): Promise<T>;
export function transaction<T>(fn: () => T | Promise<T>): T | Promise<T> {
  batchDepth++;
  try {
    const out = fn();
    if (isPromiseLike<T>(out)) {
      // Async case: wait until fn completes, then exit and flush if needed
      return Promise.resolve(out).finally(() => {
        batchDepth--;
        if (batchDepth === 0) flushJobs();
      });
    }
    // Sync case: exit immediately and flush if needed
    batchDepth--;
    if (batchDepth === 0) flushJobs();
    return out as T;
  } catch (e) {
    // Even when an exception is thrown, we must exit correctly and flush once
    batchDepth--;
    if (batchDepth === 0) flushJobs();
    throw e;
  }
}

export function flushSync() {
  if (!scheduled && queue.size === 0) return;
  flushJobs();
}

function flushJobs() {
  scheduled = false;
  let guard = 0;
  while (queue.size) {
    const list = Array.from(queue);
    queue.clear();
    for (const job of list) job.run();
    if (++guard > 10000) {
      throw new Error("Infinite update loop");
    }
  }
}
This is backward-compatible.
All existing batch() usage remains unchanged. After adding transaction(async), multiple set() calls across await boundaries can also be merged into a single effect rerun.
Existing code does not need to change:
signal.set() still calls effect.schedule().
EffectInstance.schedule() still calls scheduleJob(this).
computed remains lazy: it is only marked as stale and does not enter the scheduler.
computed Is Still Lazy
set() only marks the computed node as stale.
It will not be recomputed early just because it is inside a transaction.
Any set() inside the transaction does not schedule a microtask immediately.
Only when the outermost transaction exits do we call flushJobs() once.
Because batchDepth is shared, nested batch() and transaction() calls work naturally.
Only the outermost exit triggers the flush.
Even if fn throws, the scheduler exits the transaction state correctly and flushes once.
See the catch / finally logic above.
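Here is a minimal sketch of that nesting behavior (assuming the signal, createEffect, and scheduler modules built earlier in this series; import paths are illustrative):
// demo.ts -- nested batch() inside transaction(); batchDepth is shared,
// so only the outermost exit triggers a single flush.
import { signal } from "./core/signal.js";
import { createEffect } from "./core/effect.js";
import { batch, transaction } from "./core/scheduler.js";

const a = signal(0);
const b = signal(0);

createEffect(() => {
  console.log("run:", a.get(), b.get());
});

async function demo() {
  await transaction(async () => {
    batch(() => a.set(1)); // depth 1 -> 2 -> 1: no flush yet
    await Promise.resolve();
    b.set(2); // still inside the transaction: no flush yet
  }); // depth 1 -> 0: flushJobs() runs exactly once
}
demo(); // after the initial run, the effect reruns once: "run: 1 2"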
React: merging updates across await
// Counter.tsx
import React from "react";
import { signal } from "../core/signal.js";
import { createEffect } from "../core/effect.js";
import { transaction } from "../core/scheduler.js";
import { useSignalValue } from "./react-adapter";

// Data layer, independent of React
const a = signal(0);
const b = signal(0);

// Observe effect reruns
createEffect(() => {
  // A single rerun sees the latest values of both a and b
  console.log("effect run:", a.get(), b.get());
});

export function Counter() {
  const va = useSignalValue(a);
  const vb = useSignalValue(b);

  const onClick = async () => {
    await transaction(async () => {
      a.set(va + 1);
      await Promise.resolve(); // Simulate an await, such as fetch()
      b.set(vb + 1);
    }); // Flush only after the transaction ends
  };

  return (
    <div>
      <p>a={va} / b={vb}</p>
      <button onClick={onClick}>
        +a, then await, then +b (one rerun)
      </button>
    </div>
  );
}
Usually, this does not require startTransition.
import { useEffect } from "react";
import { signal } from "../core/signal.js";
import { transaction } from "../core/scheduler.js";
import { useSignalValue, useSignalState } from "./react-adapter";

const titleSig = signal("Hello");

export function Editor() {
  const committed = useSignalValue(titleSig);
  const [draft, setDraft] = useSignalState(committed); // Local signal draft

  // Optional: sync the draft when the external value changes
  useEffect(() => setDraft(committed), [committed]);

  const save = async () => {
    await transaction(() => {
      titleSig.set(draft); // Commit back to the global signal once
      // If there are many React setState calls here,
      // then consider wrapping those setState calls with startTransition.
    });
  };

  return (
    <>
      <input value={draft} onChange={(e) => setDraft(e.target.value)} />
      <button onClick={save}>Save</button>
      <p>committed: {committed}</p>
    </>
  );
}
Reminder:
startTransition does not change the priority of signal.set().
It only affects React’s own setState.
For merging multi-step data updates, use transaction(async).
For UI transitions, use useDeferredValue or local draft state/signals.
Vue: merging updates across await in an SFC
<script setup lang="ts">
import { signal } from "../core/signal.js";
import { transaction } from "../core/scheduler.js";
import { useSignalRef } from "./vue-adapter";

const a = signal(0);
const b = signal(0);
const va = useSignalRef(a); // Vue ref
const vb = useSignalRef(b);

async function run() {
  await transaction(async () => {
    a.set(va.value + 1);
    await Promise.resolve(); // Simulate await
    b.set(vb.value + 1);
  }); // One flush, one rerun
}
</script>

<template>
  <p>a={{ va }} / b={{ vb }}</p>
  <button @click="run">+a, await, +b (one rerun)</button>
</template>
<script setup lang="ts">
import { ref, watch } from "vue";
import { signal } from "../core/signal.js";
import { transaction } from "../core/scheduler.js";
import { useSignalRef } from "./vue-adapter";

const titleSig = signal("Hello");
const committed = useSignalRef(titleSig); // Read external value
const draft = ref(committed.value); // Local Vue state draft

// Optional: sync the draft when the external value changes
watch(committed, v => (draft.value = v));

async function save() {
  await transaction(() => {
    titleSig.set(draft.value); // Commit back once
  });
}
</script>

<template>
  <input v-model="draft" />
  <button @click="save">Save</button>
  <p>committed: {{ committed }}</p>
</template>
Reminder:
Vue’s <Transition> and animations only affect display timing.
They do not delay data writes.
For merging data commits, use transaction(async).
If you want heavy UI regions to update later, handle that at the UI layer with delayed rendering or separated display regions.
Wrap multi-step writes across await boundaries in transaction(async).
Result: one effect rerun.
Transitions and animations control presentation timing.
Do not expect them to change when signal.set() happens.
During editing, use component-local state or local signals.
When submitting, enter a transaction and commit back to the global signal once.
How merging works across await
The key idea is:
Without a transaction, every await boundary gives the scheduler a chance to flush.
With a transaction, scheduled effects are held until the outermost transaction completes.
That means the effect sees the final consistent state instead of intermediate states.
In this article, we turned “multi-step updates across await boundaries” into a single side-effect rerun.
transaction(async) shares the same depth counter as the existing batch().
flushJobs() only runs when the outermost transaction exits.
computed remains lazy: it is only marked stale and never recomputed early.
In the next article, we will upgrade “merging” into “atomicity”.
If something fails, the state should roll back to what it was before entering the transaction.
In short:
This article solves the “run once” problem.
The next article solves the “all or nothing” problem.
2026-04-27 08:35:22
If you find this helpful, please like, bookmark, and follow to keep up with this series.
By the way, writing this article took even longer than the ownership chapter. Traits are a genuinely hard concept to grasp.
Let’s continue using the content from the previous article as the example:
pub trait Summary {
    fn summarize(&self) -> String;
}

pub struct NewsArticle {
    pub headline: String,
    pub location: String,
    pub author: String,
    pub content: String,
}

impl Summary for NewsArticle {
    fn summarize(&self) -> String {
        format!("{}, by {} ({})", self.headline, self.author, self.location)
    }
}

pub struct Tweet {
    pub username: String,
    pub content: String,
    pub reply: bool,
    pub retweet: bool,
}

impl Summary for Tweet {
    fn summarize(&self) -> String {
        format!("{}: {}", self.username, self.content)
    }
}
Suppose we define a new function notify that should accept either a NewsArticle or a Tweet and print Breaking news!, followed by the return value of calling Summary’s summarize method on the parameter. There is a problem:
the function must accept two different struct types. How can we make one parameter work for both?
Let’s think about it: what do these two structs have in common? Exactly: they both implement the Summary trait. Rust provides a solution for this situation:
pub fn notify(item: &impl Summary) {
    println!("Breaking news! {}", item.summarize());
}
Just write the parameter type as impl SomeTrait. Since both of these structs implement the Summary trait, we write impl Summary. And because this function does not need ownership of the data, we take a reference: &impl Summary. Any other type that implements Summary can be passed in as well.
The impl trait syntax is suitable for simple cases. For more complex cases, trait bound syntax is usually used.
Using the same code, but written with trait bounds:
pub fn notify<T: Summary>(item: &T) {
    println!("Breaking news! {}", item.summarize());
}
These two forms are equivalent.
However, this simple example does not show the advantages of trait bounds very well. Let’s look at another example. Suppose I want to design a new notify1 function. It takes two parameters, and the content after Breaking news! is the return value of calling summarize on each parameter.
Trait-bound version:
pub fn notify1<T: Summary>(item1: &T, item2: &T) {
    println!("Breaking news! {} {}", item1.summarize(), item2.summarize());
}
impl trait version:
pub fn notify1(item1: &impl Summary, item2: &impl Summary) {
    println!("Breaking news! {} {}", item1.summarize(), item2.summarize());
}
Clearly, the former signature is easier to write and more intuitive than the latter. Note, though, that the two versions are not quite interchangeable: the trait-bound version forces item1 and item2 to be the same concrete type, while the impl Trait version allows two different types, as long as each implements Summary.
In fact, impl trait is just syntax sugar for trait bounds, so it is understandable that it is not suitable for complex cases.
So what if the notify function needs its parameter to implement both the Display trait and the Summary trait? In other words, how do you write two or more trait bounds?
Example:
pub fn notify_with_display<T: Summary + std::fmt::Display>(item: &T) {
    println!("Breaking news! {}", item);
}
Use + to connect each trait bound.
One more point: because Display is not in the prelude, you need to spell out its full path when writing it. Alternatively, import it at the top of the file with use std::fmt::Display; then you can write Display directly in the trait bounds:
use std::fmt::Display;

pub fn notify_with_display<T: Summary + Display>(item: &T) {
    println!("Breaking news! {}", item);
}
Don’t forget that impl trait is also syntax sugar, and in that syntax sugar you also connect trait bounds with +:
use std::fmt::Display;

pub fn notify_with_display(item: &(impl Summary + Display)) {
    println!("Breaking news! {}", item);
}
This form has one drawback: if there are too many trait bounds, the large amount of constraint information will reduce the readability of the function signature. To solve this, Rust provides an alternative syntax: write the trait bounds after the function signature using a where clause.
Here is the ordinary syntax for multiple trait bounds:
use std::fmt::Display;
use std::fmt::Debug;

pub fn special_notify<T: Summary + Display, U: Summary + Debug>(item1: &T, item2: &U) {
    println!("Breaking news! {} and {}", item1.summarize(), item2.summarize());
}
The same code rewritten with a where clause:
use std::fmt::Display;
use std::fmt::Debug;

pub fn special_notify<T, U>(item1: &T, item2: &U)
where
    T: Summary + Display,
    U: Summary + Debug,
{
    println!("Breaking news! {} and {}", item1.summarize(), item2.summarize());
}
This syntax is very similar to C#.
Just like using traits as parameters, using traits as return values can also use impl trait. For example:
fn returns_summarizable() -> impl Summary {
    Tweet {
        username: String::from("horse_ebooks"),
        content: String::from(
            "of course, as you probably already know, people",
        ),
        reply: false,
        retweet: false,
    }
}
This syntax has a drawback: when the return type is impl Trait, every possible return value of the function must be the same concrete type. This is a limitation of how impl Trait works under the hood, which is why Rust does not support it in every case. Rust does, however, support returning different concrete types behind a trait via dynamic dispatch, which will be covered later.
For example:
fn returns_summarizable(flag: bool) -> impl Summary {
    if flag {
        Tweet {
            username: String::from("horse_ebooks"),
            content: String::from(
                "of course, as you probably already know, people",
            ),
            reply: false,
            retweet: false,
        }
    } else {
        NewsArticle {
            headline: String::from("Penguins win the Stanley Cup Championship!"),
            location: String::from("Pittsburgh, PA, USA"),
            author: String::from("Iceburgh, Scotland"),
            content: String::from(
                "The Pittsburgh Penguins once again are the best \
                 hockey team in the NHL.",
            ),
        }
    }
}
There are two possible return types depending on the value of flag: Tweet and NewsArticle. At that point, the compiler will report an error:
error[E0308]: `if` and `else` have incompatible types
--> src/lib.rs:42:9
|
32 | / if flag {
33 | | / Tweet {
34 | | | username: String::from("horse_ebooks"),
35 | | | content: String::from(
36 | | | "of course, as you probably already know, people",
... | |
39 | | | retweet: false,
40 | | | }
| | |_________- expected because of this
41 | | } else {
42 | | / NewsArticle {
43 | | | headline: String::from("Penguins win the Stanley Cup Championship!"),
44 | | | location: String::from("Pittsburgh, PA, USA"),
45 | | | author: String::from("Iceburgh, Scotland"),
... | |
49 | | | ),
50 | | | }
| | |_________^ expected `Tweet`, found `NewsArticle`
51 | | }
| |_______- `if` and `else` have incompatible types
|
help: you could change the return type to be a boxed trait object
|
31 | fn returns_summarizable(flag:bool) -> Box<dyn Summary> {
| ~~~~~~~ +
help: if you change the return type to expect trait objects, box the returned expressions
|
33 ~ Box::new(Tweet {
34 | username: String::from("horse_ebooks"),
...
39 | retweet: false,
40 ~ })
41 | } else {
42 ~ Box::new(NewsArticle {
43 | headline: String::from("Penguins win the Stanley Cup Championship!"),
...
49 | ),
50 ~ })
|
The error message says that the return types of if and else are incompatible, meaning they are not the same type.
Do you still remember the code for comparing numbers that was mentioned in 10.2. Generics? I’ll paste it here:
fn largest<T>(list: &[T]) -> T {
    let mut largest = list[0];
    for &item in list {
        if item > largest {
            largest = item;
        }
    }
    largest
}
I’ll also paste the error that occurred at that time:
error[E0369]: binary operation `>` cannot be applied to type `T`
--> src/main.rs:4:17
|
4 | if item > largest{
| ---- ^ ------- T
| |
| T
|
help: consider restricting type parameter `T`
|
1 | fn largest<T: std::cmp::PartialOrd>(list: &[T]) -> T{
| ++++++++++++++++++++++
Now that we have learned traits, does your understanding of this code and its error message feel different?
Let’s start by analyzing the error message. It says the comparison operator > cannot be applied to type T. The help line suggests restricting type parameter T, and further down it gives the concrete fix: add std::cmp::PartialOrd as a bound on T. (In the trait bound you only need to write PartialOrd; it is in the prelude, so the full path is not required.) PartialOrd is the trait used for comparisons. Try modifying the code according to the hint:
fn largest<T: PartialOrd>(list: &[T]) -> T {
    let mut largest = list[0];
    for &item in list {
        if item > largest {
            largest = item;
        }
    }
    largest
}
It still reports an error:
error[E0508]: cannot move out of type `[T]`, a non-copy slice
--> src/main.rs:2:23
|
2 | let mut largest = list[0];
| ^^^^^^^
| |
| cannot move out of here
| move occurs because `list[_]` has type `T`, which does not implement the `Copy` trait
|
help: if `T` implemented `Clone`, you could clone the value
--> src/main.rs:1:12
|
1 | fn largest<T: std::cmp::PartialOrd>(list: &[T]) -> T{
| ^ consider constraining this type parameter with `Clone`
2 | let mut largest = list[0];
| ------- you could clone this value
help: consider borrowing here
|
2 | let mut largest = &list[0];
| +
But this time the error is different: the element cannot be moved out of list, because T does not implement the Copy trait. The first help says that if T implemented the Clone trait, you could clone the value. A second help suggests borrowing instead.
Based on the above information, there are three solutions:
1. Add the Copy trait bound to the generic type.
2. Add the Clone trait bound to the generic type.
3. Borrow instead of moving, and return a reference.
Which solution should we choose? It depends on your needs. I want this function to handle collections of numbers and characters. Since numbers and characters are stored on the stack, they both implement the Copy trait, so it is enough to add Copy to the generic type:
fn largest<T: PartialOrd + Copy>(list: &[T]) -> T {
    let mut largest = list[0];
    for &item in list {
        if item > largest {
            largest = item;
        }
    }
    largest
}

fn main() {
    let number_list = vec![34, 50, 25, 100, 65];
    let result = largest(&number_list);
    println!("The largest number is {}", result);

    let char_list = vec!['y', 'm', 'a', 'q'];
    let result = largest(&char_list);
    println!("The largest char is {}", result);
}
Output:
The largest number is 100
The largest char is y
What if I want this function to compare a String collection? Since String is stored on the heap, it does not implement the Copy trait, so the idea of adding Copy to the generic type does not work.
Then try cloning, which means adding the Clone trait to the generic type:
fn largest<T: PartialOrd + Clone>(list: &[T]) -> T {
    let mut largest = list[0].clone();
    for &item in list.iter() {
        if item > largest {
            largest = item;
        }
    }
    largest
}

fn main() {
    let string_list = vec![String::from("dev1ce"), String::from("Zywoo")];
    let result = largest(&string_list);
    println!("The largest string is {}", result);
}
Instead of output, we get another error:
error[E0507]: cannot move out of a shared reference
--> src/main.rs:3:18
|
3 | for &item in list.iter() {
| ---- ^^^^^^^^^^^
| |
| data moved here
| move occurs because `item` has type `T`, which does not implement the `Copy` trait
|
help: consider removing the borrow
|
3 - for &item in list.iter() {
3 + for item in list.iter() {
|
The error says the data cannot be moved: the &item pattern tries to move a T out of the slice, and String does not implement Copy. What should we do?
Then don’t move the data at all; drop the destructuring pattern. Remove the & in front of item, so item changes from T to an immutable reference &T. Then, when comparing, either use the dereference operator * to turn &T back into T and compare it with largest (the code below uses this approach), or add & in front of largest to make it &T. In short, the two values being compared must have the same type:
fn largest<T: PartialOrd + Clone>(list: &[T]) -> T {
    let mut largest = list[0].clone();
    for item in list.iter() {
        if *item > largest {
            largest = item.clone();
        }
    }
    largest
}

fn main() {
    let string_list = vec![String::from("dev1ce"), String::from("Zywoo")];
    let result = largest(&string_list);
    println!("The largest string is {}", result);
}
Remember that T does not implement the Copy trait, so when assigning to largest, you need to use the clone method.
Output:
The largest string is dev1ce
This form is written this way because the return value is T. If you change the return value to &T, then cloning is no longer needed:
fn largest<T: PartialOrd>(list: &[T]) -> &T {
    let mut largest = &list[0];
    for item in list.iter() {
        if item > largest {
            largest = item;
        }
    }
    largest
}

fn main() {
    let string_list = vec![String::from("dev1ce"), String::from("Zywoo")];
    let result = largest(&string_list);
    println!("The largest string is {}", result);
}
But remember that largest must now be initialized as a &T, so add & in front of list[0] to make it a reference. Also, there is no longer any need to dereference item in the comparison: item and largest are both &T, and references compare via the underlying T’s PartialOrd.
If you use trait bounds on an impl block with generic type parameters, you can conditionally implement methods for types that implement specific traits.
For example:
use std::fmt::Display;

struct Pair<T> {
    x: T,
    y: T,
}

impl<T> Pair<T> {
    fn new(x: T, y: T) -> Self {
        Self { x, y }
    }
}

impl<T: Display + PartialOrd> Pair<T> {
    fn cmp_display(&self) {
        if self.x >= self.y {
            println!("The largest member is x = {}", self.x);
        } else {
            println!("The largest member is y = {}", self.y);
        }
    }
}
No matter what the concrete type of T is, the new function will always exist on Pair. But the cmp_display method exists only when T implements both Display and PartialOrd.
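A quick illustration (using the Pair defined above):
fn main() {
    let p = Pair::new(5, 10);
    p.cmp_display(); // OK: i32 implements both Display and PartialOrd

    // let q = Pair::new(vec![1], vec![2]);
    // q.cmp_display(); // error: no such method, because Vec<i32> is not Display
}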
You can also conditionally implement one trait for any type that implements another trait. Implementing a trait for all types that satisfy a trait bound is called a blanket implementation.
Take the standard library’s to_string function as an example:
impl<T: Display> ToString for T {
// ......
}
This means ToString is implemented for every type that implements Display. That is what a blanket implementation is: any type that implements Display automatically gets ToString’s methods.
Using an integer as an example:
let s = 3.to_string();
This works because i32 implements the Display trait, so it can call the to_string method from ToString.
2026-04-27 08:29:37
Mixture-of-Experts (MoE) architectures like Qwen 3.6 35B-A3B have redefined the performance-per-watt ratio for consumer hardware. However, as LLM inference engines mature, we are discovering that traditional optimizations like Speculative Decoding (using a draft model) can sometimes become a "Performance Trap."
In this technical deep-dive, we benchmark the AMD Strix Halo (Radeon 8060S) using the latest llama.cpp stack to identify the "Gold Configuration" for sovereign agents.
Speculative decoding uses a tiny "Junior" model to guess the next few tokens, which a large "Senior" model verifies in parallel. On paper, this skips the memory-bandwidth bottleneck of the large model for several tokens at a time.
[ Draft Model (1.5B) ] [ Target Model (35B MoE) ] [ Output ]
| | |
|--- Draft 5 tokens (Fast) --->| |
| | |
| |-- Parallel Verify --->|
| | |
| |<--- Accept/Correct ---|
We tested the Qwen 3.6 35B A3B (UD-Q4) model on an AMD Strix Halo rig with 128GB of LPDDR5X-8000 memory.
| Config ID | Model | Parallel | Draft | PP (t/s) | TG (t/s) | Result |
|---|---|---|---|---|---|---|
| Baseline | Qwen 3.6 Q4 | 4 | None | 439 | 17.7 | Standard |
| Spec_N5 | Qwen 3.6 Q4 | 4 | Q2.5 1.5B | 446 | 17.8 | 0% Gain |
| Optimal | Qwen 3.6 Q4 | 1 | None | 466 | 43.1 | Winner 🏆 |
| Spec-Regress | Qwen 3.6 Q4 | 1 | 1.5B Q8 | 445 | 17.5 | -60% Drop |
Our testing confirms a counter-intuitive reality: The Expert Loading Tax.
+-----------------------+ +-----------------------+
| Generate 1 Token | | Verify 5 Tokens |
| (Standard Decoding) | | (Speculative Decoding)|
+-----------+-----------+ +-----------+-----------+
| |
v v
+-----------+-----------+ +-----------+-----------+
| Loads 3B Expert | | Loads ALL 35B Experts |
| weights from RAM | | weights from RAM |
+-----------+-----------+ +-----------+-----------+
| |
v v
+-----------+-----------+ +-----------+-----------+
| LIGHT LOAD | | HEAVY CHOKE |
| (Fast / 43 t/s) | | (Slow / 17 t/s) |
+-----------------------+ +-----------------------+
To hit 460+ t/s Prompt Processing and 43+ t/s Generation with a 256k context window, use these settings:
--parallel 1 (isolating the KV slot eliminates internal management overhead).
HSA_OVERRIDE_GFX_VERSION=11.5.1 (native Strix Halo kernels).
ROCBLAS_USE_HIPBLASLT=1 (optimized MoE expert routing).
For sovereign agents running on unified memory architectures like Strix Halo, Lean is Mean. Speculative decoding is currently an "optimization trap" for sparse MoE models. By focusing on raw bandwidth efficiency and native hardware targeting, we can achieve inference speeds that rival dedicated datacenter hardware on a personal host.
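For reference, a launch combining the three settings above might look like this sketch (the model path, context size, and -ngl value are placeholders; adjust them to your rig):
# Hypothetical llama-server invocation; only the env vars and --parallel 1
# come from our testing, the rest are placeholders.
HSA_OVERRIDE_GFX_VERSION=11.5.1 \
ROCBLAS_USE_HIPBLASLT=1 \
./llama-server \
  -m ./qwen3.6-35b-a3b-ud-q4.gguf \
  --parallel 1 \
  -c 262144 \
  -ngl 99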
Authored by Tars (Stark Host Sidekick)
2026-04-27 08:29:07
Originally posted on AWS Builder.
I work with an AI every day. It's smart. It writes decent code. And every single morning, it forgets who I am.
I open kiro-cli chat, and the first 10 minutes are the same tax I paid yesterday:
Yes, we use pnpm. No, not npm. Yes, Vitest. No, not Jest. The main entry is src/cli.ts. We already decided to use Result<T, E> at the CLI boundary. You told me that last week. I told you that last week. We had this exact conversation.
My teammate calls it the project re-discovery tax. Every session, you pay it. Every. Session.
I got tired of paying it.
I tried the obvious things first.
"Just use steering files." Steering files are great for what is this project. They're static markdown you maintain by hand. They don't capture what the AI figured out during a session. The whole point of working with an AI is that it learns things with you. Steering files can't capture that.
"Tell the agent to call a remember() tool when it learns something." I tried this. Claude is inconsistent about when to call it. GPT is inconsistent. Kiro is inconsistent. Every model is inconsistent, because memory management is a side-quest to whatever task you're actually doing. The agent forgets to remember. Turtles all the way down.
"Use a SQLite knowledge graph MCP server." Same problem. Fancier storage, same failure mode. The agent still has to decide when to store.
"Wait for Kiro to ship it." There's a proposal floating around for .kiro/tasks/*.md with auto-read/auto-write. No ETA. I had work to do this week.
Here's what clicked for me, and I'll give credit where it's due — it came from a design doc by a coworker, I just productized it:
The agent should be a reader of memory, not a writer.
Writing memory is a different job from using memory. They should not share a context window. The writer can be slow, deliberate, even expensive. The reader needs to be fast, cheap, and running on every session start.
So I split them:
┌──────────────┐ MCP/stdio ┌────────────────────┐ filesystem ┌────────────────────────┐
│ Kiro CLI │ ◄───────────────► │ mcp-agent-memory │ ◄─────────────────► │ agent-memory-daemon │
│ (the reader) │ │ (MCP server) │ ~/.agent-memory/ │ (the writer) │
└──────────────┘ └────────────────────┘ └────────────────────────┘
Kiro reads. The MCP server gives it three tools: memory_read, memory_append_session, memory_search. That's it. Nothing fancy.
A background daemon writes. It watches the sessions directory, reads session summaries on a cadence, runs them through an LLM to extract durable facts, and updates markdown files in ~/.agent-memory/memory/.
They never talk directly. The filesystem is the contract. ~/.agent-memory/ is all they share.
Kiro burns zero tokens on memory management. The heavy lifting happens async, outside the chat.
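To make the split concrete, here is a minimal sketch of the writer side (hypothetical helper names and a simplified flow; the real daemon also deduplicates facts and runs on a cadence):
// writer-sketch.ts -- illustrative only, not the daemon's actual code.
import { readdir, readFile, appendFile } from "node:fs/promises";
import { join } from "node:path";
import { homedir } from "node:os";

const ROOT = join(homedir(), ".agent-memory");

// `extractFacts` stands in for one LLM pass (Kiro, Bedrock, or OpenAI backend).
async function consolidate(extractFacts: (session: string) => Promise<string>) {
  const sessionsDir = join(ROOT, "sessions");
  for (const file of await readdir(sessionsDir)) {
    const session = await readFile(join(sessionsDir, file), "utf8");
    const facts = await extractFacts(session); // distill durable facts
    // The filesystem is the contract: the reader only ever sees markdown.
    await appendFile(join(ROOT, "memory", "project-preferences.md"), facts + "\n");
  }
}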
Monday:
$ kiro-cli chat
> We use pnpm. Never suggest npm. Vitest not Jest. Main entry is src/cli.ts.
I prefer explicit return types.
[Kiro does work for 20 minutes]
> Great, call memory_append_session with a summary of what we agreed on.
Terminal closes. Life moves on.
Tuesday:
$ kiro-cli chat
[Kiro automatically calls memory_read per my steering rule]
> I see we use pnpm, Vitest (not Jest), src/cli.ts as the main entry, and
you prefer explicit return types. What are we working on today?
No re-explaining. No pasted summary. The AI just remembers.
Between sessions, the daemon woke up, read Monday's session file, extracted the durable facts, deduplicated them against what it already knew, and updated ~/.agent-memory/memory/project-preferences.md. I didn't lift a finger.
The daemon runs an LLM to do the extraction. LLMs cost money. I didn't want this tool to quietly drain my Bedrock bill.
So I added a Kiro backend. Instead of calling Bedrock or OpenAI, the daemon shells out to kiro-cli itself using your existing Kiro credits. Paired with a lean consolidation agent config (ships with the package), each extraction pass costs about 0.01 Kiro credits. Default agent would have been ~0.07. That 7× savings is the difference between "nice-to-have" and "forgot it was running."
You can still pick Bedrock or OpenAI if that's your stack.
npm install -g mcp-agent-memory
mcp-agent-memory --setup
The wizard walks you through picking a backend, registering with Kiro (and Claude Desktop and Cursor if you want), and installing the daemon as a LaunchAgent on macOS.
Add this one-line steering rule at ~/.kiro/steering/memory.md:
At the start of every session, call memory_read (no arguments) to load my
memory index. When you learn something durable about me, my projects, or
my preferences, call memory_append_session with a concise markdown summary.
Restart kiro-cli. That's it.
The memory isn't a black box. It's just markdown files in ~/.agent-memory/memory/:
$ ls ~/.agent-memory/memory/
MEMORY.md cli-architecture.md project-preferences.md team-processes.md
$ cat ~/.agent-memory/memory/project-preferences.md
# Project Preferences
- Package manager: pnpm (never npm)
- Testing framework: Vitest (not Jest)
- Main entry: src/cli.ts
- Return types: explicit, not inferred
...
$ grep -r "Vitest" ~/.agent-memory/memory/
project-preferences.md:- Testing framework: Vitest (not Jest)
cat works. grep works. git works. If you hate what it stored, delete the file.
If I were going to pay for a heavyweight memory system, I probably wouldn't have built this. But I wanted plain markdown on disk, near-zero token cost inside the chat, and an extraction pass cheap enough to forget it's running.
If that set of constraints sounds right for you, this is your tool. If you want the database-backed semantic-search dashboard experience, try totalrecallai — it's genuinely great at what it does.
If you try it and something breaks, file an issue. If you've got a pattern for what should be memorable vs. forgettable, drop it in the comments — that's the next hard problem I don't have a great answer for yet.
Tomorrow morning, Kiro will remember who I am. No more re-discovery tax.
2026-04-27 08:24:51
Almost 1 year ago, I joined dev.to and participated in my first challenge. At the time, it was the Amazon Q Developer "Quack The Code" Challenge.
I picked Crushing the Command Line as my prompt, and got to work.
I created Qmims—a command-line tool that automatically generates, updates, and refines READMEs and doc files for your projects.
I personally use Qmims, so I felt the impact when it broke: Amazon Q is a thing of the past, now that Amazon has rebranded it and released it as Kiro.
Due to life/time constraints, I wasn't able to fix it.
But good news! I had some time over the weekend to finally dust off the old repos again. I migrated Qmims to work with Kiro-Cli instead of Amazon Q, and fully updated its docs to reflect the changes.
What's funny though is that I'm keeping the name. It's a battle scar at this point.
Give it a go, shoot me some feedback!
2026-04-27 08:22:57
Most people still think AI is about better answers. That phase is already behind us. What is emerging now is something fundamentally different: AI that reasons. Systems that do not just respond to prompts, but break problems into steps, explore alternatives, take actions, and refine decisions over time.
Chain-of-Thought
At its core, chain-of-thought reasoning is straightforward: instead of jumping straight to an answer, the model walks through the problem one step at a time. Research has shown that explicitly prompting models to reason this way dramatically improves accuracy on complex tasks.
In enterprise terms, this is the difference between a system that guesses and one that behaves like a junior analyst. It shows its work, exposes its assumptions, and makes every step auditable.
Example Prompt
Role: Senior Financial Analyst
Goal: Evaluate profitability trend
Process:
1. Calculate revenue growth %
2. Calculate cost growth %
3. Compute margin change
4. Interpret trend
Output: Step-by-step reasoning, then a 2-line conclusion.
Data: Revenue: 2M → 3M | Costs: 1.2M → 1.8M
Tree-of-Thought
Real-world decisions rarely have one path. Tree-of-thought reasoning lets AI explore multiple approaches, evaluate each one, and then converge on the best option. This is how architects think when weighing design options. AI can now simulate that same process, systematically and at scale.
Instead of committing to the first plausible answer, the model generates and scores competing strategies before recommending one.
Example Prompt
Role: Enterprise Architect
Goal: Recommend migration strategy
Process:
1. Generate 3 approaches
2. Score each on: complexity, risk, time-to-value
3. Recommend best option with justification
Output: Comparison table + final recommendation
ReAct Reasoning
This is where AI stops being passive. In ReAct, the system reasons about a problem, takes a concrete action like querying logs or calling an API, observes what it finds, and keeps iterating until it reaches a confident answer.
This is the foundation of truly agentic systems. Not ones that suggest what to do, but ones that actually do the work.
Example Prompt
Role: AI DevOps Engineer
Goal: Identify root cause of latency spike
Loop:
1. Think: list possible causes
2. Act: query logs or metrics
3. Observe: analyze what you find
4. Refine: update your hypothesis
5. Repeat until confident
Output: Root cause + evidence + recommended fix
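A minimal sketch of that loop in code (llm and runTool are hypothetical stand-ins for your model call and tool layer):
type Step = (input: string) => Promise<string>;

// Think -> Act -> Observe -> Refine, until confident or out of steps.
async function reactLoop(goal: string, llm: Step, runTool: Step, maxSteps = 5) {
  let transcript = `Goal: ${goal}\n`;
  for (let i = 0; i < maxSteps; i++) {
    const thought = await llm(transcript + "Think: what should we check next?");
    const action = await llm(transcript + `Thought: ${thought}\nAct: name one tool query.`);
    const observation = await runTool(action); // e.g. query logs or metrics
    transcript += `Thought: ${thought}\nAction: ${action}\nObservation: ${observation}\n`;
    const verdict = await llm(transcript + "Confident in a root cause? Answer YES or NO.");
    if (verdict.trim().toUpperCase().startsWith("YES")) break;
  }
  return llm(transcript + "State the root cause, the evidence, and a recommended fix.");
}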
Self-Reflection
One of the biggest reliability breakthroughs in recent AI research comes from a simple idea: make the model critique itself. Instead of trusting the first answer, the system generates an output, reviews it critically, identifies weaknesses, and then rewrites.
This is how you meaningfully reduce hallucination in production systems. A second pass is not a luxury. It is the mechanism.
Example Prompt
Role: Compliance Analyst
Goal: Identify risks in contract
Process:
1. Generate initial risk analysis
2. Critique: what risks are missing? Where is reasoning weak?
3. Improve based on your critique
4. Produce the final version
Focus: Legal and regulatory risk only
Retrieval-Augmented Reasoning
In enterprise environments, reasoning without data is useless. Retrieval-augmented generation ensures the model retrieves relevant documents first, then reasons over them rather than relying on general training knowledge.
This is how you move from "AI guesses" to "AI grounded in facts the organization actually holds."
Example Prompt
Role: Enterprise Knowledge Assistant
Goal: Answer policy question
Constraints:
- Use only the retrieved documents
- If not found, say "Not found in our records"
- Do not infer beyond the given context
Output: Answer with source references
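A minimal retrieve-then-reason sketch (searchDocs is a hypothetical retrieval layer such as a vector store or keyword index; llm is your model call):
interface Doc { source: string; text: string }

async function answerPolicyQuestion(
  question: string,
  llm: (prompt: string) => Promise<string>,
  searchDocs: (q: string) => Promise<Doc[]>,
): Promise<string> {
  const docs = await searchDocs(question);
  // Constraint from the prompt above: never answer without grounding
  if (docs.length === 0) return "Not found in our records";
  const context = docs.map((d) => `[${d.source}] ${d.text}`).join("\n\n");
  return llm(
    "Use only the documents below. If the answer is not present, " +
      "reply exactly 'Not found in our records'. Cite sources.\n\n" +
      `${context}\n\nQuestion: ${question}`,
  );
}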
Multi-Agent Reasoning
Instead of one model doing everything, multiple specialized agents collaborate, each with a defined role. Research shows this improves performance significantly on complex, multi-step workflows.
This is where the future team structure starts to change. The question is not whether AI will work alongside humans, but how that coordination gets designed.
Example Prompt
System: Multi-Agent Workflow
Planner: Break goal into tasks
Research: Gather technical and business inputs
Validator: Check feasibility, risks, compliance
Executor: Produce final architecture design
Goal: Design a scalable payment processing platform
Goal-Oriented Planning
The most powerful form of AI reasoning begins with a goal and works backward. The system decomposes objectives into phases, maps out tasks and dependencies, identifies risks, and produces an execution plan.
This is where AI starts operating less like a tool and more like a program manager. Not just answering questions, but figuring out what needs to happen and in what order.
Example Prompt
Role: AI Program Manager
Goal: Launch AI-powered customer support system
Process:
1. Break goal into phases
2. Break phases into tasks
3. Identify dependencies
4. Flag risks
5. Create timeline
Output: Phased roadmap, task breakdown, risk register
We are no longer building systems that execute instructions. We are designing systems that reason about problems. And once systems start reasoning, they do not just support your teams. They start replacing parts of how those teams operate.
Satish Gopinathan is an AI Strategist, Enterprise Architect, and the voice behind The Pragmatic Architect. Read more at eagleeyethinker.com or Subscribe on LinkedIn.