
The Difference Between a Business Plan and a Business by Igor Fishelev

2026-03-30 22:14:05


A business plan is a description of a world that does not exist yet. That is not a criticism – it is simply what a plan is. The trouble begins when people forget the distinction and start managing the document instead of the reality.
The controlled environment problem
Plans are built on assumptions. Customers will behave as the research suggests. Costs will land within the projected range. The right people will be hired on schedule. Competitors will do roughly what they have always done. Each of these assumptions is defensible on its own. Together, they describe conditions that are cleaner and more cooperative than any actual market has ever been.
The real business begins the moment something the plan did not account for appears – which tends to happen earlier than expected and more often than projected. A hire falls through. A customer segment that looked promising turns out to be less interested than the interviews suggested. A cost that was treated as fixed turns out not to be. Individually, none of these things are fatal. But they create a growing distance between the version of the business on paper and the one that is actually operating, and that distance rarely closes on its own.
Protecting the plan instead of the business
When reality and the plan diverge, the temptation is to protect the plan. To treat the gap as temporary, to adjust the numbers without revisiting the assumptions behind them, to keep presenting the original story to investors and partners while managing a different situation in practice. This feels like discipline. It is usually the opposite – a way of delaying the moment when the real problem gets addressed, in favour of preserving a narrative that is becoming less accurate over time.
The organisations that handle difficulty well are almost always the ones where someone was willing to say clearly, and early enough to matter, that the situation had changed and the approach needed to change with it. That sounds straightforward. In practice, when credibility has been staked on a particular version of the story, it requires a degree of honesty that is genuinely uncomfortable.
What a plan cannot do
A detailed, well-constructed plan can give the impression of competence without requiring the thing competence actually consists of – the ability to make reasonable decisions in conditions you did not foresee, with incomplete information and real consequences attached. Following a plan when it is working is not particularly difficult. The test comes when it stops working and something has to be figured out without a template.
This is why the way a leadership team reasons through an unfamiliar problem tells you considerably more than their financial projections do. Projections can be built to support almost any conclusion. Judgment under pressure is harder to construct and harder to fake.
What planning is actually good for
A serious plan, used honestly, gives you a baseline – something to measure against as the situation develops, a record of the assumptions you were making at the start that can be revisited when reality turns out differently. It creates a shared language inside an organisation for talking about direction and priorities. It forces decisions to be made explicitly rather than left vague until they become urgent.
But it is a starting point, not a substitute for the thing itself. The business is not the plan. The business is what happens when the plan meets conditions it was not designed for – and how the people running it respond when that happens. That part cannot be written in advance. It can only be navigated, and the quality of the navigation is what ultimately determines whether the business works or not.

Backend for Backend (BFB) Architecture Explained: The Missing Layer in Modern Systems

2026-03-30 22:13:17

The definition of Backend for Backend (BFB):

Backend for Backend (BFB) is a dedicated layer added to your system to bring together all the data related to a specific domain. Instead of letting every microservice call multiple services directly and deal with scattered information, the BFB layer gathers that data, organizes it, and exposes it through one clean backend endpoint.

It’s basically a backend that serves other backends, not the frontend. This makes it the opposite of BFF, which exists to shape data for UI or mobile apps. BFB sits in the middle of your backend ecosystem, letting different services pull reliable, consistent data from one place instead of depending on multiple service-to-service calls.

This approach is especially helpful when several microservices need the same kind of data, when the data is sensitive, or when you want to reduce complexity and avoid duplicating logic across your backend layers.
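As a minimal sketch of the idea, a BFB is just one aggregation function that gathers a domain's data from several backend services and returns it as a single unified response. The service names and payloads below are hypothetical stand-ins, with the upstream calls stubbed out:

```python
def fetch_profile(customer_id):      # stand-in for a profile-service call
    return {"id": customer_id, "name": "Alice"}

def fetch_orders(customer_id):       # stand-in for an order-service call
    return [{"order_id": 42, "total": 19.99}]

def fetch_reviews(customer_id):      # stand-in for a review-service call
    return [{"rating": 5, "text": "Great"}]

def get_customer_view(customer_id):
    """The BFB endpoint: backend consumers call this one function
    instead of making three separate service-to-service calls."""
    return {
        "profile": fetch_profile(customer_id),
        "orders": fetch_orders(customer_id),
        "reviews": fetch_reviews(customer_id),
    }
```

Consumers depend on one clean boundary; the fan-out to individual services lives in one place.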

When Should You Use a Backend for Backend (BFB)?

You typically introduce a BFB layer when your system reaches a level of complexity where direct service-to-service communication becomes inefficient or hard to manage. Some of the most common cases include:

1. When a domain’s data is spread across multiple microservices and you need a complete, consolidated view of that domain.
If your workflow requires pulling a large set of related information from different sources — such as all customer-related data, or everything tied to a specific user — centralizing this logic in a BFB layer ensures accuracy, consistency, and easier consumption.

2. When the data you’re retrieving must be delivered together as a single, unified response.
In situations where the data points are tightly coupled and must arrive as one atomic dataset, the BFB layer helps you enforce a single point of coordination. This reduces fragility, prevents partial responses, and helps avoid creating multiple points of failure across your microservices.

3. When multiple backend services rely on the same domain-specific dataset.
If several services repeatedly need the same information, a BFB layer becomes the ideal shared source of truth. It eliminates duplicated logic, reduces network overhead, and prevents dependency spaghetti caused by services calling each other directly.

Benefits of Implementing a Backend for Backend (BFB)

Adopting a BFB layer introduces several architectural advantages that can significantly improve the structure and performance of a distributed backend system:

  1. Centralized failure handling (Single Point of Failure by design).
    By aggregating all domain-related data in one place, the system either returns a complete dataset or fails entirely — preventing partial or inconsistent responses that would otherwise result from multiple independent service calls.

  2. Unified caching strategy instead of scattered microservice-level caches.
    The BFB layer allows you to implement caching once, in a controlled and optimized location, rather than duplicating caching logic across numerous microservices. This reduces memory usage, code complexity, and cache synchronization issues.

  3. Ability to use database views or high-performance storage for aggregated queries.
    Because the BFB layer handles all domain-specific data retrieval, you can optimize it with database views, pre-computed aggregations, or high-speed data stores like MongoDB. This can dramatically improve read performance and reduce latency.

  4. Controlled reshaping and transformation of outgoing data.
    The BFB layer can format, restructure, or enrich data before exposing it to other backend services, ensuring each service receives clean, consistent, and domain-aware output.

  5. Simplified development by centralizing domain data behind a clear, well-defined boundary.
    Developers no longer need to piece together information from multiple microservices. Instead, they rely on one well-named, well-structured backend source — greatly reducing complexity and improving productivity.

  6. Safe versioning for backward compatibility across backend services.
    A BFB layer can expose multiple versions of the same endpoint, ensuring that changes in data structure do not break dependent services. This allows smooth migrations, gradual rollouts, and long-term stability.
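The versioning benefit can be sketched as two views over the same underlying record, so consumers migrate at their own pace. The field names here are hypothetical:

```python
CUSTOMER = {"id": 1, "first_name": "Alice", "last_name": "Smith"}

def customer_v1(data):
    # Legacy consumers expect a single combined "name" field.
    return {"id": data["id"],
            "name": f'{data["first_name"]} {data["last_name"]}'}

def customer_v2(data):
    # Newer consumers get the split fields directly.
    return {"id": data["id"],
            "first_name": data["first_name"],
            "last_name": data["last_name"]}
```

Both versions stay live until the last v1 consumer migrates; the underlying data changes once, in one place.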

Example of Applying the Backend for Backend (BFB) Pattern

The diagram above illustrates a practical scenario where the BFB pattern becomes extremely useful. In this case, several microservices expose different pieces of customer-related information — such as favorites, reviews, and orders. At the same time, multiple other backend services require a complete, unified view of the customer domain to function correctly.

Instead of forcing each service to call multiple customer-related microservices individually, we introduce a dedicated customer-bfb service. This BFB layer aggregates all customer data from the underlying services and exposes it through a single, consistent interface.

To enhance performance, the customer-bfb service is backed by a high-speed storage solution like MongoDB for optimized reads and Redis for caching frequently accessed data. This setup significantly reduces latency, minimizes cross-service communication, and ensures that all backend consumers receive synchronized customer information as a single dataset.
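The all-or-nothing semantics plus a single caching point can be sketched as follows. An in-process dict stands in for Redis here, and the fetcher functions are hypothetical; the point is that either every section of the customer view loads (possibly from cache) or the whole call fails, so consumers never see a partial response:

```python
import time

_cache = {}   # stands in for Redis: {key: (expires_at, value)}
TTL = 60.0

def cached(key, loader):
    now = time.time()
    hit = _cache.get(key)
    if hit and hit[0] > now:
        return hit[1]
    value = loader()                 # may raise -> whole aggregate fails
    _cache[key] = (now + TTL, value)
    return value

def get_customer(customer_id, fetch_favorites, fetch_reviews, fetch_orders):
    # One cache entry for the whole aggregate, not one per upstream call.
    return cached(f"customer:{customer_id}", lambda: {
        "favorites": fetch_favorites(customer_id),
        "reviews": fetch_reviews(customer_id),
        "orders": fetch_orders(customer_id),
    })
```

Caching the assembled aggregate means repeat consumers never trigger the upstream fan-out at all.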

When Should You Avoid Using the Backend for Backend (BFB) Pattern?

Although BFB offers strong benefits in systems that require aggregated domain data, there are cases where its use becomes unnecessary — or even counterproductive:

1. When the data does not need to be delivered as a single, unified dataset.
If the services consuming the data do not require all domain information together — for example, if customer data can be processed without needing reviews or favorites — then forcing aggregation may introduce more failure modes than it removes. In these cases, the BFB layer becomes a liability: a partial data issue in a single upstream service fails the entire aggregated response.

2. When a single point of failure is not acceptable for the workflow.
Some systems require resilience and independent data flows. If your architecture cannot tolerate a centralized component that controls the success or failure of combined data retrieval, a BFB layer would contradict the system’s reliability goals.

3. When you do not need full domain-level data consolidation.
If your services only require small, specific pieces of information — rather than a holistic view of the domain — adding a BFB layer adds complexity without real value. In such cases, simpler integration patterns or direct service-to-service calls are more efficient and easier to maintain.

Information About the Founder of Backend for Backend (BFB)

Abdelkadem AbuGhazaleh, a technology researcher and software engineer, is the founder of the Backend for Backend (BFB) architectural concept. He graduated from The Hashemite University in the Hashemite Kingdom of Jordan and has built a strong career in backend engineering and distributed systems design.

AbuGhazaleh is also the founder and lead instructor of Java Mastery Academy, where he teaches advanced Java and backend technologies. His initial research on the BFB pattern was completed in August 2025, and the first official publication of this work was released in December 2025, marking the introduction of the BFB architecture to the broader software engineering community.

I spent months trying to stop LLM hallucinations. Prompt engineering wasn't enough. So I wrote a graph engine in Rust.

2026-03-30 22:12:54

I started this project after reading about AIRIS, a cognitive agent from SingularityNET that learns by interacting with a Minecraft world. Not because I cared about Minecraft — but because of the principle: an AI that learns by doing, in a way you can actually observe and trace.

That got me thinking. If an agent can learn from a simulated physical environment, could you do something similar in text? Could you build a system that builds knowledge through direct interaction with users, step by step, and where every piece of that knowledge is inspectable?

I tried. And I failed. Several times.

The purity trap

My first attempt was absurdly ambitious. I wanted to build everything from scratch — zero external libraries, zero implicit behavior, zero randomness. Every component had to be fully deterministic and transparent. No shortcuts.

It sounds principled. In practice, it was a dead end. I couldn't use any library that had opaque internals or non-deterministic behavior, which meant rewriting basic infrastructure from nothing. The project got slow, fragile, and impossible to maintain. Conceptual purity was killing the actual product.

So I stepped back and reframed the problem: the issue wasn't external code as such — it was what kind of external code. I started allowing dependencies again, but only ones that are deterministic, have no implicit intelligence, and behave predictably. That was the first real turning point.

The second problem: architecture

Even after loosening the dependency rules, the project kept growing in the wrong direction. Too many components, unclear responsibilities, a fragmented codebase that was getting harder to reason about with every commit.

At some point I realized I was building the wrong thing. I was trying to make Kremis generate answers. But the actual problem was never generation — LLMs are already good at that. The problem was verification.

That's when the architecture flipped. Kremis became a sidecar: it doesn't produce responses, it validates them. It sits next to an LLM and checks whether what the model says is actually grounded in real data. The separation is strict — probabilistic inference on one side, deterministic logic on the other.

That restructuring is what made everything click.

What Kremis actually does

Kremis is a graph store written in Rust. You feed it structured data — entity, attribute, value triples — and it builds a deterministic graph. When you query it, you get back exactly what's in the graph. Nothing invented, nothing inferred.

The core engine has no randomness, no floating-point arithmetic, no pre-loaded knowledge. Same input, same output, every time. That constraint is what makes everything else trustworthy.

Quick example

Say Kremis is running locally. You ingest some facts:

curl -X POST http://localhost:8080/signals \
     -H "Content-Type: application/json" \
     -d '{
       "signals": [
         {"entity_id": 1, "attribute": "name", "value": "Alice"},
         {"entity_id": 1, "attribute": "role", "value": "engineer"},
         {"entity_id": 1, "attribute": "works_on", "value": "Kremis"},
         {"entity_id": 1, "attribute": "knows", "value": "Bob"},
         {"entity_id": 2, "attribute": "name", "value": "Bob"},
         {"entity_id": 2, "attribute": "role", "value": "designer"},
         {"entity_id": 2, "attribute": "works_on", "value": "Kremis"},
         {"entity_id": 3, "attribute": "name", "value": "Kremis"},
         {"entity_id": 3, "attribute": "type", "value": "project"}
       ]
     }'

Now an LLM generates six claims about Alice. Kremis checks each one:

[FACT]          Alice is an engineer.
[FACT]          Alice works on the Kremis project.
[FACT]          Alice knows Bob.
[NOT IN GRAPH]  Alice holds a PhD in machine learning from MIT.
[NOT IN GRAPH]  Alice previously worked at DeepMind as a research lead.
[NOT IN GRAPH]  Alice manages a cross-functional team of 8 people.

Three grounded. Three fabricated. No "87% confidence" — just a binary answer.

Validation works by looking up the entity node, fetching its properties, and comparing against the claims. The repo includes a demo script that runs this whole flow — Python, standard library only. Pass --ollama to use a local model instead of mock claims.
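The lookup-and-compare step can be sketched in a few lines. This is a toy illustration of the idea, not Kremis's actual Rust engine; the facts mirror the ingest example above:

```python
# Stored triples from the ingest example: (entity_id, attribute) -> value.
FACTS = {
    (1, "role"): "engineer",
    (1, "works_on"): "Kremis",
    (1, "knows"): "Bob",
}

def check(entity_id, attribute, value):
    """Binary verdict: the triple is in the graph or it is not.
    No similarity scores, no confidence percentages."""
    stored = FACTS.get((entity_id, attribute))
    return "FACT" if stored == value else "NOT IN GRAPH"
```

Anything the LLM claims that has no matching triple comes back NOT IN GRAPH, regardless of how plausible it sounds.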

Why not just a SQL table?

I considered it. But I didn't want to write a new query for every possible claim an LLM might generate. A graph gives you relationship traversal without that overhead.

That matters when the question isn't "what's Alice's role?" but "does Alice know someone who works on project X?" or "what connects these two entities?" Those are graph questions.

The data model is EAV (Entity, Attribute, Value). Signals attach properties to entity nodes, and ordered ingestion creates edges from co-occurrence. You get a connected structure you can query for properties, traversals, paths, intersections, and related context.
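A toy sketch of why the graph shape pays off: a relationship question like "does Alice know someone who works on Kremis?" is a one-hop traversal rather than a bespoke query per claim shape. The adjacency structure here is hypothetical, not Kremis's internal representation:

```python
# Simplified adjacency view of the graph built from the earlier signals.
EDGES = {
    "Alice": {"knows": ["Bob"], "works_on": ["Kremis"]},
    "Bob":   {"knows": [],      "works_on": ["Kremis"]},
}

def knows_someone_on(person, project):
    """Traverse one 'knows' hop, then check each neighbour's 'works_on'."""
    neighbours = EDGES.get(person, {}).get("knows", [])
    return any(project in EDGES.get(n, {}).get("works_on", [])
               for n in neighbours)
```

The same adjacency structure answers path and intersection questions without any claim-specific code.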

MCP integration

Kremis ships with an MCP server. If you use Claude Desktop, Cursor, or anything that speaks Model Context Protocol, you can point it at a running Kremis instance and the assistant queries the graph directly.

{
  "mcpServers": {
    "kremis": {
      "command": "/path/to/kremis-mcp",
      "env": {
        "KREMIS_URL": "http://localhost:8080",
        "KREMIS_API_KEY": "your-key-here"
      }
    }
  }
}

No API auth? Omit KREMIS_API_KEY.

The assistant gets 9 tools — ingest, lookup, traverse, path, intersect, status, properties, retract, hash. Instead of hallucinating about your data, it can just look it up.

What about RAG and vector DBs?

I tried the usual stack before building this. System prompts, careful prompt engineering, vector databases. None of it solved the core issue: retrieval can be accurate and the model still invents details that aren't there.

Vector DBs answer "find me documents similar to this query." That's useful for retrieval. But Kremis answers a different question: "is this specific fact in my data, yes or no?" Those are two different problems, and I got tired of pretending they're the same one.

Confidence scores didn't help either. An "87% confidence" doesn't tell me whether Alice has a PhD or not. I wanted a binary answer, and that's what Kremis gives.

What this is not

Kremis doesn't "understand" anything. The name means "cognitive substrate", but the system is much simpler than that sounds. It stores structure from signals it has processed. No intelligence. No reasoning. Just a graph.

It's also alpha software — currently v0.17.4. The API works, but I'm still making breaking changes before v1.0. Pin your version.

Architecture

Three members in one Rust workspace:

  • kremis-core — pure library, no async, no network, no side effects. The graph engine. Every function is deterministic.
  • kremis — CLI and HTTP API (axum). The binary that runs the server.
  • kremis-mcp — MCP bridge over stdio.

Persistence is either in-memory or via redb for ACID transactions and crash safety. There's also a Docker image. Apache 2.0.

Try it

git clone https://github.com/TyKolt/kremis.git
cd kremis
cargo build --release
cargo run -p kremis -- init
cargo run -p kremis -- ingest -f examples/sample_signals.json -t json

Then in another terminal:

cargo run -p kremis -- server

Then run the demo:

python examples/demo_honesty.py

Or, if you want to use a local model through Ollama:

python examples/demo_honesty.py --ollama

The repo is at github.com/TyKolt/kremis. Full docs at kremis.mintlify.app.

RAG handles retrieval. Kremis handles verification. I spent months conflating the two before I realized they need separate tools.

Disclosure: An initial draft of this article was generated with AI assistance. The technical content, architecture decisions, project history, and opinions are entirely my own. All code examples are from the actual repository.

I built an AI bookkeeping agent that reached the AWS semifinals from 10,000+ entries

2026-03-30 22:06:31

Table Of Contents

  • The architecture
  • The categorisation engine
  • Few-shot learning that actually improves over time
  • Handling real-world bank statements
  • Batched processing with concurrency control
  • Double-entry done right
  • What I learned
  • The numbers

Every month, I sit down with bank statements from multiple clients and manually assign each transaction to the correct nominal code — a process called transaction categorisation.

It takes hours. There are 166 standard UK nominal codes, five VAT rate categories, and endless edge cases. "AMAZON MARKETPLACE" could be office supplies, stock purchases, or a personal expense depending on the client. Multiply that across hundreds of transactions per client, per month, and you start to understand why 75% of CPAs are expected to retire in the next decade with fewer graduates replacing them.

So I built LedgerAgent - an AI-powered bookkeeping agent that categorises bank transactions automatically using Amazon Bedrock. It reached the semifinals of the AWS 10,000 AIdeas competition (top ~1,000 from over 10,000 entries) in the EMEA Commercial Solutions category.

Here it is in action:

Here's how it works under the hood.

The architecture


Stack: React 19 + Express + 8 AWS services (Bedrock, DynamoDB, S3, SQS, Lambda, API Gateway, EventBridge, Cognito)

LedgerAgent uses 8 AWS services working together:

Browser (React 19)
    │
    ├── Cognito JWT auth
    │
Express Server (port 3001)
    │
    ├── Amazon Bedrock ──── Claude 3.5 Haiku (categorisation)
    │                       Claude 3.5 Sonnet (receipt OCR)
    ├── DynamoDB ─────────── Client vault (transactions, learned patterns)
    ├── S3 ───────────────── File storage (uploads, receipts, backups)
    ├── SQS ──────────────── Async job queue (large batches)
    │     │
    │     └── Lambda ─────── Serverless batch processor
    │
    ├── API Gateway ──────── REST endpoint for job status
    └── EventBridge ──────── Daily DynamoDB → S3 backup

The frontend is React 19 with Vite and Tailwind. The backend is Express running on Node.js 20. All AI inference runs through Amazon Bedrock - Claude 3.5 Haiku for transaction categorisation (fast and cheap) and Claude 3.5 Sonnet for receipt OCR (multimodal image understanding).

The key design decision was using DynamoDB as a persistent "vault" for each client. Every accounting practice manages multiple clients, and each client has their own transaction history, confirmed categorisations, and learned patterns. DynamoDB's pay-per-request billing made this economical - so I'm not paying for idle capacity between categorisation runs.

The categorisation engine

The core of LedgerAgent is the chartOfAccounts.mjs service. It loads two data files at startup:

  • nominal_codes.json — 166 UK standard accounting codes (from 1001 Fixed Assets through to 9999 Suspense)
  • global_rules.json — 365 vendor-to-category mapping rules built from my experience coding thousands of real transactions

The system prompt establishes a UK bookkeeper persona with the full code reference. When a transaction comes in, the buildUserMessage function constructs the prompt:

// Conceptual flow — simplified
function buildUserMessage(transaction, confirmedExamples) {
  // 1. Transaction details (date, description, amount)
  // 2. Any previously confirmed categorisations for this client
  //    injected as few-shot examples
  // 3. Request structured JSON response with
  //    account_code, account_name, confidence, reasoning

  // The full prompt includes the 166 UK nominal codes
  // and 365 vendor-to-category rules as system context
}

Bedrock returns a structured JSON response with the nominal code, account name, a confidence level (high, medium, or low), and a reasoning string explaining the decision. The confidence scoring was essential - it tells me which transactions I can trust and which need manual review.
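The response handling can be sketched as below. The field names mirror the article's description of the structured output, not a published Bedrock schema, and the routing rule is the workflow described later: only low-confidence results go to manual review:

```python
import json

# Hypothetical example of the structured categorisation response.
raw = ('{"account_code": "7502", '
       '"account_name": "Stationery & Printing", '
       '"confidence": "high", '
       '"reasoning": "Matched vendor rule for AMAZON MARKETPLACE"}')

def needs_review(response_json):
    """Route by confidence: trust high/medium, flag low for a human."""
    result = json.loads(response_json)
    return result["confidence"] == "low"
```

This is what makes bulk confirmation possible: the high-confidence majority is accepted as-is, and attention goes only where the model itself is unsure.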

Few-shot learning that actually improves over time

This is the part I'm most proud of. When I review a categorisation and confirm it's correct (or manually correct it), that decision gets saved to the client's confirmedExamples array in DynamoDB:

// Conceptual flow — the key insight is per-client learning
// When a user confirms "AMAZON MARKETPLACE → 7502 Stationery",
// that decision is stored against the client in DynamoDB.
//
// Next time we categorise for that client, confirmed examples
// are injected into the prompt as few-shot context.
//
// Max 50 examples per client, deduplicated by description.
// This means a retail client and a tech consultancy categorise
// the same vendor differently — because their confirmed
// examples are different.

The next time I categorise transactions for that same client, those confirmed examples are injected into the Bedrock prompt as few-shot context. The model sees: "Last time you saw AMAZON MARKETPLACE for this client, it was coded to 7502 Stationery & Printing."

This creates a per-client learning loop. A retail client's Amazon purchases get categorised differently from a tech consultancy's Amazon purchases - because the confirmed examples are client-specific. After confirming 20-30 transactions, accuracy jumps noticeably because the model has real context about how this particular business operates.
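The confirm-and-replay loop can be sketched as follows. A plain dict stands in for DynamoDB, and the function names are illustrative; the behaviour matches the description above: deduplicated by description, capped at 50 per client, injected as few-shot context:

```python
MAX_EXAMPLES = 50

def confirm(client, description, code, store):
    """Record a confirmed categorisation against one client."""
    examples = store.setdefault(client, [])
    # Deduplicate by description: the latest confirmation wins.
    examples[:] = [e for e in examples if e["description"] != description]
    examples.append({"description": description, "code": code})
    del examples[:-MAX_EXAMPLES]      # keep only the most recent 50

def few_shot_block(client, store):
    """Render this client's confirmed examples for prompt injection."""
    return "\n".join(f'"{e["description"]}" -> {e["code"]}'
                     for e in store.get(client, []))
```

Because the store is keyed by client, the same vendor string accumulates different confirmed codes for different businesses, which is the whole point of the loop.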

Handling real-world bank statements

UK bank CSVs are a mess. Every bank uses different column names, different date formats, and different ways of representing debits and credits. The csvParser.mjs service handles this with intelligent column detection:

// Simplified from csvParser.mjs
function detectColumns(headers) {
  const map = {};
  headers.forEach((h, i) => {
    const lower = h.toLowerCase().trim();
    if (/date|trans.*date|posted|value.*date/.test(lower)) map.date = i;
    if (/description|narrative|details|memo|payee/.test(lower)) map.desc = i;
    if (/^amount$|^value$|^sum$|^total$/.test(lower)) map.amount = i;
    if (/debit|dr|money.*out|paid.*out/.test(lower)) map.debit = i;
    if (/credit|cr|money.*in|paid.*in/.test(lower)) map.credit = i;
  });
  return map;
}

It handles three different amount formats: a single amount column (negative for debits), separate debit and credit columns, and amounts with pound signs and comma formatting. This means I can upload a Lloyds statement, a Barclays statement, and an HSBC statement without any manual configuration.
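The amount normalisation can be sketched like this — simplified in the same spirit as the csvParser.mjs excerpt above, with all three formats collapsing to one signed value (negative for money out):

```python
def parse_money(s):
    """Strip pound signs and thousands separators: '£1,200.50' -> 1200.5"""
    return float(s.replace("£", "").replace(",", "").strip())

def normalise(row, cols):
    """Return a signed amount from a CSV row, given the detected columns."""
    if "amount" in cols:                       # single signed-amount column
        return parse_money(row[cols["amount"]])
    debit = row[cols["debit"]].strip()
    credit = row[cols["credit"]].strip()
    if debit:                                  # separate debit/credit columns
        return -parse_money(debit)
    return parse_money(credit)
```

After this step, every statement looks the same downstream, whichever bank produced it.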

Batched processing with concurrency control

For large bank statements (100+ transactions), hitting Bedrock sequentially would take minutes. LedgerAgent uses a parallel worker pool with concurrency of 3:

// Conceptual flow — concurrency-controlled batch processing
// Transactions are processed in parallel chunks (concurrency of 3)
// to balance speed against Bedrock rate limits.
//
// For batches over 100 transactions, the async pipeline kicks in:
// Express → SQS queue → Lambda picks up job → Bedrock AI → DynamoDB
// Frontend polls API Gateway for completion status.

For even larger batches, the async pipeline kicks in - transactions get sent to SQS, picked up by a Lambda function, processed against Bedrock, and results are written back to DynamoDB. The frontend polls for completion via API Gateway.
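The concurrency-of-3 worker pool can be sketched with a thread pool. The real service is Node, so this Python version is only an illustration of the idea: cap in-flight model calls while still processing the batch in parallel:

```python
from concurrent.futures import ThreadPoolExecutor

CONCURRENCY = 3   # balance throughput against Bedrock rate limits

def categorise_all(transactions, categorise_one):
    """Run categorise_one over all transactions, at most 3 in flight,
    preserving the input order in the results."""
    with ThreadPoolExecutor(max_workers=CONCURRENCY) as pool:
        return list(pool.map(categorise_one, transactions))
```

`pool.map` keeps results aligned with the input list, which matters when writing categorisations back against their source rows.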

Double-entry done right

One thing that surprised me during development: most "AI bookkeeping" demos I've seen online produce a single-entry list of categorised transactions. That's not bookkeeping - it's just labelling. Real bookkeeping requires double-entry, where every transaction creates two ledger entries that must balance.

In LedgerAgent, the bank account (nominal code 1200) acts as the contra account for every transaction:

| Transaction type | Bank account (1200) | Categorised account |
| ---------------- | ------------------- | ------------------- |
| Money out        | Credit              | Debit               |
| Money in         | Debit               | Credit              |

The trial balance splits automatically at the code 4000 boundary - codes below 4000 go on the Balance Sheet (assets, liabilities, equity), codes 4000 and above go on the Profit & Loss (income, expenses). Total debits must always equal total credits.

This sounds basic to anyone with accounting training, but getting an AI system to consistently produce balanced double-entry output required careful prompt engineering and validation logic.
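The posting rule and the balance check can be sketched as validation logic. This is an illustration of the contra-account rule described above, not LedgerAgent's actual code:

```python
BANK = "1200"   # the bank account is the contra for every transaction

def post(amount, account):
    """amount > 0 = money in, amount < 0 = money out.
    Every transaction yields exactly two entries that balance."""
    if amount < 0:   # money out: credit the bank, debit the categorised account
        return [(BANK, "credit", -amount), (account, "debit", -amount)]
    return [(BANK, "debit", amount), (account, "credit", amount)]

def trial_balance(entries):
    """Total debits minus total credits; must always be zero."""
    debits = sum(a for _, side, a in entries if side == "debit")
    credits = sum(a for _, side, a in entries if side == "credit")
    return debits - credits
```

A non-zero trial balance is an immediate signal that a posting was dropped or duplicated, which is exactly the kind of check an AI-generated ledger needs.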

What I learned


Key takeaways:
  • Domain knowledge is the moat - not the AI wrapper
  • Few-shot learning beats fine-tuning when per-client variation is high
  • Confidence scoring changes the entire review workflow

Domain knowledge is the moat. The 166 nominal codes, 365 vendor rules, VAT rate handling, and double-entry logic aren't things you can prompt-engineer from scratch. They come from years of sitting with bank statements. Any developer can connect to Bedrock — few can tell you that a Deliveroo transaction for a sole trader should be coded to 7901 (Staff Welfare) not 7400 (Travel & Subsistence) unless it was a client entertainment expense, in which case it's 7601 (Entertaining).

Few-shot learning beats fine-tuning for this use case. I considered fine-tuning a model on accounting data, but the per-client variation is too high. A retail business and a tech consultancy categorise the same vendors completely differently. Dynamic few-shot context from confirmed examples handles this naturally.

Confidence scoring changes the workflow. Without confidence scores, you'd have to review every single categorisation. With them, I can filter to "low confidence" transactions and review only the 10-15% that genuinely need human judgement. The rest can be confirmed in bulk.

The numbers

  • 166 UK nominal codes mapped
  • 365 vendor-to-category rules
  • 5,860 lines of code across 39 source files
  • 8 AWS services integrated
  • Top ~1,000 from 10,000+ entries in AWS AIdeas

LedgerAgent is currently a tool I use for my own practice, but I'm planning to open it up to other small accountancy firms. If you're an accountant drowning in manual transaction categorisation, or a developer building fintech tools, I'd like to hear from you.

Connect with me on X/Twitter to discuss AI in Fintech!

If you're interested in the code or want to connect, check out the repository and my profile:

Check out my GitHub Profile

Built with React 19, Express, Amazon Bedrock (Claude 3.5 Haiku + Sonnet), DynamoDB, S3, SQS, Lambda, API Gateway, EventBridge, and Cognito.

I Designed a Memory System for Claude Code — 'Forgetting' Was the Hardest Part

2026-03-30 22:05:30

Everyone talks about making AI remember things. Handoff prompts. System instructions. Memory files. The implicit assumption is always the same: the problem is that AI forgets, so the solution is to make it remember more.

After weeks of using Claude Code on a moderately complex project, I discovered this framing is backwards. The hardest part of AI memory management isn't remembering — it's forgetting.

Where Handoff Prompts Break Down

Handoff prompts work well for short-term continuity. Summarize the previous session, paste it into a new one, keep going. Simple and effective.

The cracks appear when a project runs for weeks. Here's what actually happened.

On day 3, I told Claude: "Use approach A." On day 4: "Actually, switch to approach B." The handoff prompt captures the latest state — B. So far, correct.

A week later, a new context made approach A relevant again. But by then, A had been overwritten in the handoff summary. Not just the instruction — the reasoning behind why A existed in the first place was gone. A handoff prompt preserves the snapshot, not the history.

The second failure mode is bloat. If you stuff every past decision and instruction into the handoff, it grows unbounded. Old directives that you've mentally revoked sit alongside current ones. You know the old directive is dead because you issued a newer one. The AI doesn't — it treats both as equally valid.

Human Memory and AI Memory Are Structurally Inverted

The root cause clicked when I realized human and AI memory operate on opposite principles.

Humans are recency-biased. What you said three months ago is background noise. What you said yesterday is your current position. This is the recency effect from serial position theory in cognitive psychology — a universal human trait, not a bug.

AI has no recency bias. Line 1 of CLAUDE.md carries the same weight as line 100. A feedback note from last week is weighted identically to one from today. There is no temporal decay, no "that was then, this is now" mechanism.

This asymmetry causes real problems in long-running projects. I accumulated writing style feedback over time: "Write more politely" (week 1) and "Too polite, be more direct" (week 2). Both instructions stayed active in memory. Claude alternated between them based on context, but its selection criteria didn't match mine. To me, "be direct" was obviously the current rule. To Claude, both rules were simultaneously valid.

Humans run their memory on a forgetting-first architecture. AI runs on a forgetting-never architecture. Handoff prompts don't bridge this gap.

A 3-Layer Memory Architecture

Claude Code has an auto-memory feature that generates memory files from conversations and loads them into subsequent sessions. Used naively, it accumulates files indefinitely with equal weighting — reproducing the exact problem above.

I restructured memory into three layers.

Layer 1: Active (Drives Behavior)

Only memories indexed in MEMORY.md belong to this layer. Claude Code automatically loads MEMORY.md at session start, so Layer 1 entries are the only ones that actively shape behavior.

# Memory Index (Active Only)
> Archive protocol: Archived memories → MEMORY_ARCHIVED.md (reference only)

- [feedback_writing_style.md](feedback_writing_style.md) — Write directly, avoid AI-typical phrasing
- [feedback_article_focus.md](feedback_article_focus.md) — One topic per article, no scope creep
- [user_profile.md](user_profile.md) — Hardware × AI specialist, bilingual

Target: 20–40 entries. Beyond that, the bloat problem described above kicks in. This layer defines who the AI is right now.
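To keep the index honest, a small check can count active entries against that target. A minimal sketch in Python, assuming the link-list index format shown above (the filenames are placeholders from the example, not real project files):

```python
import re

# Hypothetical MEMORY.md contents mirroring the index format above.
memory_md = """# Memory Index (Active Only)
> Archive protocol: Archived memories -> MEMORY_ARCHIVED.md (reference only)

- [feedback_writing_style.md](feedback_writing_style.md) - Write directly, avoid AI-typical phrasing
- [feedback_article_focus.md](feedback_article_focus.md) - One topic per article, no scope creep
- [user_profile.md](user_profile.md) - Hardware x AI specialist, bilingual
"""

# Matches lines of the form "- [file.md](file.md) - summary".
ENTRY = re.compile(r"^- \[(?P<file>[^\]]+)\]\([^)]+\)", re.MULTILINE)

def active_entries(index_text: str) -> list[str]:
    """Return the memory filenames currently indexed (Layer 1)."""
    return [m.group("file") for m in ENTRY.finditer(index_text)]

entries = active_entries(memory_md)
print(len(entries))   # 3
if len(entries) > 40:  # ceiling from the target above
    print("Layer 1 is bloated: time to archive")
```

Running a check like this at the start of a session makes Layer 1 bloat visible before it becomes a behavior problem.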

Layer 2: Archived (Evidence, Not Instruction)

Memories that were once active but should no longer guide behavior. Managed in a separate MEMORY_ARCHIVED.md.

---
name: old_writing_policy
status: archived
archived_at: 2026-03-30
archived_reason: Superseded by direct-style policy. Kept as evidence.
---

The key: files are never deleted. The reasoning behind past decisions has future value. What gets removed is the decision-making authority, not the record.

Layer 3: Vectorized Reference (Search-Only)

When memories exceed ~50 files, the curation cost of Layer 1 rises. The plan (still in progress) is to vectorize older memories into a vector DB like ChromaDB for semantic search.

The critical design constraint: Layer 3 data must never drive decisions. It's reference material, not instruction.

Why? Cosine similarity returns content-similar results, but it can't distinguish "this memory is still valid" from "this memory was revoked two weeks ago." Even with status: archived metadata in the vector store, ranking algorithms don't reliably surface metadata alongside semantic similarity.

The risk is that the AI retrieves an archived instruction from Layer 3 and treats it as current guidance. So Layer 3 is explicitly "read-only for context" — never "act on this."
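The risk is easy to demonstrate with a toy retrieval sketch. Everything below is made up for illustration (the store, the two-dimensional "embeddings", the field names); the point is twofold: similarity ranking can happily put a revoked instruction first, and the wrapper therefore tags every hit as reference-only so downstream code cannot treat it as guidance.

```python
import math

# Toy in-memory store standing in for a vector DB such as ChromaDB.
# The vectors are invented; a real system would use an embedding model.
STORE = [
    {"text": "Write politely", "status": "archived", "vec": [0.9, 0.1]},
    {"text": "Be direct",      "status": "active",   "vec": [0.8, 0.2]},
]

def cosine(a, b):
    """Cosine similarity between two 2-D vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def reference_lookup(query_vec, k=2):
    """Layer 3 retrieval: results are context, never instructions.

    Every hit is wrapped with role='reference-only', because similarity
    alone cannot tell 'still valid' from 'revoked two weeks ago'.
    """
    ranked = sorted(STORE, key=lambda m: -cosine(query_vec, m["vec"]))
    return [{"role": "reference-only", "status": m["status"],
             "text": m["text"]} for m in ranked[:k]]

for hit in reference_lookup([0.85, 0.15]):
    # Note: with these toy vectors, the archived memory ranks first.
    print(hit["role"], hit["status"], hit["text"])
```

The `role` tag is the whole design: it travels with the result, so the prompting layer can include the text as background without ever promoting it to an active instruction.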

Why "Forgetting" Is the Hardest Design Problem

The most time-consuming part of this architecture was the Layer 1 → Layer 2 transition: the act of making the AI forget.

Deletion achieves forgetting, but destroys the audit trail. A month later, you want to know what the previous policy was and why it existed. It's gone.

Setting active: false in the same file is dangerous. The AI reads the full file content. Even with a flag saying "ignore this," the instruction text enters the context window. Once it's in context, it exerts influence — flags notwithstanding.

The solution was physical separation from the context window. Remove the entry from MEMORY.md. Move it to MEMORY_ARCHIVED.md. Claude Code only auto-loads MEMORY.md, so archived files are invisible at session start. The archived content can still be accessed on demand, but it doesn't passively enter the context.
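That migration can be sketched as a small function over the contents of the two index files. This is an illustrative sketch, not Claude Code's own mechanism; the filenames and frontmatter fields mirror the examples above:

```python
from datetime import date

def archive_memory(index_text: str, archive_text: str,
                   filename: str, reason: str) -> tuple[str, str]:
    """Move one memory from the active index to the archive.

    Removes the entry's line from MEMORY.md (revoking its authority)
    and appends a frontmatter record to MEMORY_ARCHIVED.md (keeping
    the evidence). Nothing is ever deleted.
    """
    kept = [line for line in index_text.splitlines()
            if f"[{filename}]" not in line]
    record = (
        "---\n"
        f"name: {filename.removesuffix('.md')}\n"
        "status: archived\n"
        f"archived_at: {date.today().isoformat()}\n"
        f"archived_reason: {reason}\n"
        "---\n"
    )
    return "\n".join(kept) + "\n", archive_text + record

index = "- [old_writing_policy.md](old_writing_policy.md) - Write politely\n"
index, archive = archive_memory(index, "", "old_writing_policy.md",
                                "Superseded by direct-style policy.")
print("[old_writing_policy.md]" in index)  # False: authority revoked
print("status: archived" in archive)       # True: evidence kept
```

The asymmetry is the point: the index edit changes behavior, the archive append preserves history, and the two never share a context window.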

The human bottleneck is the same as with email folders: "I might still need this" prevents migration. Layer 1 bloats, contradictory instructions accumulate, and you're back to the original problem.

The heuristic that worked: archive when the intent is covered by a newer instruction. Not "might I need this again?" but "is this instruction's purpose served by something currently active?" If yes, archive. If no, keep.

What This Revealed About the Human Side

The biggest shift in my thinking: AI memory management is fundamentally a human design problem, not an AI capability problem.

Handoff prompts ask the AI to "remember." Memory files tell the AI to "reference this." Both approaches treat AI memory as the variable to optimize.

What actually needed to happen was for the human to design the memory structure. What gets remembered, what gets forgotten, what gets preserved as evidence — these are decisions only the human can make. The AI can't determine instruction priority because priority lives in the human's head.

Metacognition — the ability to monitor and control your own cognitive processes — transfers directly to AI memory management. You need to maintain awareness of what you've told the AI, which of those instructions are still valid, and how they interact.

It's overhead. But it's overhead that pays for itself. "Why is Claude behaving differently than last time?" and "I've told it this three times already" — these symptoms disappear when the memory structure is intentionally designed rather than accumulated by default.

Implementation Template

Here's the directory structure I'm using with Claude Code. The concept applies to other tools, though the implementation details differ.

project_root/
├── CLAUDE.md              # Project instructions (near-immutable)
└── .claude/
    └── memory/
        ├── MEMORY.md              # Layer 1 index
        ├── MEMORY_ARCHIVED.md     # Layer 2 index
        ├── feedback_*.md          # User feedback memories
        ├── user_*.md              # User profile memories
        └── project_*.md           # Project state memories

Memory file frontmatter:

---
name: writing_style
description: Writing directive — direct style, no AI phrasing
type: feedback
---

Archive additions:

---
name: old_policy
status: archived
archived_at: 2026-03-30
archived_reason: Superseded by new policy. Retained as evidence.
---

Keep MEMORY.md entries under 150 characters each. The index is a pointer, not the content — details go in individual files.
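A trivial linter can enforce that budget. A sketch, assuming the `- [file](file) - summary` entry format used above:

```python
def oversize_entries(index_text: str, limit: int = 150) -> list[str]:
    """Flag MEMORY.md index lines that exceed the length budget.

    The index is a pointer, not the content: a long entry means
    detail is leaking out of its individual memory file.
    """
    return [line for line in index_text.splitlines()
            if line.startswith("- ") and len(line) > limit]

index = "- [user_profile.md](user_profile.md) - Hardware x AI specialist\n"
print(oversize_entries(index))  # []: entry fits the budget
```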

What's Next

Layer 3 vectorization is still a design concept. The automation I want: memories not referenced for N sessions get flagged as archive candidates. That would cut the human curation cost significantly.

The other missing piece: contradiction detection. When adding a new memory, check it against existing Layer 1 entries for conflicts. Two instructions that contradict each other make AI behavior unpredictable — catching that at write time would prevent a class of problems I've hit repeatedly.
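As a starting point, even a crude lexical check at write time catches some conflicts. This is only a sketch of the idea, not a real detector; genuine contradiction detection would need semantic comparison, and the descriptions below are invented examples:

```python
def potential_conflicts(new_desc: str, active_descs: list[str],
                        min_shared: int = 2) -> list[str]:
    """Naive write-time conflict check.

    Flags active Layer 1 entries that share several content words
    with the incoming memory, so a human can review them before
    the new entry lands. Shared topic, possibly opposing directive.
    """
    stop = {"a", "an", "the", "in", "per"}
    new_words = {w for w in new_desc.lower().split() if w not in stop}
    flagged = []
    for desc in active_descs:
        words = {w for w in desc.lower().split() if w not in stop}
        if len(new_words & words) >= min_shared:
            flagged.append(desc)
    return flagged

active = ["Write in a polite style", "One topic per article"]
print(potential_conflicts("Write in a direct style", active))
# ['Write in a polite style']
```

A flagged pair is only a review prompt, not a verdict: the human still decides whether the older entry is superseded and should be archived.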

Both of these are solvable engineering problems. The harder part — recognizing that AI memory is a design problem, not a scaling problem — is the insight that changed how I work with Claude Code.

Characterization of a plane in space

2026-03-30 22:04:13

$\alpha$ is a plane in space, and we've been told in school that its equation in canonical form is:

$$ax+by+cz+d=0$$

But this tells us nothing about the plane's characteristics, and no one explained to us how this equation is formed.

Plane formation

A plane in space can be identified by a point and two non-parallel vectors.

$$P=(x_0,y_0,z_0)\in\alpha,\qquad \underline{u},\underline{v}\in\alpha$$

$\underline{u},\underline{v}$ have magnitude, direction, and orientation. It's possible to write:

$$\underline{u}=PQ,\qquad \underline{v}=PR$$

Every other point $X$ that belongs to $\alpha$ can be written as a linear combination of these elements, starting from the origin:

$$OX=OP+s\,\underline{u}+t\,\underline{v},\qquad X=(x,y,z)\in\alpha,\quad s,t\in\mathbb{R}$$

This clarifies why it is possible to say:

A plane in space can be identified by a point and two non-parallel vectors.

Parametric equations

$X$ can be described by its parametric equations:

$$x=x_0+m_1 s+m_2 t\\ y=y_0+n_1 s+n_2 t\\ z=z_0+p_1 s+p_2 t$$

with $\underline{u}=(m_1,n_1,p_1)$ and $\underline{v}=(m_2,n_2,p_2)$.

The goal is to solve this system for $s$ and $t$, reducing it to a single linear equation satisfied by the points belonging to the plane.

A bit of math

Assuming $m_2,n_2,p_2\neq 0$, solve each equation for $t$:

$$t=\frac{x-x_0-m_1 s}{m_2},\qquad t=\frac{y-y_0-n_1 s}{n_2},\qquad t=\frac{z-z_0-p_1 s}{p_2}$$

Equate over $t$:

$$\frac{x-x_0-m_1 s}{m_2}=\frac{y-y_0-n_1 s}{n_2},\qquad \frac{x-x_0-m_1 s}{m_2}=\frac{z-z_0-p_1 s}{p_2}$$

$t$ is gone; let's target $s$:

$$\frac{x-x_0}{m_2}-\frac{m_1}{m_2}s=\frac{y-y_0}{n_2}-\frac{n_1}{n_2}s,\qquad \frac{x-x_0}{m_2}-\frac{m_1}{m_2}s=\frac{z-z_0}{p_2}-\frac{p_1}{p_2}s$$

$$\frac{x-x_0}{m_2}-\frac{y-y_0}{n_2}=\left(\frac{m_1}{m_2}-\frac{n_1}{n_2}\right)s,\qquad \frac{x-x_0}{m_2}-\frac{z-z_0}{p_2}=\left(\frac{m_1}{m_2}-\frac{p_1}{p_2}\right)s$$

To shorten the notation, define:

$$A=\frac{x-x_0}{m_2},\quad B=\frac{y-y_0}{n_2},\quad C=\frac{m_1}{m_2},\quad D=\frac{n_1}{n_2},\quad E=\frac{z-z_0}{p_2},\quad F=\frac{p_1}{p_2}$$

$$A-B=s(C-D),\qquad A-E=s(C-F)$$

$$s=\frac{A-B}{C-D},\qquad s=\frac{A-E}{C-F}$$

Equate over $s$ and cross-multiply:

$$(A-B)(C-F)=(A-E)(C-D)$$
$$AC-AF-BC+BF=AC-AD-CE+DE$$
$$-AF-BC+BF=-AD-CE+DE$$
$$AF-AD+BC-BF-CE+DE=0$$
$$A(F-D)+B(C-F)+E(D-C)=0$$

Substituting the definitions back:

$$\frac{x-x_0}{m_2}\left(\frac{p_1}{p_2}-\frac{n_1}{n_2}\right)+\frac{y-y_0}{n_2}\left(\frac{m_1}{m_2}-\frac{p_1}{p_2}\right)+\frac{z-z_0}{p_2}\left(\frac{n_1}{n_2}-\frac{m_1}{m_2}\right)=0$$

Collecting the coefficients:

$$a=\frac{\frac{p_1}{p_2}-\frac{n_1}{n_2}}{m_2},\qquad b=\frac{\frac{m_1}{m_2}-\frac{p_1}{p_2}}{n_2},\qquad c=\frac{\frac{n_1}{n_2}-\frac{m_1}{m_2}}{p_2}$$

$$a(x-x_0)+b(y-y_0)+c(z-z_0)=0$$
$$ax+by+cz-ax_0-by_0-cz_0=0$$

With $d=-ax_0-by_0-cz_0$:

$$ax+by+cz+d=0$$

$$\underline{w}=(a,b,c)$$

This is the normal vector of the plane, orthogonal to it.

Geometry

To prove that $\underline{w}$ is orthogonal to the plane, it should be orthogonal to both $\underline{u}$ and $\underline{v}$.

The scalar product should be 0:

$$\underline{w}\cdot\underline{u}=0,\qquad \underline{w}\cdot\underline{v}=0$$

$$\underline{w}\cdot\underline{u}=a\,m_1+b\,n_1+c\,p_1=\frac{\frac{p_1}{p_2}-\frac{n_1}{n_2}}{m_2}\,m_1+\frac{\frac{m_1}{m_2}-\frac{p_1}{p_2}}{n_2}\,n_1+\frac{\frac{n_1}{n_2}-\frac{m_1}{m_2}}{p_2}\,p_1$$

$$\underline{w}\cdot\underline{u}=\frac{p_1}{p_2}\frac{m_1}{m_2}-\frac{n_1}{n_2}\frac{m_1}{m_2}+\frac{m_1}{m_2}\frac{n_1}{n_2}-\frac{p_1}{p_2}\frac{n_1}{n_2}+\frac{n_1}{n_2}\frac{p_1}{p_2}-\frac{m_1}{m_2}\frac{p_1}{p_2}=0$$

$$\underline{w}\cdot\underline{v}=a\,m_2+b\,n_2+c\,p_2=\frac{\frac{p_1}{p_2}-\frac{n_1}{n_2}}{m_2}\,m_2+\frac{\frac{m_1}{m_2}-\frac{p_1}{p_2}}{n_2}\,n_2+\frac{\frac{n_1}{n_2}-\frac{m_1}{m_2}}{p_2}\,p_2=0$$
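The whole derivation can also be verified numerically. The sketch below uses arbitrary example values for $P$, $\underline{u}$, $\underline{v}$ (with all components of $\underline{v}$'s denominators nonzero, since the derivation divides by $m_2,n_2,p_2$), computes $a,b,c,d$ from the formulas above, and checks both the orthogonality of $\underline{w}$ and that every parametric point satisfies the plane equation:

```python
import random

# Arbitrary example data: P is the point, u and v the two
# non-parallel vectors spanning the plane.
x0, y0, z0 = 1.0, -2.0, 3.0
m1, n1, p1 = 2.0, 1.0, -1.0   # u
m2, n2, p2 = 0.5, 3.0, 2.0    # v (nonzero components: the
                              #    derivation divides by m2, n2, p2)

# Coefficients from the derivation above.
a = (p1/p2 - n1/n2) / m2
b = (m1/m2 - p1/p2) / n2
c = (n1/n2 - m1/m2) / p2
d = -a*x0 - b*y0 - c*z0

# w = (a, b, c) must be orthogonal to both u and v ...
assert abs(a*m1 + b*n1 + c*p1) < 1e-12
assert abs(a*m2 + b*n2 + c*p2) < 1e-12

# ... and every parametric point X = P + s*u + t*v must satisfy
# ax + by + cz + d = 0.
for _ in range(100):
    s, t = random.uniform(-5, 5), random.uniform(-5, 5)
    x = x0 + m1*s + m2*t
    y = y0 + n1*s + n2*t
    z = z0 + p1*s + p2*t
    assert abs(a*x + b*y + c*z + d) < 1e-9

print("all checks passed")
```

Any point and any pair of non-parallel vectors (with the nonzero-denominator caveat) will pass the same checks, which is exactly what the derivation claims.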