2026-03-26 15:56:46
The first time I watched a transition burn a full generation budget and still land on the wrong side of the edit, I knew the problem wasn’t quality — it was commitment. I was paying for the expensive answer before I had any evidence that the prompt had pointed the model in the right direction.
That’s what pushed me toward think frames. I wanted a cheap exploratory pass that could argue with itself before the pipeline spent real compute. Instead of generating one expensive candidate and hoping, I now generate a handful of lightweight sketches, score them, and only let the winner graduate to full-quality generation.
This is the part that felt obvious only after I built it: video generation needs scratch paper. LLMs have a place to reason before they answer; my generator didn’t. Think frames are the missing margin notes.
The idea came from a simple mismatch. A full keyframe is irreversible in the only way that matters: once I’ve paid for it, I’ve already committed to the path. If the transition is wrong, the loss isn’t just a bad frame — it’s wasted budget and a dead end in the chain.
The naive fix is to generate more full-quality candidates and pick the best one. I’ve done that. It works in the same way buying more lottery tickets works: you increase your odds by multiplying cost.
That is not the kind of engineering I enjoy defending.
Think frames changed the shape of the problem. I keep the exploration cheap, vary the prompt and commitment strength slightly, score the results with the same reward machinery I trust elsewhere, and then spend the expensive pass only on the winning path. The important shift is that the pipeline no longer asks, “Which full render is best?” It asks, “Which direction deserves to become a full render?”
Here’s the architecture in one pass:
```mermaid
flowchart TD
  sourceFrame[Source frame] --> plan[Transition plan]
  plan --> thinkGen[Generate think frames]
  thinkGen --> score[Score candidates]
  score --> pick[Pick winning path]
  pick --> fullGen[Full-quality generation]
  fullGen --> output[Final keyframe]
```
That small detour is the whole trick. It gives the generator room to be wrong cheaply, which is exactly what the expensive stage needs.
## How I built the exploratory pass
I kept the implementation deliberately narrow. The think-frame module is not a second generator and not a separate product surface. It is a pre-generation layer that sits in front of the existing keyframe flow and feeds it better evidence.
The core comment at the top of `lib/think-frames.ts` says what the module is for, and I kept it that direct because the code has to earn its keep:
```typescript
/**
* Think Frames — Lightweight Exploratory Pre-Generation
*
* Inspired by DeepGen's "think tokens" (learnable intermediate representations
* injected between VLM and DiT).
*
* Before committing to a full-quality keyframe generation, this module generates
* lightweight "think frames" — quick low-inference-step sketches that explore
* different transition paths. These are scored by the Reward Mixer, and only
* the winning path proceeds to full-quality generation.
 */
```
That framing matters because it keeps the module honest. I’m not trying to make the sketch look good. I’m trying to make it informative.
The first design choice was to stop making every exploratory frame fight the same battle. In `buildThinkFramePrompts`, I vary the focus across five buckets: character, environment, mood, composition, and atmosphere. Each one gets its own suffix so the prompt explores a different preservation priority instead of collapsing everything into one mushy compromise.
```typescript
const FOCUS_SUFFIXES: Record<ThinkFrame["focus"], string> = {
  character: "Focus on maintaining character identity, facial features...",
  environment: "Focus on maintaining environment, lighting, and color palette...",
  mood: "Focus on maintaining mood, atmosphere, and tonal continuity.",
  composition: "Focus on maintaining spatial composition, framing...",
  atmosphere: "Focus on maintaining texture details, material appearance...",
}
```
I like this pattern because it makes the exploration legible. If a candidate wins, I know what kind of preservation it was good at. If it loses, I know which dimension failed without pretending the model made a single all-purpose judgment.
The tradeoff is obvious: I’m constraining the search space on purpose. That means I may miss a weird but useful hybrid path. But in exchange I get five interpretable probes instead of one vague guess, and for this pipeline that is the better bargain.
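To make the shape of that prompt builder concrete, here is a hypothetical sketch — the suffixes and the strength progression below are illustrative assumptions, not the module's actual values:

```typescript
type Focus = "character" | "environment" | "mood" | "composition" | "atmosphere";

interface ThinkFramePrompt {
  focus: Focus;
  prompt: string;
  strength: number;
}

// Sketch of a buildThinkFramePrompts-style function: one probe per focus
// bucket, each with its own suffix and a slightly different img2img strength.
function buildThinkFramePrompts(
  basePrompt: string,
  suffixes: Record<Focus, string>
): ThinkFramePrompt[] {
  return (Object.keys(suffixes) as Focus[]).map((focus, idx) => ({
    focus,
    prompt: `${basePrompt} ${suffixes[focus]}`,
    // Assumed progression: controlled diversity in how hard the
    // image-to-image step clings to the source, not chaos.
    strength: 0.5 + idx * 0.05,
  }));
}

const probes = buildThinkFramePrompts("A rainy alley at dusk.", {
  character: "Focus on character identity.",
  environment: "Focus on environment and lighting.",
  mood: "Focus on mood and tone.",
  composition: "Focus on framing.",
  atmosphere: "Focus on texture and material.",
});
console.log(probes.length); // 5
```

Each probe carries its focus label through scoring, which is what makes a win or loss interpretable afterward.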
The second choice was to generate the candidates in parallel. I didn’t want the exploration pass to become a little queue of regrets. The module fans out the think frames together, then ranks the settled results after the fact.
```typescript
const generationResults = await Promise.allSettled(
  prompts.map((p, idx) =>
    generator({
      sourceImageUrl,
      prompt: p.prompt,
      strength: p.strength,
      seed: baseSeed + idx,
    })
  )
)
```
That `Promise.allSettled` detail is doing real work. I wanted the cohort to survive partial failure. If one probe fails, the others still tell me something, and I don’t throw away a useful exploration round just because one branch misbehaved.
The non-obvious part is the seed progression. I offset the seed by index so each candidate gets a distinct path without turning the whole system into uncontrolled variation. The point is controlled diversity, not chaos with a nicer label.
A fixed threshold sounds tidy until you stare at a mediocre cohort. If every candidate lands around 0.65, an absolute cutoff can tell you all of them are bad and leave you nowhere. That’s too blunt for a selection step that is supposed to decide the least-wrong path.
So I use group-relative normalization in the reward mixer. The score is not just “is this candidate good?” It is “how does this candidate compare to the rest of this batch?” That’s the part that matters when the whole cohort is imperfect, which is often the real world.
The normalization function is compact, and I kept it that way because the idea should be easy to inspect:
```typescript
/**
 * Normalize an array of values using group-relative normalization:
 * normalized[i] = (value[i] - mean) / (std + epsilon)
 *
 * This is the core of GRPO: candidates are scored relative to their peers
 * rather than against absolute thresholds.
 */
export function normalizeGroupRelative(values: number[]): number[] {
  if (values.length === 0) return []
  if (values.length === 1) return [0]
  const mean = values.reduce((s, v) => s + v, 0) / values.length
  const variance = values.reduce((s, v) => s + (v - mean) ** 2, 0) / values.length
  const std = Math.sqrt(variance)
  return values.map((v) => (v - mean) / (std + EPSILON))
}
```
A note on what these scores actually are: `normalizeGroupRelative` returns z-scores — mean-centered, standard-deviation-scaled values that are unbounded in both directions. A single candidate always gets a score of zero. A cohort produces scores that tell you how far each candidate sits from the group mean, not where it lands on a fixed 0–1 scale. The reward weights below are coefficients on these relative distances, not percentages of a bounded composite.
What surprised me here was how much this changes the feel of selection. The pipeline stops acting like a judge with a single hard line and starts acting like a scout comparing several imperfect routes through the same terrain.
The limitation is that relative ranking only works if the cohort is meaningful. If all the probes are identical, the normalization has nothing interesting to say. That is why the focus variations and seed offsets matter so much: they make the batch worth comparing.
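To see the z-score behavior on a deliberately mediocre cohort, here is the function from above exercised directly (EPSILON is assumed to be a tiny constant like 1e-8; the real module defines its own):

```typescript
const EPSILON = 1e-8; // assumed value; the module defines its own constant

function normalizeGroupRelative(values: number[]): number[] {
  if (values.length === 0) return [];
  if (values.length === 1) return [0];
  const mean = values.reduce((s, v) => s + v, 0) / values.length;
  const variance = values.reduce((s, v) => s + (v - mean) ** 2, 0) / values.length;
  const std = Math.sqrt(variance);
  return values.map((v) => (v - mean) / (std + EPSILON));
}

// Every candidate lands near 0.65 — an absolute 0.7 cutoff would reject
// them all, but relative scoring still ranks a least-wrong winner.
const scores = normalizeGroupRelative([0.64, 0.65, 0.66]);
console.log(scores.map((s) => s.toFixed(2))); // ≈ [ '-1.22', '0.00', '1.22' ]
```

The spread between candidates is preserved even though the absolute quality of the whole batch is unremarkable.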
Think frames are only useful if the scoring surface can tell the difference between “looks plausible” and “preserves the right things.” I already had a multi-signal reward mixer for candidate scoring, so I reused that structure instead of inventing a separate heuristic just for exploration.
The mixer evaluates five signals: visual drift, color harmony, motion continuity, composition stability, and narrative coherence. The default weights are explicit:
```typescript
export const DEFAULT_REWARD_WEIGHTS: RewardWeights = {
  visualDrift: 0.30,
  colorHarmony: 0.25,
  motionContinuity: 0.15,
  compositionStability: 0.15,
  narrativeCoherence: 0.15,
}
```
I like that this makes the selection policy visible. Visual similarity matters most, but it doesn’t get to bully everything else. Color, motion, composition, and narrative continuity all still get a vote.
The important detail is that the mixer does not need every signal to be present. It skips nulls and renormalizes the remaining weights, which keeps the scorer from falling apart when one signal is unavailable. That makes the think-frame pass resilient in exactly the places I care about: partial evidence is still evidence.
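A minimal sketch of that skip-and-renormalize behavior — an assumed shape, not the real mixer's code — looks like this:

```typescript
type Signals = Record<string, number | null>;

// Combine whichever signals are present; skip nulls and renormalize the
// remaining weights so partial evidence still yields a usable score.
function mixRewards(signals: Signals, weights: Record<string, number>): number {
  let weightedSum = 0;
  let weightTotal = 0;
  for (const [name, value] of Object.entries(signals)) {
    if (value === null || !(name in weights)) continue; // missing signal: no vote
    weightedSum += weights[name] * value;
    weightTotal += weights[name];
  }
  return weightTotal === 0 ? 0 : weightedSum / weightTotal; // renormalize
}

// colorHarmony is unavailable here, so its 0.25 weight is redistributed.
const score = mixRewards(
  { visualDrift: 0.8, colorHarmony: null, motionContinuity: 0.6 },
  { visualDrift: 0.3, colorHarmony: 0.25, motionContinuity: 0.15 }
);
console.log(score.toFixed(3)); // (0.3*0.8 + 0.15*0.6) / 0.45 → "0.733"
```

The renormalization step is what keeps a missing signal from silently deflating every score in the cohort.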
Think frames are not a side quest. They are the front door to a three-stage progressive pipeline that I use to keep quality from collapsing into a single expensive guess.
The stage boundaries are spelled out in `lib/progressive-pipeline.ts`:
```typescript
/**
 * Stage 1 — Alignment (Generate): Think frames → select → full gen
 * Stage 2 — Refinement (Diagnose & Adjust): Fix weakest signals → re-gen
 * Stage 3 — Recovery (Last Resort): Aggressive fallback → always accept
 */
```
That structure matters because it gives me a place to be cautious before I become expensive. Stage 1 is where the think frames live. If the best probe looks good enough, I continue. If the result is weak, later stages can diagnose and adjust instead of blindly retrying the same mistake.
The pipeline config reflects that same philosophy:
```typescript
export const DEFAULT_PIPELINE_CONFIG: PipelineConfig = {
  stage1Threshold: 0.70,
  stage2Threshold: 0.60,
  thinkFrameCount: 3,
  ...
}
```
I’m intentionally not pretending the thresholds are magical. They are just gates that separate “continue exploring” from “move forward with what we have.” The think-frame pass reduces how often I have to spend full-quality compute just to discover the prompt was off by a mile.
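Reduced to a sketch, the gate itself is a small decision — this is an assumed shape using the config values above; the real pipeline wires diagnosis and recovery around it:

```typescript
interface PipelineConfig {
  stage1Threshold: number;
  stage2Threshold: number;
  thinkFrameCount: number;
}

const config: PipelineConfig = {
  stage1Threshold: 0.70,
  stage2Threshold: 0.60,
  thinkFrameCount: 3,
};

// Assumed gate: separates "move forward with what we have" from
// "continue exploring" at each stage boundary.
function clearsGate(stage: 1 | 2, compositeScore: number, cfg: PipelineConfig): boolean {
  const threshold = stage === 1 ? cfg.stage1Threshold : cfg.stage2Threshold;
  return compositeScore >= threshold;
}

console.log(clearsGate(1, 0.74, config)); // true — promote to full generation
console.log(clearsGate(2, 0.55, config)); // false — fall through toward recovery
```

Nothing about the threshold values is magical; they are just where caution hands off to spending.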
I didn’t build this because it sounds elegant. I built it because full-quality generation is the expensive part, and I was tired of paying for expensive uncertainty.
Think frames let me spend a little to learn a lot. The exploration pass is lightweight by design, and the winning path is the only one that gets promoted. That means I can inspect several candidate directions without paying full price for every one of them.
The practical difference is not subtle. A cohort of cheap sketches gives me a chance to reject a bad transition before I’ve committed to a full render. That is the kind of savings that shows up as fewer wasted generations and fewer dead-end branches in the chain.
I had to resist the temptation to optimize the wrong thing. A think frame is not supposed to be a nice preview. It is supposed to be a diagnostic artifact. If it becomes too polished, it starts hiding the very mistakes I want to catch early.
That’s why the module varies strength as part of the exploration. I’m not only changing the prompt; I’m also changing how hard the image-to-image step clings to the source. That gives me a cheap way to probe the tradeoff between preservation and creativity before I commit to the final pass.
The benefit is that I can see which path preserves identity, which one keeps composition stable, and which one drifts too far. The downside is that exploratory frames are intentionally rough, so they are not meant for human review as finished artifacts. They are for the machine that has to decide where to spend next.
What I appreciate most is that think frames made the pipeline less superstitious. Before, the generator had to guess and the budget had to trust it. Now I have a cheap cohort, a real scorer, and a selection step that chooses the best path from a small set of interpretable alternatives.
That's a better deal than hoping the first expensive pass gets lucky. I’m no longer asking the model to be right on the first expensive try. I’m asking it to show me its working notes first, then I spend the real budget on the note that actually makes sense.
And that, more than anything, is why think frames earned their place: they turn video generation from a single throw of the dice into a short conversation before the bill arrives.
2026-03-26 15:51:55
Most content systems do not break at the draft step. They break one layer later, when the team still has to prove that the right version reached the right surface without losing the original job of the article.
That is the practical angle here. The point is not that AI can generate another draft. The point is what the workflow has to guarantee after the draft exists.
If you are designing publishing or content tooling, this kind of problem shows up as a product issue long before it shows up as a writing issue. A fluent article can still be the wrong article, the wrong version, or the wrong release state.
The technical problem behind real estate content workflow automation is rarely "how do we generate more text?" The harder problem is system design: how do you preserve source truth, create platform-specific variants, and verify that the public result actually matches the intent of the workflow?
EstatePass is a useful case study because the public site exposes two related operating surfaces. On one side, EstatePass highlights 2,500+ practice questions for learners preparing for the licensing exam. On the other, it promotes 75+ free agent tools for real estate professionals. That combination makes the product interesting as a publishing pipeline problem, not just as a writing tool.
In other words, the value question is not simply whether AI can draft. It is whether the workflow can carry context from source to channel without degrading quality.
If you are evaluating real estate content workflow automation, the real design requirement is this: generation has to remain subordinate to orchestration. The draft layer only helps when the system also knows:
A surprising number of teams still miss that last part. They automate the draft, partially automate distribution, and then leave verification as a vague manual step. That creates dashboards that say "done" when the public page is still broken, incomplete, or misaligned.
Once a workflow spans multiple channels, the fragile points become predictable.
If grounding is shallow, later drafts lose specificity. The system starts generating fluent but unsupported claims because the source material never had enough useful detail.
Many teams still confuse adaptation with copy-paste plus minor edits. In practice, Medium, Substack, a company blog, HackerNoon, and community blogs all need different framing, different openings, and often different levels of explanation.
If the workflow waits until after publishing to inspect quality, the expensive error has already occurred. At that point, the team is doing cleanup, not prevention.
Draft created is not published. Published in an admin panel is not publicly live. Publicly live is not the same as complete, indexable, and on-strategy.
That fourth failure mode is the one that most reliably destroys trust in a pipeline. Once people stop believing the success signal, every automated gain gets discounted.
A stronger architecture around real estate content workflow automation usually includes five explicit layers:
The public EstatePass pages around exam prep, practice questions, state-specific exam prep, agent tools, and listing description tool are useful because they make the grounding layer concrete. The product is not starting from abstract claims. It is starting from pages that reveal audience, positioning, and public capability language.
Grounding sounds like a prompt detail until you watch what happens without it. Without a stable source layer, the system starts over-inferencing product capabilities, mixing exam-prep language with agent-growth language, and flattening platform differences that actually matter.
In a workflow like this, grounding is doing at least three jobs:
That is why the source layer cannot just be random site fragments. Navigation text, slogans, or pricing snippets do not provide enough semantic weight to anchor good content. The workflow needs page-level meaning, not scraps.
One architectural choice matters more than it first appears: keep a canonical version that owns the deepest explanation.
The canonical layer should carry:
Then platform variants can transform that source instead of imitating it blindly. This is where weak systems often fail. They either flatten every channel into one article, or they generate every channel independently and lose consistency. Neither scales well.
A better system lets the canonical piece hold the dense explanation while Medium, Substack, and other channel variants reshape the framing for their own audience expectations.
Operator-style prompting is not just "more detailed instructions." It changes the contract between the orchestration layer and the model.
Instead of saying "write an article," the prompt can specify:
That matters because many strategic errors happen before the first word of the draft. If the system does not enforce those constraints, the output can sound polished while still being wrong for the brand, wrong for the channel, or wrong for the search intent.
Verification is often treated as a human QA chore. That is understandable, but it is also expensive and unreliable once publishing volume increases.
A stronger pipeline defines destination-specific success criteria up front. For example:
That is the difference between workflow theater and workflow design. The system either knows what "landed" means, or it does not.
Mature pipelines also need recovery logic. When one platform fails and another succeeds, the workflow has to decide whether to retry, hold the batch, replace the topic, or mark the item for manual review.
Without that logic, the system usually falls into one of three bad habits:
Recovery is not a side concern. It determines whether the pipeline can keep operating over time without polluting analytics and editorial decisions.
AI lowers the cost of the draft layer. That shifts the real competitive edge upward into coordination. The better systems are not simply the ones that write more. They are the ones that make reuse, correction, adaptation, and verification cheaper than starting over.
That is why searches around real estate crm workflow automation, real estate content creation workflow, real estate workflow technology, and real estate workflow system increasingly point to the same question: how do you build a content workflow that remains controllable after the first draft? The answer usually has less to do with prompting genius and more to do with architecture discipline.
If you are building or assessing a system around real estate content workflow automation, ask:
These are not implementation trivia. They are the questions that determine whether the workflow can scale without losing trust.
EstatePass is interesting here because the public site already suggests a multi-surface publishing logic. The exam-prep side, visible through exam prep, practice questions, and state-specific exam prep, needs search-oriented, learner-friendly explanation. The agent-tool side, visible through agent tools and listing description tool, needs operator-oriented framing and practical workflow use cases.
That split creates a real architecture requirement. If the system does not preserve channel boundaries, the content starts mixing exam-prep language and agent-ops language in ways that weaken both. This is exactly the kind of problem that orchestration should solve.
The future of AI publishing systems is probably not decided by who can produce the most text the fastest. It is more likely to be decided by who can preserve context across the whole pipeline: source truth, audience boundary, platform fit, acceptance logic, and retry safety.
In that sense, the most valuable part of real estate content workflow automation is not the generation model. It is the architecture that tells the model what job it is actually doing.
Once a team expects repeatable output across channels, the draft is no longer the product. The workflow is the product. The architecture behind real estate content workflow automation determines whether automation creates leverage or just scales cleanup.
The useful shift is to treat orchestration, verification, and release-state checks as first-class product features. Once draft speed improves, those layers become the parts people actually trust or distrust.
That is the part worth building for first.
2026-03-26 15:47:52
Introduction
Linux is an open-source operating system that is widely used in software development, servers, and cybersecurity. Unlike Windows, Linux relies heavily on a command-line interface (CLI), which allows users to interact with the system using text commands. Learning Linux basics is important because it helps users understand how systems work behind the scenes and improves efficiency when working on technical tasks.
The Command Line Interface and the Shell
The command line interface (CLI) is a text-based environment where users type commands to perform operations such as navigating files, creating directories, and managing the system.
The shell is the program that interprets the commands entered by the user. The most common shell in Linux is Bash. When a command is entered, the shell processes it and communicates with the operating system to execute it.
Navigating the Linux File System
Linux uses a hierarchical file system that starts from the root directory (/). Important commands include:
pwd: shows the current directory
ls: lists files and folders
cd: changes directories
Example:
cd Documents
cd ..
File and Directory Management
Creating files and folders:
mkdir foldername
touch filename
Deleting:
rm filename
rmdir foldername
Copying and moving:
cp file1 file2
mv file1 file2
Working with Files
Viewing files:
cat filename
less filename
Editing files:
nano filename
Writing to files:
echo "Hello" > file.txt
echo "World" >> file.txt
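Running those two redirections in sequence shows the difference between > (overwrite) and >> (append):

```shell
echo "Hello" > file.txt    # > creates the file or overwrites its contents
echo "World" >> file.txt   # >> appends a new line at the end
cat file.txt
# Hello
# World
```

Re-running the first command would reset the file to a single "Hello" line, while re-running the second keeps adding lines.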
Searching for Files and Content
find . -name "filename"
grep "text" filename
These commands help locate files and search within them.
File Permissions
Linux controls access through permissions:
Read (r)
Write (w)
Execute (x)
To change permissions:
chmod +x script.sh
To view permissions:
ls -l
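Putting the file, permission, and execution commands together, a small end-to-end example (the filename here is just an illustration):

```shell
echo '#!/bin/bash' > hello.sh              # create the script file
echo 'echo "Hello from script"' >> hello.sh
chmod +x hello.sh                          # add the execute (x) permission
ls -l hello.sh                             # first column now shows x bits, e.g. -rwxr-xr-x
./hello.sh                                 # prints: Hello from script
```

Without the chmod step, running ./hello.sh would fail with a "Permission denied" error.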
Basics of Networking
Networking allows computers to communicate.
Useful commands:
ip a (shows IP address)
ping google.com (tests connectivity)
Key concepts include IP addresses, routers, and DNS.
Package Management
Software installation in Ubuntu is done using:
sudo apt update
sudo apt install package-name
Example:
sudo apt install git
2026-03-26 15:45:16
Your AI agent can analyze market data, generate trading strategies, and even write smart contracts. But when it comes time to actually execute a trade or pay for premium API access? It hits a wall. Most AI agents can think about money, but they can't touch it.
This gap between AI decision-making and financial execution is where many automation dreams break down. You end up manually copying addresses, approving transactions, and babysitting what should be autonomous processes. Meanwhile, your agent sits idle, waiting for human intervention to complete tasks it could handle end-to-end.
AI agents need wallets the same way they need access to files, APIs, and databases—as tools to accomplish their goals. But traditional wallet integration is either too restrictive (requiring manual approval for every transaction) or too dangerous (giving agents full access to your funds with no safety nets).
WAIaaS bridges this gap with a self-hosted Wallet-as-a-Service that gives AI agents controlled access to blockchain operations. Instead of choosing between safety and automation, you get both: agents can execute transactions programmatically while operating within policies you define.
The platform exposes wallet functionality through both a TypeScript SDK and Python SDK, making it easy to integrate with any AI agent framework. Whether you're building with LangChain, CrewAI, or Claude's MCP protocol, your agents can now handle the complete workflow from analysis to execution.
Let's walk through integrating WAIaaS with an AI agent. First, install the SDK and start a local WAIaaS instance:
```shell
npm install @waiaas/sdk
npm install -g @waiaas/cli

waiaas init                      # Create data directory + config.toml
waiaas start                     # Start daemon (sets master password on first run)
waiaas quickset --mode mainnet   # Create wallets + MCP sessions in one step
```
Once your daemon is running, you can create a client connection:
```typescript
import { WAIaaSClient } from '@waiaas/sdk';

const client = new WAIaaSClient({
  baseUrl: 'http://127.0.0.1:3100',
  sessionToken: process.env.WAIAAS_SESSION_TOKEN,
});

// Check balance
const balance = await client.getBalance();
console.log(`${balance.balance} ${balance.symbol}`);

// Send native token
const tx = await client.sendToken({
  to: 'recipient-address...',
  amount: '0.1',
});
console.log(`Transaction: ${tx.id}`);
```
The SDK provides clean abstractions over WAIaaS's REST API, handling authentication, error management, and transaction lifecycle automatically. Your agent code focuses on business logic rather than blockchain mechanics.
Here's a practical example: an AI agent that monitors its operating budget and automatically tops up when needed. This agent can execute DeFi swaps, check balances, and even pay for its own API calls using the x402 protocol.
```typescript
import { WAIaaSClient, WAIaaSError } from '@waiaas/sdk';

const client = new WAIaaSClient({
  baseUrl: process.env['WAIAAS_BASE_URL'] ?? 'http://localhost:3100',
  sessionToken: process.env['WAIAAS_SESSION_TOKEN'],
});

// Step 1: Check wallet balance
const balance = await client.getBalance();
console.log(`Balance: ${balance.balance} ${balance.symbol} (${balance.chain}/${balance.network})`);

// Step 2: Send tokens
const sendResult = await client.sendToken({
  type: 'TRANSFER',
  to: 'recipient-address',
  amount: '0.001',
});
console.log(`Transaction submitted: ${sendResult.id} (status: ${sendResult.status})`);

// Step 3: Poll for confirmation
const POLL_TIMEOUT_MS = 60_000;
const startTime = Date.now();
while (Date.now() - startTime < POLL_TIMEOUT_MS) {
  const tx = await client.getTransaction(sendResult.id);
  if (tx.status === 'COMPLETED') {
    console.log(`Transaction confirmed! Hash: ${tx.txHash}`);
    break;
  }
  if (tx.status === 'FAILED') {
    console.error(`Transaction failed: ${tx.error}`);
    break;
  }
  await new Promise(resolve => setTimeout(resolve, 1000));
}
```
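That polling loop generalizes. As a sketch — independent of the SDK, with a stubbed status source standing in for `getTransaction` — the retry-until-terminal pattern looks like this:

```typescript
type TxStatus = 'PENDING' | 'COMPLETED' | 'FAILED';

// Generic sketch: poll any async status source until it reaches a terminal
// state or the timeout expires. No WAIaaS dependency required here.
async function pollUntilTerminal(
  getStatus: () => Promise<TxStatus>,
  timeoutMs = 60_000,
  intervalMs = 1_000
): Promise<TxStatus> {
  const start = Date.now();
  while (Date.now() - start < timeoutMs) {
    const status = await getStatus();
    if (status === 'COMPLETED' || status === 'FAILED') return status;
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  return 'PENDING'; // timed out without reaching a terminal state
}

// Stubbed status source: completes on the third check.
let checks = 0;
const finalStatus = await pollUntilTerminal(
  async () => (++checks >= 3 ? 'COMPLETED' : 'PENDING'),
  5_000,
  10
);
console.log(finalStatus); // COMPLETED
```

Factoring the loop out this way keeps agent code focused on what to do with a confirmed or failed transaction rather than on the mechanics of waiting.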
The agent can handle the complete transaction lifecycle: checking balances, submitting transactions, and monitoring for confirmation. Error handling is built-in through the WAIaaSError class, which provides structured error codes like INSUFFICIENT_BALANCE or POLICY_DENIED.
If you're working in Python with frameworks like LangChain or AutoGPT, the Python SDK provides the same functionality with familiar async patterns:
```shell
pip install waiaas
```

```python
import asyncio

from waiaas import WAIaaSClient

async def main():
    async with WAIaaSClient("http://localhost:3100", "wai_sess_xxx") as client:
        balance = await client.get_balance()
        print(balance.balance, balance.symbol)

asyncio.run(main())
```
Both SDKs expose the same core methods: getBalance(), sendToken(), getTransaction(), listTransactions(), and signTransaction(). They also support advanced features like the x402 HTTP payment protocol, where agents can automatically pay for API calls by including payment headers.
WAIaaS implements a 3-layer security model that gives agents autonomy while protecting your funds. When you create a session for an agent, you're issuing time-limited credentials with specific permissions. The agent can execute approved transactions immediately, while larger amounts trigger delays and notifications.
Session tokens use JWT HS256 encoding and include built-in rate limiting and TTL controls. You can set absolute lifetime limits, renewal caps, and spending thresholds. If an agent goes rogue or gets compromised, you can revoke its session without touching the underlying wallet.
The platform also supports Account Abstraction through ERC-4337, enabling gasless transactions and smart account features. Your agents can operate on multiple chains without managing gas tokens, making cross-chain workflows seamless.
WAIaaS includes 14 DeFi protocol providers integrated: aave-v3, across, dcent-swap, drift, erc8004, hyperliquid, jito-staking, jupiter-swap, kamino, lido-staking, lifi, pendle, polymarket, and zerox-swap. Agents can swap tokens, provide liquidity, stake assets, and even trade prediction markets—all through the same SDK interface.
For example, executing a Jupiter swap on Solana is as simple as:
```shell
curl -X POST http://127.0.0.1:3100/v1/actions/jupiter-swap/swap \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer wai_sess_<token>" \
  -d '{
    "inputMint": "So11111111111111111111111111111111111111112",
    "outputMint": "EPjFWdd5AufqSSqeM2qN1xzybapC8G4wEGGkZwyTDt1v",
    "amount": "1000000000"
  }'
```
The SDK wraps these protocol interactions in clean method calls, so your agent doesn't need to understand the underlying DEX mechanics.
If you're using Claude Desktop, WAIaaS provides 45 MCP tools for seamless integration. The tools cover everything from basic wallet operations to advanced DeFi positions management. Claude can check balances, execute swaps, monitor transaction status, and even manage cross-chain bridging.
Setting up MCP integration is straightforward:
waiaas mcp setup --all
This automatically registers all your wallets with Claude Desktop, providing instant access to blockchain operations through natural language commands.
Ready to give your AI agent a wallet? Here's the fastest path:
```shell
npm install -g @waiaas/cli && waiaas init --auto-provision
waiaas start                     # runs on http://127.0.0.1:3100
waiaas quickset --mode mainnet
npm install @waiaas/sdk          # or: pip install waiaas
```
Your agent now has programmatic access to multi-chain wallets with built-in safety controls.
The WAIaaS SDK gives your AI agents the financial tools they need to operate autonomously while keeping your funds secure. Whether you're building trading bots, payment processors, or autonomous DAOs, the combination of programmatic control and policy-based security opens up new possibilities for AI-driven financial applications.
Ready to give your AI agents a wallet? Check out the complete documentation and examples at https://github.com/minhoyoo-iotrust/WAIaaS, or visit https://waiaas.ai to learn more about the platform's capabilities.
2026-03-26 15:43:53
You've collected data and you have a model in mind — maybe a Gaussian, maybe a coin flip. But the model has parameters, and you need to find the values that best explain what you observed. How?
Maximum Likelihood Estimation (MLE) answers this with a deceptively simple idea: choose the parameters that make your observed data most probable. By the end of this post, you'll implement MLE from scratch for three distributions, understand why we always work with log-likelihoods, and see how MLE connects to more advanced algorithms like EM.
Let's start with the simplest possible case. You flip a coin 100 times and get 73 heads. What's the coin's bias?
```python
import numpy as np
import matplotlib.pyplot as plt

# Observed data: 73 heads out of 100 flips
n_heads = 73
n_tails = 27
n_total = n_heads + n_tails

# Compute likelihood for every possible bias value
theta_values = np.linspace(0.01, 0.99, 200)
likelihoods = theta_values**n_heads * (1 - theta_values)**n_tails

# The MLE is simply the proportion of heads
theta_mle = n_heads / n_total

plt.figure(figsize=(8, 4))
plt.plot(theta_values, likelihoods / likelihoods.max(), 'b-', linewidth=2)
plt.axvline(x=theta_mle, color='r', linestyle='--', label=f'MLE: θ = {theta_mle:.2f}')
plt.xlabel('θ (coin bias)')
plt.ylabel('Likelihood (normalised)')
plt.title('Likelihood Function for a Coin Flip')
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()

print(f"MLE estimate: θ = {theta_mle:.2f}")
```
Run this and you'll see: the likelihood function peaks at $\theta = 0.73$ — exactly the proportion of heads. That peak is the Maximum Likelihood Estimate.
You just performed MLE. The idea is intuitive: if 73 out of 100 flips were heads, the most plausible bias is 0.73. Now let's understand the machinery behind it.
This distinction trips up almost everyone. Here's the key:

- **Probability**, $P(\text{data} \mid \theta)$: the parameters are fixed and the data varies. "Given a coin with bias $\theta$, how likely is this outcome?"
- **Likelihood**, $\mathcal{L}(\theta \mid \text{data})$: the data is fixed and the parameters vary. "Given these observed flips, how plausible is each value of $\theta$?"
Same formula, different perspective. When we observe 73 heads and plot $\theta^{73}(1-\theta)^{27}$ as a function of $\theta$, we're computing the likelihood — it tells us which parameter values are most consistent with what we saw.
For a single coin flip with bias $\theta$:

$$P(x \mid \theta) = \theta^x (1 - \theta)^{1 - x}$$

where $x = 1$ for heads, $x = 0$ for tails.

For $n$ independent flips, the joint likelihood is the product:

$$\mathcal{L}(\theta) = \prod_{i=1}^{n} \theta^{x_i} (1 - \theta)^{1 - x_i} = \theta^k (1 - \theta)^{n - k}$$

where $k$ is the total number of heads.
Watch what happens when you multiply many small probabilities:
# Each flip has probability around 0.73
# Multiplying 100 of them together...
product = 0.73**73 * 0.27**27
print(f"Raw likelihood: {product:.2e}") # Astronomically small!
The raw likelihood is on the order of $10^{-26}$. With thousands of data points, you'll hit numerical underflow: the computer rounds the product to exactly zero. This is why we use log-likelihood.
Taking the logarithm converts products into sums:

$$\log \mathcal{L}(\theta) = k \log \theta + (n - k) \log(1 - \theta)$$
Since $\log$ is monotonically increasing, maximising the log-likelihood gives the same answer as maximising the likelihood. But sums are numerically stable and much easier to differentiate.
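To see the payoff concretely, here's a quick sketch (using freshly simulated flips rather than the 73-heads example): the raw product underflows to zero while the log-space sum stays finite.

```python
import numpy as np

rng = np.random.default_rng(0)
theta = 0.73
flips = rng.random(10_000) < theta  # simulated coin flips, True = heads

# Probability the model assigns to each individual flip
per_flip = np.where(flips, theta, 1 - theta)

raw_likelihood = np.prod(per_flip)         # underflows to exactly 0.0
log_likelihood = np.sum(np.log(per_flip))  # finite and stable

print(raw_likelihood)  # 0.0
print(log_likelihood)  # a large negative, but perfectly representable, number
```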
To find the maximum, take the derivative and set it to zero:

$$\frac{d}{d\theta} \log \mathcal{L}(\theta) = \frac{k}{\theta} - \frac{n - k}{1 - \theta} = 0$$

Solving:

$$\hat{\theta} = \frac{k}{n}$$
The MLE for a coin is simply the proportion of heads. This confirms what our intuition told us.
Coins are nice, but most real data is continuous. Let's apply MLE to the Gaussian (Normal) distribution, where we need to estimate two parameters: the mean $\mu$ and standard deviation $\sigma$.
For $n$ observations from $\mathcal{N}(\mu, \sigma^2)$, the log-likelihood is:

$$\log \mathcal{L}(\mu, \sigma) = -\frac{n}{2} \log(2\pi) - n \log \sigma - \frac{1}{2\sigma^2} \sum_{i=1}^{n} (x_i - \mu)^2$$
Let's implement this from scratch:
from math import log, pi
def normal_log_likelihood(data, mu, sigma):
    """Compute log-likelihood of data under a Normal distribution."""
    n = len(data)
    ll = -0.5 * n * log(2 * pi) - n * log(sigma)
    ll -= 0.5 * sum((x - mu)**2 / sigma**2 for x in data)
    return ll
Here's a vectorised version that drops the constant $-\frac{n}{2}\log(2\pi)$ (it doesn't affect the location of the maximum):
def normal_log_likelihood_fast(data, mu, sigma):
    """Vectorised log-likelihood (ignoring constant offset)."""
    n = len(data)
    residuals = data - mu
    return -0.5 * (n * np.log(sigma**2) + np.sum(residuals**2 / sigma**2))
Setting the partial derivatives to zero gives us the familiar formulas:

$$\hat{\mu} = \frac{1}{n} \sum_{i=1}^{n} x_i, \qquad \hat{\sigma}^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \hat{\mu})^2$$

The MLE for the mean is the sample mean, and the MLE for the variance is the sample variance (with $n$ in the denominator, not $n-1$).
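A quick sanity check of these formulas against NumPy, on synthetic data (the true values $\mu = 2$, $\sigma = 3$ are arbitrary choices for this sketch):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(loc=2.0, scale=3.0, size=5_000)

mu_hat = x.mean()                     # MLE of the mean: the sample mean
var_hat = np.mean((x - mu_hat) ** 2)  # MLE of the variance: divide by n

# np.var with ddof=0 divides by n (the MLE); ddof=1 divides by n-1 (unbiased)
print(mu_hat, np.sqrt(var_hat))
print(np.isclose(var_hat, x.var(ddof=0)))  # True
```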
Sometimes you can't solve analytically. In those cases, you can use numerical optimisation. We minimise the negative log-likelihood (since optimisers minimise by default):
from scipy.optimize import minimize
# Generate data from N(1, 1)
np.random.seed(42)
data = np.random.normal(loc=1.0, scale=1.0, size=10_000)
# Start from a bad guess
x0 = np.array([0.5, 2.0]) # [mu_guess, sigma_guess]
result = minimize(
    lambda params: -normal_log_likelihood_fast(data, params[0], params[1]),
    x0,
    method='nelder-mead',
    options={'xatol': 1e-8}
)
print(f"True: μ = 1.000, σ = 1.000")
print(f"MLE: μ = {result.x[0]:.3f}, σ = {result.x[1]:.3f}")
print(f"Analytic: μ = {data.mean():.3f}, σ = {data.std():.3f}")
The numerical optimiser converges to the same answer as the analytical solution. This is reassuring — and the numerical approach generalises to models where no closed-form solution exists.
With two parameters, the likelihood becomes a surface. Let's plot it:
mu_range = np.linspace(0.5, 1.5, 100)
sigma_range = np.linspace(0.7, 1.3, 100)
MU, SIGMA = np.meshgrid(mu_range, sigma_range)
LL = np.zeros_like(MU)
for i in range(len(mu_range)):
    for j in range(len(sigma_range)):
        LL[j, i] = normal_log_likelihood_fast(data, MU[j, i], SIGMA[j, i])
plt.figure(figsize=(8, 6))
plt.contourf(MU, SIGMA, LL, levels=30, cmap='viridis')
plt.colorbar(label='Log-Likelihood')
plt.plot(data.mean(), data.std(), 'r*', markersize=15, label='MLE')
plt.xlabel('μ')
plt.ylabel('σ')
plt.title('Log-Likelihood Surface for Normal Distribution')
plt.legend()
plt.show()
The contour plot shows a single, clear peak — the log-likelihood for the Normal distribution is concave, so the MLE is guaranteed to be the global maximum. Not all distributions are this well-behaved.
Now let's tackle a distribution with multiple parameters. A multinomial distribution models $k$ possible outcomes (like a loaded die), each with probability $p_1, p_2, \ldots, p_k$ where $\sum p_i = 1$.
For an observation of counts $x_1, x_2, \ldots, x_k$ with $n = \sum_i x_i$:

$$P(x_1, \ldots, x_k \mid p_1, \ldots, p_k) = \frac{n!}{x_1! \, x_2! \cdots x_k!} \prod_{i=1}^{k} p_i^{x_i}$$
Implementation:
from math import log, factorial
def multinomial_log_likelihood(obs, probs):
    """Compute log-likelihood for a single multinomial observation."""
    n = sum(obs)
    # Multinomial coefficient: n! / (x1! * x2! * ... * xk!)
    log_coeff = log(factorial(n)) - sum(log(factorial(x)) for x in obs)
    # Probability term: sum(xi * log(pi)), skip zero counts to avoid log(0)
    log_prob = sum(x * log(p) for x, p in zip(obs, probs) if x > 0)
    return log_coeff + log_prob
If you've read the EM algorithm tutorial, this function should look familiar — it's the exact likelihood function the EM algorithm uses internally to compute soft assignments.
With a three-state multinomial ($k=3$), we have two free parameters (since $p_3 = 1 - p_1 - p_2$). Let's search over a grid:
# Generate data from a 3-state multinomial: P = [0.5, 0.2, 0.3]
np.random.seed(42)
true_probs = [0.5, 0.2, 0.3]
data = np.random.multinomial(1, true_probs, size=100) # 100 single-draw experiments
def total_log_likelihood(data, probs):
    """Sum log-likelihood across all observations."""
    return sum(multinomial_log_likelihood(obs, probs) for obs in data)
# Grid search over (p1, p2), with p3 = 1 - p1 - p2
best_ll = -np.inf
best_probs = None
for p1 in np.arange(0.05, 0.95, 0.05):
    for p2 in np.arange(0.05, 0.95 - p1, 0.05):
        p3 = 1 - p1 - p2
        if p3 > 0:
            ll = total_log_likelihood(data, [p1, p2, p3])
            if ll > best_ll:
                best_ll = ll
                best_probs = [p1, p2, p3]
# Compare with the analytical MLE (sample proportions)
sample_probs = data.sum(axis=0) / data.sum()
print(f"True: P = [{true_probs[0]:.2f}, {true_probs[1]:.2f}, {true_probs[2]:.2f}]")
print(f"Grid MLE: P = [{best_probs[0]:.2f}, {best_probs[1]:.2f}, {best_probs[2]:.2f}]")
print(f"Analytic: P = [{sample_probs[0]:.2f}, {sample_probs[1]:.2f}, {sample_probs[2]:.2f}]")
Once again, the MLE turns out to be the sample proportions — count how often each outcome occurred and divide by the total.
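That analytic answer drops out of a short constrained-maximisation sketch. Maximise $\sum_i x_i \log p_i$ subject to $\sum_i p_i = 1$ using a Lagrange multiplier $\lambda$:

$$\frac{\partial}{\partial p_i} \left[ \sum_j x_j \log p_j + \lambda \Bigl(1 - \sum_j p_j\Bigr) \right] = \frac{x_i}{p_i} - \lambda = 0 \;\Rightarrow\; p_i = \frac{x_i}{\lambda}$$

The constraint then forces $\lambda = \sum_i x_i = n$, so $\hat{p}_i = x_i / n$: exactly the sample proportions.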
Notice a pattern across all three distributions:
| Distribution | Parameters | MLE |
|---|---|---|
| Bernoulli | $\theta$ (bias) | Proportion of successes |
| Normal | $\mu, \sigma$ | Sample mean, sample std |
| Multinomial | $p_1, \ldots, p_k$ | Sample proportions |
MLE often gives you the "obvious" answer. But the framework matters because: (1) it proves why these are optimal, (2) it generalises to complex models where intuition fails, and (3) it connects to algorithms like EM and MCMC that handle cases where direct maximisation isn't possible.
MLE can overfit. If you flip a coin 3 times and get 3 heads, the MLE says $\theta = 1.0$ — the coin always lands heads. With small data, consider Bayesian approaches that incorporate prior beliefs.
The MLE for variance divides by $n$, not $n-1$. This makes it biased — it systematically underestimates the true variance. The unbiased estimator uses $n-1$ (Bessel's correction). For large $n$ the difference is negligible, but it matters for small samples.
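A small simulation makes the bias visible; the numbers here (true variance 4.0, samples of size 5) are arbitrary choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(42)
n, trials = 5, 20_000
samples = rng.normal(loc=0.0, scale=2.0, size=(trials, n))  # true variance = 4.0

mle_var = samples.var(axis=1, ddof=0).mean()       # divide by n
unbiased_var = samples.var(axis=1, ddof=1).mean()  # divide by n-1

print(mle_var)      # ~3.2: biased low by a factor of (n-1)/n = 0.8
print(unbiased_var) # ~4.0: Bessel's correction removes the bias
```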
Always use log-likelihood instead of raw likelihood. With even 100 data points, the raw likelihood will underflow to zero. Our vectorised implementation avoids this by working entirely in log-space.
The Normal distribution has a concave log-likelihood, so there's a single global maximum. But more complex models (mixture models, neural networks) may have multiple local maxima. The EM algorithm only guarantees convergence to a local maximum, which is why multiple initialisations are important.
What if you can't observe everything? If someone secretly picks one of two coins for each experiment and you only see the outcomes, you can't directly compute the MLE — you'd need to sum over all possible hidden variable configurations.
This is exactly the problem the EM algorithm solves: it alternates between estimating the hidden variables (E-step) and maximising the likelihood (M-step). MLE is the building block that EM relies on.
Maximum Likelihood was formalised by Ronald Aylmer Fisher in his landmark 1922 paper "On the Mathematical Foundations of Theoretical Statistics", published in Philosophical Transactions of the Royal Society.
Fisher was 31 years old, working at Rothamsted Experimental Station analysing agricultural data. He needed a principled way to estimate parameters from data, and he wasn't satisfied with the existing methods (particularly Karl Pearson's method of moments).
His key insight: among all possible parameter values, choose the one that makes the observed data most probable. He called this the "optimum" and later the "maximum likelihood" estimate.
"The method here put forward is the most general so far developed for the systematic treatment of the problems of estimation."
— Fisher (1922)
Fisher argued that a good estimator should be:

- **Consistent**: as $n \to \infty$, the estimate converges to the true value
- **Efficient**: it attains the smallest possible asymptotic variance
- **Sufficient**: it uses all the information the sample contains about the parameter

He proved that MLE satisfies all three properties under regularity conditions. Asymptotically, no other general estimation method can do better: the Cramér-Rao bound sets a floor on estimator variance, and MLE achieves it.
Fisher defined the likelihood function as:

$$\mathcal{L}(\theta) = \prod_{i=1}^{n} f(x_i \mid \theta)$$

And the MLE as:

$$\hat{\theta} = \arg\max_{\theta} \log \mathcal{L}(\theta)$$

The score function (gradient of the log-likelihood) is:

$$S(\theta) = \frac{\partial}{\partial \theta} \log \mathcal{L}(\theta)$$

At the MLE, $S(\hat{\theta}) = 0$. The Fisher Information measures how much information the data carries about $\theta$:

$$I(\theta) = \mathbb{E}\left[\left(\frac{\partial}{\partial \theta} \log f(X \mid \theta)\right)^{2}\right]$$

The variance of the MLE is bounded by the inverse Fisher Information:

$$\operatorname{Var}(\hat{\theta}) \geq \frac{1}{n \, I(\theta)}$$
This is the Cramér-Rao bound, and MLE asymptotically achieves it.
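For a Bernoulli coin, $I(\theta) = \frac{1}{\theta(1-\theta)}$, so the bound is $\theta(1-\theta)/n$, and the MLE $\hat{\theta} = k/n$ attains it exactly. A simulation sketch (the values $\theta = 0.3$, $n = 200$ are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(7)
theta, n, trials = 0.3, 200, 50_000

flips = rng.random((trials, n)) < theta
theta_hat = flips.mean(axis=1)        # MLE for each trial: proportion of heads

empirical_var = theta_hat.var()
cramer_rao = theta * (1 - theta) / n  # 1 / (n * I(theta))

print(empirical_var, cramer_rao)      # nearly identical
```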
Bishop's Pattern Recognition and Machine Learning (2006), Chapter 2, provides an excellent modern treatment of MLE in the context of machine learning.
MLE is the foundation of modern statistical learning:
| Method | Relationship to MLE |
|---|---|
| Logistic Regression | MLE of Bernoulli parameters given features |
| Linear Regression (OLS) | MLE under Gaussian noise assumption |
| EM Algorithm | MLE with latent (hidden) variables |
| Neural Network Training | MLE via gradient descent on cross-entropy loss |
| MCMC | Bayesian alternative when MLE isn't enough |
The interactive notebook includes exercises, such as varying the sample size $n$ and verifying the $1/\sqrt{n}$ convergence rate of the estimate.

**What is Maximum Likelihood Estimation?**

MLE finds the parameter values that make the observed data most probable under a given statistical model. You write the likelihood function (the probability of the data as a function of the parameters), then find the parameters that maximise it. MLE is the most widely used estimation method in statistics and machine learning.
**Why use the log-likelihood instead of the raw likelihood?**

The likelihood is a product of probabilities, which can become astronomically small for large datasets and cause numerical underflow. Taking the logarithm converts products into sums, which are numerically stable and easier to differentiate. The maximum occurs at the same parameter values because the logarithm is a monotonic function.
**Is MLE unbiased?**

No. MLE can be biased for small samples. A classic example is the MLE of variance, which divides by n instead of (n-1) and systematically underestimates the true variance. However, MLE is asymptotically unbiased: the bias vanishes as the sample size grows, and MLE achieves the lowest possible variance among consistent estimators.
**What if the likelihood has multiple local maxima?**

The likelihood surface can have multiple local maxima, especially for complex models like mixture models. Gradient-based optimisation may converge to a local maximum depending on the starting point. Common solutions include running the optimisation from multiple random starts, using the EM algorithm (for latent variable models), or using global optimisation methods.
**How does MLE relate to least squares?**

For linear regression with normally distributed errors, MLE and ordinary least squares give exactly the same parameter estimates. Minimising the sum of squared residuals is equivalent to maximising the Gaussian likelihood. MLE is more general because it works with any probability distribution, not just the Gaussian.
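Here's a sketch of that equivalence on synthetic data, comparing `np.polyfit` (least squares) against direct minimisation of the Gaussian negative log-likelihood with SciPy:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(3)
x = np.linspace(0.0, 1.0, 200)
y = 2.0 * x + 1.0 + rng.normal(0.0, 0.3, size=x.size)

# Ordinary least squares
slope_ols, intercept_ols = np.polyfit(x, y, 1)

# Gaussian MLE: minimise the negative log-likelihood over (a, b, log sigma)
def neg_log_likelihood(params):
    a, b, log_sigma = params
    sigma = np.exp(log_sigma)  # parametrise in log-space to keep sigma > 0
    residuals = y - (a * x + b)
    return y.size * np.log(sigma) + 0.5 * np.sum(residuals**2) / sigma**2

result = minimize(neg_log_likelihood, x0=np.zeros(3))

print(slope_ols, intercept_ols)
print(result.x[0], result.x[1])  # the same line, up to optimiser tolerance
```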
2026-03-26 15:41:22
On March 10, 2026, Aave — the largest DeFi lending protocol by TVL — liquidated $27 million in wstETH collateral from 34 innocent users. No hacker was involved. No flash loan. No exploit contract. The protocol simply misconfigured its own oracle and ate its own users alive.
The culprit? A parameter desynchronization in Aave's Correlated Asset Price Oracle (CAPO) system that undervalued wstETH by 2.85% — just enough to make perfectly healthy positions look underwater. Automated liquidation bots did the rest in minutes.
This article dissects exactly what went wrong, traces the bug to two desynchronized state variables, and extracts five oracle safety patterns that could have prevented this — patterns every protocol running automated risk management needs to implement yesterday.
Aave's CAPO system exists to solve a real problem: price manipulation for correlated assets. When you use wstETH (wrapped staked ETH) as collateral, its value is tightly correlated to ETH — but not identical. The wstETH/ETH exchange rate drifts upward over time as staking rewards accrue.
CAPO enforces a protective cap on this exchange rate, preventing an attacker from artificially inflating the wstETH/ETH ratio to borrow more than they should. It does this by tracking two snapshot variables:

- `snapshotRatio`: the last known-good exchange rate
- `snapshotTimestamp`: when that ratio was recorded

In theory, this is sound defensive engineering. In practice, a subtle implementation bug turned this safety system into a weapon against the users it was supposed to protect.
Here's the exact failure sequence:
Chaos Labs' Edge Risk engine (an off-chain automated system) determined the CAPO maximum price should be updated to 1.1933947 wstETH/ETH. The actual market rate at this moment was higher, approximately 1.2285 wstETH/ETH.
When the update transaction hit the smart contract, an on-chain constraint kicked in: snapshotRatio can only increase by a maximum of 3% every 3 days. The proposed increase exceeded this limit.
The contract dutifully capped the snapshotRatio at approximately 1.1919 — the maximum allowed increase from the previous snapshot.
Here's the critical bug: while snapshotRatio was capped at 1.1919, the snapshotTimestamp updated as if the full target ratio (1.2282) had been applied. The timestamp jumped forward to match the seven-day reference window used in the calculation.
Two variables that must stay synchronized — ratio and timestamp — were now out of sync.
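A toy numeric sketch of why that desync deflates the cap. This is illustrative only, not Aave's actual CAPO implementation: the linear growth formula and the 5% annual rate are assumptions for illustration.

```python
YEAR = 365 * 24 * 3600
GROWTH_RATE = 0.05  # assumed annual staking-yield allowance

def max_ratio(snapshot_ratio, elapsed_seconds):
    # Toy CAPO-style cap: grows linearly from the snapshot at a bounded rate
    return snapshot_ratio * (1 + GROWTH_RATE * elapsed_seconds / YEAR)

# Consistent update: capped ratio paired with a timestamp that matches it
consistent_cap = max_ratio(1.1919, 30 * 24 * 3600)  # 30 days of growth room

# Buggy update: same capped ratio, but the timestamp jumped forward,
# so almost no time has "elapsed" and almost no growth is allowed
desynced_cap = max_ratio(1.1919, 1 * 24 * 3600)

print(consistent_cap)  # higher cap
print(desynced_cap)    # lower cap: healthy positions get priced down
```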
With the timestamp artificially advanced but the ratio artificially held back, the CAPO system computed a maximum allowable exchange rate of approximately 1.1939 wstETH/ETH.
The real market rate: ~1.228 wstETH/ETH.
The undervaluation: 2.85%.
That 2.85% was enough. Leveraged wstETH positions that were safely collateralized at the real exchange rate suddenly appeared underwater when priced against CAPO's deflated maximum. Automated liquidation bots — which don't ask questions — pounced.
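To make the flip concrete, here's a sketch with hypothetical position numbers; only the two exchange rates come from the incident, and the collateral size, debt, and liquidation threshold are invented for illustration.

```python
def health_factor(collateral_amount, price, liq_threshold, debt):
    # Aave-style health factor: below 1.0 means the position is liquidatable
    return collateral_amount * price * liq_threshold / debt

# Hypothetical looped wstETH position sitting close to the threshold
hf_market = health_factor(100, 1.228, 0.95, 114.0)  # real exchange rate
hf_capo = health_factor(100, 1.1939, 0.95, 114.0)   # CAPO's deflated rate

print(hf_market)  # just above 1.0: safe
print(hf_capo)    # just below 1.0: bots liquidate
```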
In minutes, 10,938 wstETH was forcibly liquidated across 34 accounts. External liquidators pocketed an estimated 499 ETH in profit.
A hack requires an external attacker. You can blame them, hunt them, sometimes recover funds. This was an auto-immune attack — the protocol's own safety mechanism destroyed value it was designed to protect.
Three factors made this particularly damaging:
1. Speed of automated liquidation: By the time anyone noticed the misconfiguration, bots had already completed the liquidations. There was no circuit breaker, no cooldown period, no human-in-the-loop checkpoint.
2. Trust in the safety system itself: CAPO was specifically designed to prevent oracle manipulation. Users trusted that their wstETH collateral was being fairly priced because CAPO existed. The safety system's failure was invisible until positions were already gone.
3. No bad debt, but massive user harm: Aave's protocol remained solvent — it didn't accrue bad debt. But 34 users lost positions worth $27M due to a configuration error, not market conditions. The protocol was "fine" while its users were wrecked.
Aave's team responded within hours.
The governance proposal to fully reimburse affected users is currently under discussion. But the damage to trust in automated oracle systems extends far beyond this one incident.
The bug was a desynchronization between two coupled state variables. This is a classic pitfall when updating multi-variable state:
// ❌ VULNERABLE: Non-atomic coupled update
function updateSnapshot(uint256 newRatio, uint256 newTimestamp) internal {
    // Ratio gets capped...
    uint256 cappedRatio = Math.min(newRatio, maxAllowedIncrease());
    snapshotRatio = cappedRatio;
    // ...but timestamp updates unconditionally
    snapshotTimestamp = newTimestamp; // BUG: desynced from capped ratio
}
// ✅ SAFE: Atomic coupled update
function updateSnapshot(uint256 newRatio, uint256 newTimestamp) internal {
    uint256 cappedRatio = Math.min(newRatio, maxAllowedIncrease());
    if (cappedRatio < newRatio) {
        // Ratio was capped: the timestamp must reflect the CAPPED value's
        // growth timeline, not the proposed value's reference window
        uint256 adjustedTimestamp = calculateTimestampForRatio(cappedRatio);
        snapshotRatio = cappedRatio;
        snapshotTimestamp = adjustedTimestamp;
    } else {
        snapshotRatio = newRatio;
        snapshotTimestamp = newTimestamp;
    }
}
Rule: If two state variables are mathematically coupled, they must update atomically and consistently. If one is capped/modified, the other must adjust proportionally.
No legitimate market movement justifies instant mass liquidation of correlated-asset positions:
// Delay liquidation of correlated-asset positions after oracle updates
uint256 public constant ORACLE_UPDATE_COOLDOWN = 15 minutes;
mapping(address => uint256) public lastOracleUpdate;
modifier liquidationAllowed(address asset) {
    require(
        block.timestamp >= lastOracleUpdate[asset] + ORACLE_UPDATE_COOLDOWN,
        "Oracle update cooldown active"
    );
    _;
}
A 15-minute cooldown after any oracle parameter change gives the team time to verify the update didn't introduce pricing errors — before bots can liquidate against the new price.
If an oracle update would change collateral valuation by more than a threshold, pause and require manual confirmation:
uint256 public constant MAX_PRICE_DEVIATION = 200; // 2% in basis points
function updateOraclePrice(uint256 newPrice) external {
    uint256 currentPrice = getLatestPrice();
    uint256 deviation = calculateDeviation(currentPrice, newPrice);
    if (deviation > MAX_PRICE_DEVIATION) {
        emit PriceDeviationAlert(currentPrice, newPrice, deviation);
        // Require governance multisig to confirm
        pendingPriceUpdate = PendingUpdate(newPrice, block.timestamp);
        return; // Don't apply automatically
    }
    _applyPriceUpdate(newPrice);
}
The Aave CAPO incident produced a 2.85% deviation from market price. A 2% circuit breaker would have caught it.
Run a secondary oracle path that validates the primary before it can trigger liquidations:
function getValidatedPrice(address asset) public view returns (uint256) {
    uint256 primaryPrice = primaryOracle.getPrice(asset);
    uint256 shadowPrice = shadowOracle.getPrice(asset);
    uint256 deviation = calculateDeviation(primaryPrice, shadowPrice);
    require(
        deviation <= MAX_ORACLE_DIVERGENCE,
        "Oracle divergence detected: liquidations paused"
    );
    return primaryPrice;
}
If Aave had cross-validated CAPO's computed price against a direct Chainlink wstETH/ETH feed, the 2.85% divergence would have triggered an alert instead of liquidations.
Track liquidation velocity and pause if it exceeds normal bounds:
uint256 public liquidationCount;
uint256 public liquidationWindowStart;
uint256 public constant MAX_LIQUIDATIONS_PER_HOUR = 10;
uint256 public constant LIQUIDATION_WINDOW = 1 hours;
modifier liquidationRateCheck() {
    if (block.timestamp > liquidationWindowStart + LIQUIDATION_WINDOW) {
        liquidationCount = 0;
        liquidationWindowStart = block.timestamp;
    }
    liquidationCount++;
    require(
        liquidationCount <= MAX_LIQUIDATIONS_PER_HOUR,
        "Liquidation rate exceeded: manual review required"
    );
    _;
}
34 liquidations hitting within minutes of an oracle update is an anomaly signal. Rate limiting liquidations after oracle changes provides a second layer of defense.
CAPO was designed to prevent oracle manipulation. Instead, it became the mechanism of harm. Every safety mechanism you add introduces new failure modes — and those failures are often more dangerous because they're trusted by default.
The bug wasn't purely on-chain or purely off-chain. It emerged from the interaction between Chaos Labs' off-chain risk engine and the on-chain CAPO contract. Integration testing across this boundary is critical.
If your liquidation system is fully automated, your safety checks must be too. A human can't outrun a liquidation bot. Circuit breakers, cooldowns, and deviation checks must be on-chain and automatic.
Protocol solvency metrics can mask user harm. Aave remained solvent while 34 users lost $27M in positions. Monitoring protocol health is necessary but not sufficient — you need user-level impact monitoring too.
Aave's DAO treasury compensation is the right immediate response, but the systemic fix requires architectural changes: atomic state updates, liquidation cooldowns, and cross-oracle validation. Paying for the damage is not the same as preventing it.
DeFi protocols are building increasingly sophisticated oracle systems to handle edge cases — correlated assets, liquid staking derivatives, yield-bearing tokens. Each layer of sophistication introduces new coupling points, new state variables, and new desynchronization risks.
The Aave CAPO incident is a preview of what happens as these systems grow more complex. The attack surface isn't just external (flash loan manipulation, oracle frontrunning). It's internal — configuration errors, parameter mismatches, and state desynchronization in the safety systems themselves.
The protocols that survive the next generation of DeFi complexity won't be the ones with the most sophisticated oracles. They'll be the ones that treat their own safety systems with the same paranoia they apply to external threats.
This analysis is based on publicly available incident reports and governance discussions. The code patterns shown are illustrative implementations — adapt to your protocol's architecture and have them independently audited before deployment.