2026-03-17 06:39:11
Hello readers,
I built a proof-of-concept application I call TokenGate: a high-performance async/threaded event bus with deliberately minimalist control mechanisms.
The core idea is to achieve parallelism in concurrent operations by gathering async tokens and dispatching work to coordinated threaded workers.
Here's what "TokenGate" uses to thread an operation:
# -- Python 3.12 -- #
from token_system import task_token_guard
from operations_coordinator import OperationsCoordinator

# 1. Decorate a standard synchronous function for threading
@task_token_guard(operation_type='string_ops', tags={'weight': 'light'})
def string_operation_task(task_data):
    # This function is now dispatched to a worker thread
    result = process(task_data)  # placeholder for the actual workload
    return result

# 2. Start the coordinator (runs an event loop)
coordinator = OperationsCoordinator()
coordinator.start()

# 3. Stop on close, from a finally block or exception handler
coordinator.stop()
Task tokens are generated by the wrapping decorator.
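For readers curious how such a decorator could be structured, here is a minimal illustrative sketch, not TokenGate's actual implementation (its internals aren't shown in this post): a wrapper that acquires a token from a bounded semaphore and submits the call to a thread pool.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

_executor = ThreadPoolExecutor(max_workers=8)
_tokens = threading.BoundedSemaphore(8)  # token pool bounding concurrency

def task_token_guard(operation_type, tags=None):
    """Illustrative stand-in: gate each call behind a token, run it in a worker thread."""
    def decorator(func):
        def wrapper(*args, **kwargs):
            _tokens.acquire()  # take a task token (blocks when the pool is exhausted)
            future = _executor.submit(func, *args, **kwargs)
            future.add_done_callback(lambda f: _tokens.release())  # return the token
            return future  # caller collects the result via future.result()
        return wrapper
    return decorator

@task_token_guard(operation_type='string_ops', tags={'weight': 'light'})
def upper_task(s):
    return s.upper()

print(upper_task("hello").result())  # -> HELLO
```

The real implementation adds coordination and token accounting on top, but the gating pattern is the same shape.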
Here are some test results from a "release mechanism" that dispatches batches of mixed tasks incrementally:
CONCURRENCY BURST: Medium x8 | release 1464 (8 tasks)
======================================================================
Submit spread (barrier jitter): 0.19ms
Overall wall-clock: 0.009045s
Min task duration: 0.007818s
Max task duration: 0.008432s
Mean task duration: 0.008148s
Stdev (clustering indicator): 0.000218s
Duration per task (tight clustering = true concurrency):
Task 00: 0.007928s
Task 01: 0.008000s
Task 02: 0.008136s
Task 03: 0.008209s
Task 04: 0.008432s
Task 05: 0.008300s
Task 06: 0.008362s
Task 07: 0.007818s
Serial estimate (sum): 0.065186s
Actual wall-clock: 0.009045s
Concurrency ratio: 7.21x (concurrent)
CONCURRENCY BURST [Medium x8 | release 1464] PASSED
======================================================================
CONCURRENCY WINDOW: Sustained mixed releases (30s)
======================================================================
Releases: 1484
Total tasks: 11872
Overall wall-clock: 30.070291s
Min task duration: 0.001157s
Max task duration: 0.105874s
Mean task duration: 0.014970s
Stdev (clustering indicator): 0.025983s
Serial estimate (sum): 177.728067s
Actual wall-clock: 30.070291s
Sustained concurrency ratio: 5.91x (concurrent)
CONCURRENCY WINDOW [Sustained mixed releases (30s)] PASSED
CONCURRENCY SUITE COMPLETE.
(Concurrency ratios of up to 7.21x were observed on an 8-core CPU with ~32 dynamic workers in ideal conditions, which is roughly 90% of the 8x concurrent-operation ceiling.)
I've tested a wide variety of typically-threaded operations, and results were delivered as expected.
It's still just a proof of concept; however, I've used it in various side projects with good results.
For anyone interested here's my project on GitHub (with proofs):
Repo link - https://github.com/TavariAgent/Py-TokenGate
2026-03-17 06:38:05
A deep-dive into building a sandbox orchestrator that gives AI agents their own isolated machines. Firecracker microVMs, snapshot restore, and why 28ms matters.
tags: go, opensource, ai, devops
I've been building AI agents that generate and execute code. The agents write Python scripts, run data analysis, generate charts, process files. Standard stuff in 2026.
The problem I kept hitting: where does that code actually run?
I tried Docker. It works, but containers share the host kernel. When the runc CVEs dropped in 2024-2025 (CVE-2024-21626, then three more in 2025), I started thinking harder about what "isolation" actually means when an AI is writing arbitrary code on my machine.
I tried E2B. Great product, but my data was leaving my machine. For an internal tool processing company data, that was a non-starter.
So I built ForgeVM. A single Go binary that orchestrates isolated sandboxes. This article is about the hardest part: getting Firecracker microVMs to boot in 28ms.
Firecracker is AWS's microVM manager. It's what powers Lambda and Fargate. Open source, written in Rust, runs on KVM.
The key insight: Firecracker is not QEMU. QEMU emulates an entire PC with hundreds of devices. Firecracker emulates exactly 4 devices: a virtio network device, a virtio block device, a virtio vsock device, and a serial console.
That's it. No USB, no GPU, no sound card, no PCI bus. This minimal device model is why it's fast and why the attack surface is tiny.
Each Firecracker microVM gets its own guest kernel, its own memory, and its own virtual CPUs.
A guest exploit can't reach the host because there's a hardware boundary (KVM) between them. Compare that to Docker where a kernel vulnerability affects every container on the host.
Here's the thing though. Booting a Firecracker microVM from scratch takes about 1 second. That includes starting the Firecracker process, booting the kernel and rootfs, and waiting for the guest agent to signal ready.
1 second is fine for long-running workloads. It's not fine when your AI agent needs to run print(1+1) and return the result in a chat interface. Users notice 1 second of latency.
I needed sub-100ms. Ideally sub-50ms.
Firecracker supports snapshotting a running VM's complete state to disk. This includes the guest memory, the vCPU state, and the emulated device state.
When you restore from a snapshot, Firecracker doesn't boot a kernel. It doesn't run init. It doesn't start your agent. It memory-maps the snapshot file, loads the CPU state, and resumes execution from exactly where it left off.
The VM doesn't know it was ever stopped. From the guest's perspective, time just skipped forward.
Here's what this looks like in practice:
# First spawn (cold boot) - ~1 second
1. Start Firecracker process
2. Boot kernel + rootfs
3. Wait for guest agent to signal ready
4. Pause the VM
5. Snapshot memory + CPU + devices to disk
6. Resume the VM, hand it to the user
# Every subsequent spawn - ~28ms
1. Copy the snapshot files (copy-on-write, nearly instant)
2. Start new Firecracker process with --restore-from-snapshot
3. VM resumes exactly where the snapshot was taken
4. Guest agent is already running, already ready
The 28ms covers copying the snapshot files, starting the new Firecracker process, and resuming the VM from the restored state.
ForgeVM's Firecracker provider manages the snapshot lifecycle. Here's the simplified flow:
func (f *FirecrackerProvider) Spawn(ctx context.Context, opts SpawnOptions) (string, error) {
    // Check if we have a snapshot for this image
    snap := f.getSnapshot(opts.Image)
    if snap != nil {
        // Fast path: restore from snapshot (~28ms)
        return f.restoreFromSnapshot(ctx, snap, opts)
    }

    // Slow path: cold boot + create snapshot (~1s)
    vm, err := f.coldBoot(ctx, opts)
    if err != nil {
        return "", err
    }

    // Wait for guest agent to be ready
    f.waitForAgent(ctx, vm)

    // Pause VM and snapshot
    f.pauseVM(ctx, vm)
    f.createSnapshot(ctx, vm)
    f.resumeVM(ctx, vm)

    return vm.ID, nil
}
The snapshot files are per-image. First time someone spawns python:3.12, it cold-boots, snapshots, and every subsequent python:3.12 spawn restores in 28ms. Different images get different snapshots.
You can't share a single snapshot file across multiple running VMs because each VM writes to memory. The solution is copy-on-write: each restored VM maps the shared snapshot memory and gets a private copy of only the pages it modifies.
This means 50 running VMs from the same snapshot share most of their memory pages. Only the pages that each VM actually wrote are unique. Memory efficient.
Each Firecracker VM runs a custom agent binary (forgevm-agent) as PID 1. The agent listens on a vsock port, executes commands via Go's os/exec, and streams output back to the host.
The protocol is simple:
[4 bytes: message length][JSON payload]
Request:
{"type": "exec", "command": "python3 /app/main.py", "workdir": "/workspace"}
Response (streamed):
{"type": "stdout", "data": "hello world\n"}
{"type": "exit", "code": 0}
vsock is important here. It's a virtio socket, not TCP/IP. The guest has no network stack visible to the host. There's no IP address, no port, no routing. Just a direct kernel-to-kernel channel. This eliminates an entire class of network-based attacks.
I actually built a Docker provider too. ForgeVM has a provider interface, and Docker is one of the backends. Here's the honest comparison:
Docker containers: fastest to start and easiest to run anywhere, but they share the host kernel, so a kernel vulnerability affects every container on the host.
Firecracker microVMs: a hardware (KVM) boundary between guest and host, at the cost of requiring KVM and a snapshot workflow.
gVisor (via Docker provider with runsc runtime): a userspace kernel that intercepts syscalls. Stronger isolation than plain runc, weaker than a hardware boundary, with some syscall overhead.
In ForgeVM, you switch between these with one config change:
providers:
  default: "firecracker"   # or "docker"
  docker:
    runtime: "runc"        # or "runsc" for gVisor
Same API. Same SDKs. Same pool mode. Different isolation level.
For development, I use Docker (runs on my Mac). For production, Firecracker. The application code doesn't know or care which provider is active.
This is the part I'm most proud of and it has nothing to do with Firecracker specifically.
Traditional sandbox tools: 1 user = 1 VM (or container). If you have 100 concurrent users, you need 100 VMs. At 512MB each, that's 50GB of RAM just for sandboxes.
ForgeVM's pool mode: 1 VM serves up to N users. Each user gets a logical "sandbox" with its own workspace directory (/workspace/{sandbox-id}/). The orchestrator sets each user's WorkDir to their workspace and routes every file path through scopedPath() to prevent directory traversal.
// scopedPath prevents user A from accessing user B's workspace
func scopedPath(vmID, sandboxID, path string) string {
    if vmID == "" {
        return path // 1:1 mode, no scoping
    }
    base := "/workspace/" + sandboxID
    cleaned := filepath.Clean(filepath.Join(base, path))
    if !strings.HasPrefix(cleaned, base+"/") && cleaned != base {
        return base // traversal attempt, return base
    }
    return cleaned
}
100 users, 20 VMs instead of 100: 80% fewer VMs to run.
The security trade-off is real: pool mode gives you directory-level isolation, not kernel-level. Users in the same VM share a kernel. For internal tools where you trust the users but want to isolate the AI-generated code from the host, this is fine. For multi-tenant public platforms, you'd want the optional per-user UID and PID namespace hardening on top.
Some benchmarks from my development machine (AMD Ryzen 7, 32GB RAM, NVMe SSD):
| Operation | Time |
|---|---|
| Firecracker cold boot | ~1.1s |
| Firecracker snapshot restore | ~28ms |
| Docker container start (alpine) | ~180ms |
| Docker container start (python:3.12) | ~450ms |
| Exec "echo hello" (Firecracker) | ~3ms |
| Exec "echo hello" (Docker) | ~8ms |
| Exec "python3 -c 'print(1)'" (Firecracker) | ~45ms |
| File write 1MB (Firecracker, vsock) | ~12ms |
| File write 1MB (Docker, tar copy) | ~25ms |
| Sandbox destroy (Firecracker) | ~15ms |
| Sandbox destroy (Docker) | ~50ms |
The Firecracker exec latency is lower because vsock is a direct kernel channel, while Docker exec creates a new exec instance and attaches via the Docker daemon.
Start with Docker, not Firecracker. I built the Firecracker provider first because I was excited about 28ms boots. But 80% of people trying ForgeVM don't have KVM available (Mac users, CI/CD, cloud VMs without nested virt). The Docker provider should have been day one.
The guest agent protocol should have been gRPC, not custom JSON. The length-prefixed JSON protocol works fine but I'm essentially maintaining a custom RPC framework. gRPC over vsock would have given me streaming, error codes, and code generation for free.
Pool mode security should have been built-in from the start. The directory-level isolation works, but per-user UIDs and PID namespace isolation should be default-on, not optional. I'm retrofitting this now.
git clone https://github.com/DohaerisAI/forgevm && cd forgevm
./scripts/setup.sh
./forgevm serve
from forgevm import Client

client = Client("http://localhost:7423")
with client.spawn(image="python:3.12") as sb:
    result = sb.exec("print('hello from a 28ms sandbox')")
    print(result.stdout)
MIT licensed. Single binary. No telemetry. No cloud.
GitHub: github.com/DohaerisAI/forgevm
If you made it this far and found this useful, a star on GitHub genuinely helps with discoverability. Happy to answer questions in the comments about the Firecracker internals, the provider architecture, or the pool mode design.
2026-03-17 06:37:10
The Idea
Most AI tutors are still built around one-way explanation. They deliver information, but they do not really force the learner to explain, defend, or retrieve what they know.
TeachBack flips that dynamic. It is a real-time voice learning app where the user chooses a study topic, a learning mode, and an AI persona, then enters a live conversation with the tutor. Instead of passively consuming answers, the user has to talk through concepts out loud while the AI listens, challenges weak reasoning, asks follow-up questions, and scores the session at the end.
The goal is to make learning active rather than passive.
The Stack
TeachBack is built around Google’s AI and cloud tooling.
For the live conversation layer, I used the Gemini Live API with gemini-2.5-flash-native-audio-preview-12-2025. That powers the real-time voice session: the user speaks, Gemini responds with audio, and both sides are transcribed live during the session. This is what gives the app its conversational feel.
For the non-live tasks, I used gemini-2.5-flash. That model handles things like topic generation, study material preparation, and fallback scoring when I need a structured evaluation outside the live loop.
The backend is built with FastAPI and the Google GenAI SDK (google-genai). The frontend is built with React + Vite, using the Web Audio API to capture microphone input and stream PCM audio over WebSocket to the backend. On the cloud side, the app is deployed on Google Cloud Run, preset study content is stored in Google Cloud Storage, and deployment is automated through Cloud Build and a small deploy.sh script.
How It Works
The app starts with a preset study trail. Each trail contains prepared grounding material plus the original source PDFs. When a user selects a trail, the backend prepares a session and the frontend opens a live WebSocket connection for the tutoring run.
From there, the browser captures mic audio, encodes it as PCM, and sends it to the backend. The backend acts as the bridge between the browser and Gemini Live, forwarding user audio, receiving model audio/transcripts, handling session events, and returning everything back to the UI in real time.
I also built several tutoring behaviors on top of that live loop:
Four learning modes
Explain Mode for Feynman-style explanation
Socratic Mode for guided questioning
Recall Mode for conversational retrieval practice
Teach Mode for interactive instruction
Three personas with distinct voices and behaviors
Curious Kid
Skeptical Peer
Tough Professor
Each persona changes not just the wording of the tutor, but the tone, pacing, and conversational style of the session.
The session booth includes two types of grounding:
an expandable prepared-material panel showing the normalized source-of-truth text the tutor is using
an in-browser PDF viewer so the original source documents can also be inspected directly during the session
That second piece turned out to be important because it makes the system much easier to trust.
The Most Interesting Feature
One of the most interesting parts of the project is the interruption sidecar.
During a live session, the main tutoring flow can pause while a focused correction flow takes over. That correction path can gather clarification, resolve a misunderstanding, and then feed the result back into the main tutoring session so the lesson continues with the right context.
What made this technically interesting was not just showing a new UI window. The hard part was preserving state cleanly across the pause and resume boundary: stopping the main agent at the right moment, disabling the main mic path, collecting the correction context, relaying that information back into the active session, and then resuming without making the whole conversation feel broken.
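As an illustration of that pause/resume bookkeeping (a simplified sketch, not TeachBack's actual code), the session can be modeled as a small state machine that disables the mic during the correction flow and re-injects the collected context on resume:

```python
class TutorSession:
    """Toy model of the interruption sidecar's pause/resume lifecycle."""

    def __init__(self):
        self.state = "tutoring"     # tutoring | correcting
        self.mic_enabled = True
        self.context = []           # conversation context fed to the tutor

    def start_correction(self):
        # Pause the main agent and mute the main mic path
        self.state = "correcting"
        self.mic_enabled = False

    def finish_correction(self, correction_summary):
        # Relay what the sidecar learned back into the main session,
        # then resume tutoring with the mic re-enabled
        self.context.append(correction_summary)
        self.state = "tutoring"
        self.mic_enabled = True

session = TutorSession()
session.start_correction()
session.finish_correction("User confused 'recall' with 'recognition'; clarified.")
print(session.state, session.mic_enabled)  # -> tutoring True
```

The real system also has to coordinate audio transport and transcripts across that boundary, but keeping the state transitions this explicit is what makes the resume feel seamless.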
That interruption-and-recovery behavior is a big part of what I think real-time AI tutors need in order to feel genuinely useful.
What Was Hard
The hardest part was making the real-time session architecture stable.
Once you move beyond static prompt/response interactions, the hard problems change. It becomes much more about:
audio transport
sample-rate handling
turn boundaries
transcript accuracy
playback coordination
session lifecycle
interruption state
One of the biggest lessons from this project was that the model is only part of the challenge. The surrounding systems are where most of the engineering complexity shows up.
I spent a lot of time making sure the app could survive real multi-turn conversation, keep the transcript faithful to what was actually being answered, and recover cleanly when the session needed to pause or redirect.
Why I Built It This Way
I wanted the project to feel like a real learning product rather than just a technical demo of voice AI.
That meant focusing on things like:
grounded study material instead of vague free-form chatting
learning modes based on real pedagogical techniques
personas that feel different to interact with
source visibility so the user can inspect what the tutor is grounded on
correction and interruption behavior so the session can adapt instead of continuing blindly
In other words, I wanted the app to show what a voice-native learning companion could actually feel like when built around understanding rather than just answer generation.
Try It
The app is live here:
https://teachback-ig3hrbcina-uc.a.run.app
Tested in Chrome and Safari.
Built for the Gemini Live Agent Challenge.
2026-03-17 06:29:48
Large financial transaction exports can easily overwhelm traditional REST APIs.
When datasets reach hundreds of thousands or even millions of records, generating export files entirely in memory becomes inefficient and sometimes unstable.
Many financial platforms provide some version of an “Export transactions” feature for reconciliation, accounting, tax preparation, or compliance reporting.
At small scale, this is straightforward: query the transactions, generate a file, and return it to the client. Problems start appearing when those exports grow large.
This post explores a practical streaming pattern for handling those exports efficiently while keeping memory usage predictable.
The architectural approach discussed here is also described in more detail in my research paper:
Streaming REST APIs for Large Financial Transaction Exports from Relational Databases
A typical export endpoint might look something like this:
@GET
@Path("/transactions/export")
@Produces("text/csv")
public Response exportTransactions(@QueryParam("accountId") String accountId) {
    List<Transaction> transactions = transactionRepository.findAllTransactions(accountId);
    String csv = generateCsv(transactions);
    return Response.ok(csv)
        .header("Content-Disposition", "attachment; filename=transactions.csv")
        .build();
}
This pattern works well when datasets for the requested range are small. However, as the dataset grows, major problems begin to appear.
For large exports, this can lead to high heap usage, garbage-collection pressure, slow time-to-first-byte, request timeouts, and out-of-memory failures.
When exports reach hundreds of thousands or millions of records, these issues become noticeable very quickly.
Instead of building the entire export file first, a more scalable approach is to stream the data from the database directly to the HTTP response.
The idea is simple: read rows from the database incrementally, encode each record as it is produced, and write it straight to the response stream.
Conceptually the pipeline looks like this:
Database → API → Format Encoder → HTTP Response → Browser
Each transaction flows through this pipeline and is delivered to the client immediately.
The server never needs to hold the full dataset in memory.
At a high level, a streaming export pipeline avoids assembling the full export file in memory by moving records through a continuous response path.
Each record flows through the pipeline independently.
Because records are processed sequentially, the server never needs to hold the full dataset in memory.
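The same pattern translates to any stack. As a framework-neutral sketch in Python (illustrative only; the article's implementation is Java/JAX-RS), a generator pipeline emits one encoded row at a time:

```python
import csv
import io

def fetch_rows():
    """Stand-in for an incremental DB cursor (e.g., a server-side cursor)."""
    yield ("2026-01-02", "Coffee", "3.50")
    yield ("2026-01-03", "Books", "27.99")

def stream_csv(rows):
    """Encode rows one at a time; memory use is one row, not the dataset."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["date", "description", "amount"])  # header first
    yield buf.getvalue()
    for row in rows:
        buf.seek(0)
        buf.truncate(0)
        writer.writerow(row)
        yield buf.getvalue()  # each chunk goes straight to the HTTP response

body = "".join(stream_csv(fetch_rows()))
print(body.splitlines()[0])  # -> date,description,amount
```

In a real web framework, each yielded chunk would be written to the chunked HTTP response instead of joined into a string.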
The first step is ensuring the database driver retrieves rows incrementally instead of loading the entire result set.
Using JDBC, this can be achieved with a forward-only cursor and a fetch size.
try (
    Connection connection = dataSource.getConnection();
    PreparedStatement stmt = connection.prepareStatement(
        "SELECT date, description, amount FROM transactions WHERE account_id = ?",
        ResultSet.TYPE_FORWARD_ONLY,
        ResultSet.CONCUR_READ_ONLY
    )
) {
    stmt.setString(1, accountId);
    stmt.setFetchSize(1000);

    try (ResultSet rs = stmt.executeQuery()) {
        while (rs.next()) {
            Date date = rs.getDate("date");
            String description = rs.getString("description");
            BigDecimal amount = rs.getBigDecimal("amount");
            processRow(date, description, amount);
        }
    }
}
This allows the application to process transactions one row at a time, keeping memory usage predictable even for very large datasets.
Once rows are processed incrementally, they can be written directly to the HTTP response stream.
JAX-RS provides a convenient mechanism for this using StreamingOutput.
Here is a simplified example of a streaming export endpoint:
@GET
@Path("/transactions/export")
@Produces("text/csv")
public Response exportTransactions(
        @QueryParam("accountId") String accountId,
        @QueryParam("startDate") String startDate,
        @QueryParam("endDate") String endDate) {

    StreamingOutput stream = output -> {
        try (
            Connection conn = dataSource.getConnection();
            PreparedStatement stmt = conn.prepareStatement(
                "SELECT date, description, amount " +
                "FROM transactions " +
                "WHERE account_id = ? " +
                "AND date BETWEEN ? AND ? " +
                "ORDER BY date",
                ResultSet.TYPE_FORWARD_ONLY,
                ResultSet.CONCUR_READ_ONLY
            )
        ) {
            stmt.setString(1, accountId);
            stmt.setDate(2, java.sql.Date.valueOf(startDate));
            stmt.setDate(3, java.sql.Date.valueOf(endDate));
            stmt.setFetchSize(1000);

            try (
                ResultSet rs = stmt.executeQuery();
                PrintWriter writer = new PrintWriter(output)
            ) {
                while (rs.next()) {
                    writer.print(rs.getDate("date"));
                    writer.print(",");
                    // In production, quote/escape the description for CSV safety
                    writer.print(rs.getString("description"));
                    writer.print(",");
                    writer.println(rs.getBigDecimal("amount"));
                    writer.flush();
                }
            }
        }
    };

    return Response.ok(stream)
        .header("Content-Disposition", "attachment; filename=transactions.csv")
        .build();
}
Once the response begins streaming, the browser starts downloading the file immediately.
There is no need to wait for the entire dataset to be processed.
Financial platforms often support multiple export formats depending on the client system being used.
Common examples include CSV, JSON, and XML.
A clean way to support these formats is to separate the export pipeline from the encoding logic.
For example:
public interface ExportEncoder {
    void start(OutputStream outputStream) throws IOException;
    void writeTransaction(Transaction transaction) throws IOException;
    void finish() throws IOException;
}
Each format can then implement its own encoder while the streaming pipeline remains unchanged.
This makes the export system easy to extend as new formats are required. This separation also keeps transport logic independent from file-format concerns, which makes testing and maintenance simpler.
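To make the separation concrete, here is an illustrative Python analogue of that encoder contract (hypothetical names; the article's interface is Java) with a CSV implementation:

```python
import io

class CsvEncoder:
    """One format = one encoder; the streaming pipeline stays format-agnostic."""

    def start(self, out):
        self.out = out
        self.out.write("date,description,amount\n")  # header row

    def write_transaction(self, tx):
        # Stream each record as soon as it arrives from the cursor
        self.out.write(f"{tx['date']},{tx['description']},{tx['amount']}\n")

    def finish(self):
        self.out.flush()

def run_export(rows, encoder, out):
    """The pipeline only knows the encoder contract, not the format."""
    encoder.start(out)
    for tx in rows:
        encoder.write_transaction(tx)
    encoder.finish()

buf = io.StringIO()
run_export(
    [{"date": "2026-01-02", "description": "Coffee", "amount": "3.50"}],
    CsvEncoder(),
    buf,
)
print(buf.getvalue())
```

Adding a JSON or XML export then means writing one new encoder class; run_export and the transport layer don't change.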
To understand the impact of streaming, it helps to compare how memory behaves in the two approaches.
Traditional implementations load the entire dataset first.
List<Transaction> transactions = repository.findAllTransactions();
generateCSV(transactions);
Memory usage grows with dataset size because the full collection of transactions must be held in memory.
With streaming, rows are processed one at a time.
while (rs.next()) {
    encode(rs);
    writeToResponse();
}
Only a small working set is required for the current fetch batch and output buffer.
As a result, memory usage stays flat regardless of dataset size: the export's footprint is bounded by the fetch batch and the output buffer.
Streaming exports provide several practical advantages.
Memory usage remains low because transactions are processed incrementally rather than stored in large collections.
Users also experience faster response times because the download begins immediately instead of waiting for the server to build the entire export file.
Most importantly, the API infrastructure remains stable even when multiple large exports are requested concurrently.
For platforms that frequently generate large transaction exports, this architecture can significantly improve reliability and scalability.
While streaming exports solve many scalability issues, there are a few practical considerations when implementing them in production systems.
Database timeouts
Long-running exports may require increased query timeouts depending on the database configuration.
Connection management
Streaming queries keep database connections open while data is being processed. Connection pool sizes should account for this behavior. In systems with frequent exports, it may also be useful to isolate export traffic from latency-sensitive request paths.
Backpressure
If the client download speed is slow, the server may block while writing to the response stream. Proper thread management is important to avoid tying up request threads unnecessarily.
Export limits
Some platforms enforce export limits or pagination windows to prevent excessively large exports from overwhelming infrastructure.
Even with these considerations, streaming remains one of the most effective techniques for handling large dataset exports.
Exporting large datasets is one of those features that seems trivial at first but becomes challenging as systems scale.
Streaming the export path from database to HTTP response is a simple and effective way to handle large exports at scale.
By processing transactions incrementally and delivering them directly to the client, APIs can handle large exports efficiently without excessive memory consumption.
In systems where large exports are common, adopting a streaming architecture can often be the difference between an export feature that works occasionally and one that scales reliably.
Although this article focuses on financial transaction exports, the same streaming approach can be applied to any API that returns large datasets—such as reporting endpoints, audit logs, analytics exports, or bulk data downloads.
2026-03-17 06:23:05
My AI agent was blind.
It could read text, write code, call APIs — but the moment I asked it to work with a webpage, it hit a wall. "Go check if this landing page looks broken." "Tell me what the pricing page says now." "Monitor this competitor's homepage for changes." All blocked.
The obvious fix: give it a browser. The actual experience: install Puppeteer, debug the Chrome binary path, hit memory limits in Lambda, watch it break on every third-party CDN that detects headless browsers. An afternoon of yak-shaving every time.
I built SnapAPI to fix this.
A REST API that wraps a headless browser. You send a URL, you get back a screenshot, PDF, or structured page data. No Puppeteer, no containers, no Chrome binary management.
Three lines of Python vs. a weekend of DevOps:
import requests
resp = requests.get(
    "https://snapapi.tech/v1/analyze",
    params={"url": "https://example.com"},
    headers={"X-API-Key": "YOUR_KEY"},
)
data = resp.json()
print(data["title"]) # "Example Domain"
print(data["text_summary"]) # "This domain is for use in illustrative examples..."
This is the workhorse for AI pipelines. Instead of dumping raw HTML into a context window, I use /v1/analyze to get a clean, token-efficient JSON summary:
curl "https://snapapi.tech/v1/analyze?url=https://news.ycombinator.com" \
-H "X-API-Key: YOUR_KEY"
Response:
{
"url": "https://news.ycombinator.com",
"title": "Hacker News",
"description": "Links to stuff",
"headings": [
{ "level": 1, "text": "Hacker News" }
],
"links": [
{ "text": "new", "href": "https://news.ycombinator.com/newest" },
{ "text": "past", "href": "https://news.ycombinator.com/front" },
{ "text": "comments", "href": "https://news.ycombinator.com/newcomments" }
],
"text_summary": "Ask HN: What are you working on? | 312 comments\nShow HN: ...",
"load_time_ms": 847
}
Feed that text_summary to GPT-4 instead of the raw HTML. You go from 150k tokens of angle brackets to 2k tokens of actual content.
AI agents that can see screenshots can verify things that text parsing misses: broken layouts, missing images, visual regressions, forms that didn't render.
curl "https://snapapi.tech/v1/screenshot?url=https://snapapi.tech&width=1280&height=800&format=png" \
-H "X-API-Key: YOUR_KEY" \
--output page.png
Parameters worth knowing:
full_page=true — captures the entire scrollable page, not just the viewport
dark_mode=true — renders in dark mode (useful for testing)
block_ads=true — blocks ad scripts before capture
wait_for_selector=.main-content — waits for a specific element before shooting
delay=1000 — waits N milliseconds after load (for JS-heavy SPAs)
I pipe screenshots directly to GPT-4V: "Does this page look broken? What changed since yesterday?"
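In Python, the same screenshot call can be assembled with requests, using the parameters above (endpoint and parameter names as documented in this post). It's shown as a prepared request so the final URL is visible without sending anything:

```python
import requests

req = requests.Request(
    "GET",
    "https://snapapi.tech/v1/screenshot",
    params={
        "url": "https://snapapi.tech",
        "full_page": "true",   # capture the whole scrollable page
        "block_ads": "true",   # strip ad scripts before capture
        "delay": "1000",       # settle time for JS-heavy SPAs
    },
    headers={"X-API-Key": "YOUR_KEY"},
).prepare()

print(req.url)
# To actually send it:
#   resp = requests.Session().send(req)
#   open("page.png", "wb").write(resp.content)
```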
When you're monitoring a competitor's entire pricing page, checking 50 product pages for freshness, or building a dataset:
curl -X POST "https://snapapi.tech/v1/batch" \
-H "X-API-Key: YOUR_KEY" \
-H "Content-Type: application/json" \
-d '{
"urls": [
"https://competitor-a.com/pricing",
"https://competitor-b.com/pricing",
"https://competitor-c.com/pricing"
],
"endpoint": "analyze",
"params": {}
}'
Response:
{
"total": 3,
"succeeded": 3,
"failed": 0,
"duration_ms": 2841,
"results": [
{ "url": "https://competitor-a.com/pricing", "title": "Pricing — CompA", "text_summary": "..." },
{ "url": "https://competitor-b.com/pricing", "title": "Plans — CompB", "text_summary": "..." },
{ "url": "https://competitor-c.com/pricing", "title": "Pricing — CompC", "text_summary": "..." }
]
}
One API call. Three pages. No rate limit juggling, no thread management.
AI research assistant: Agent gets asked "what does Company X's product page say?" — calls /v1/analyze, feeds structured JSON to the LLM instead of raw HTML. Works reliably even on JS-heavy SPAs.
Automated visual regression: Cron job calls /v1/screenshot on a set of pages after every deploy. Screenshots stored in S3. If diff score exceeds threshold, Slack alert fires. Cost: ~$0.001/screenshot.
Competitive monitoring: Weekly job batches competitor pricing and feature pages. LLM diffs the extracted text against last week's version. Email alert on any meaningful change.
OG image generation: /v1/render takes raw HTML and returns a screenshot. Feed it a styled HTML template, get back a 1200×630 social share image. No canvas, no serverless Chrome, no font loading headaches.
/v1/pdf — generates a PDF from any URL. Useful for reports, invoices, archival. Supports custom margins, landscape mode, background printing.
/v1/metadata — lightweight, fast metadata pull (title, og:image, canonical, favicon) without a full render. Use when you just need basic page info without executing JavaScript.
Free tier requires no credit card. API key in your dashboard within 30 seconds.
# 1. Sign up at https://snapapi.tech
# 2. Copy your API key from the dashboard
# 3. Try it:
curl "https://snapapi.tech/v1/analyze?url=https://example.com" \
-H "X-API-Key: YOUR_KEY"
Full docs at snapapi.tech/docs.
If your AI pipeline currently handles URLs by fetching raw HTML and dumping it into the context window — this is a direct upgrade. Structured output, real rendering, lower token cost, higher reliability.
2026-03-17 06:20:04
I started with Kiro a few months ago, in October 2025, during a hackathon called Kiroween. That was basically my starting point for trying Kiro as an IDE and understanding what AWS was attempting with it. My goal was to test Kiro and see how it differed from the other tools I was already using.
Back then there weren't many guides about Kiro either, so almost everything I did was exploratory: opening projects, trying out vibecoding, getting my head around spec-driven development, and seeing how the IDE responded.
As the months went by and I got more involved in the Kiro community, I started to discover everything that's really behind the IDE. It's not just an editor with AI. There's a whole system around how to manage the AI's context, how to configure the IDE's behavior, and how to govern what it does.
That's where concepts like Steerings and Hooks start to appear.
If you're already using Kiro, or thinking about using it, and you still don't know how all these pieces fit together, this article is for you.
If you don't have Kiro installed yet, here are the links:
With that, you should have the IDE running in a few minutes.
A partir de aquí es donde empieza lo interesante.
Steerings are simply Markdown files where you define instructions, rules, or context that you want Kiro to take into account when working on your project.
They aren't just for giving instructions. They can also contain structural information about your project.
For example:
One of the most common use cases is asking Kiro to generate the Foundational Steerings.
When you do this, Kiro scans your project and generates several files inside Kiro's configuration directory. They typically include:
All of this is generated from the context Kiro detects in the repository. It's a good starting point because it gives you a documentation baseline the AI can use when working on the code.
Beyond the foundational steerings, there are several kinds that are quite useful to add in real projects.
For example:
Project working rules
How the team works in that repository. This can include things like:
This goes a long way toward keeping Kiro from generating things that fall outside the project's style.
Documentation style
You can define how code is documented, which sections must appear, or how READMEs are written.
Architecture conventions
If your project follows a specific pattern (hexagonal, event-driven, clean architecture, etc.), having it defined in a steering helps a lot with consistency.
The clearer you make it to Kiro how your project works, the better results you'll get.
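As an illustration, a steering is just a Markdown file, optionally with front matter controlling when Kiro pulls it into context. Here's a minimal sketch of an architecture-conventions steering; the front-matter keys follow Kiro's steering inclusion mechanism, while the rules themselves are made up for this example:

```markdown
---
inclusion: fileMatch
fileMatchPattern: "src/**/*.ts"
---

# Architecture conventions

- This project follows hexagonal architecture: domain code never imports from adapters.
- New use cases go in `src/application/`, one file per use case.
- Public functions are documented with TSDoc comments.
```

With `inclusion: fileMatch`, the steering is only injected when Kiro is working on files matching the pattern, which keeps context lean compared to always-on steerings.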
Hooks are another quite powerful piece inside Kiro. They're basically prompts that run automatically when an event occurs.
Some examples of those events are:
A fairly simple example:
If you're editing Markdown files, you can have a hook that:
Another example:
If you're modifying code, you could have a hook that automatically generates documentation for the changes that were made.
This opens the door to automating quite a few things in the development workflow.
Hooks can fire in different ways, known as hook triggers.
My recommendation is to always start with manual hooks.
First you try out what they do, verify that the result is what you expected, and make sure they aren't doing anything odd.
Once you're happy with the result, then you can start automating them.
Keep in mind that each hook launches a task in Kiro; if you have many automated hooks, that can rack up a high credit cost. That was one of the first lessons I learned with hooks.
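For orientation, a hook definition is a small JSON file pairing a trigger with a prompt. The sketch below is illustrative of the shape Kiro stores, but treat the exact field names as an assumption and create hooks through the IDE's hook UI rather than by hand:

```json
{
  "enabled": true,
  "name": "Markdown docs check",
  "description": "Review edited Markdown files for broken links and style issues",
  "when": {
    "type": "fileEdited",
    "patterns": ["**/*.md"]
  },
  "then": {
    "type": "askAgent",
    "prompt": "Review the edited Markdown file for broken links, typos, and style consistency, and fix what you find."
  }
}
```

Note how the `then` block is just a prompt: every time the trigger fires, Kiro spends a task (and credits) on it, which is why starting with manual triggers is the safer path.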
MCPs are another of the important pieces within Kiro, and in AI in general.
To put it very simply, an MCP is basically like an API for the AI to use. MCPs let you connect the AI to external services, giving it access to information or tools that normally wouldn't be inside the project's context.
Some fairly common examples:
You can connect Kiro to documentation, external tools, or your team's internal systems, and let the AI work with all of that.
The not-so-good part of MCPs is that they add context to the AI session. The more MCPs you have enabled, the more context they consume, making your tasks more expensive.
Often you don't need all your MCPs active all the time. That's where Kiro Powers come in.
Powers let you encapsulate tasks or domains that repeat a lot, including tasks that use MCPs.
Imagine you have an MCP for your Supabase database, but for the current task you know you won't need the database. If you encapsulate it in a Power, Kiro would invoke it only when you actually need the database, freeing up the context that MCP would otherwise consume.
A while back I wrote an article about building my first Power (it's in English, sorry): Building my first Kiro Power - Posthog Observability.
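For reference, MCP servers are registered in a workspace-level `mcp.json` using the standard `mcpServers` shape. A minimal sketch with one entry; the server name, command, and package are placeholders for whichever server you actually use:

```json
{
  "mcpServers": {
    "aws-docs": {
      "command": "uvx",
      "args": ["awslabs.aws-documentation-mcp-server@latest"],
      "disabled": false,
      "autoApprove": []
    }
  }
}
```

The `disabled` flag is worth knowing about: it lets you keep a server configured but switched off, which matters for the context-cost problem described next.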
At the start of this year, Kiro announced customizable agents, or sub-agents. Agents already existed inside Kiro; you'll have seen this if you've done spec-driven development in the IDE or used plan mode in the CLI. Both use agents predefined inside Kiro to run a specific type of task.
An agent has its own structure and its own prompt that makes it behave in the specified way: it has a set of tools that are either enabled or disabled, and it can have a specific model selected.
For example, the plan agent behaves like a guide for planning a given task: it asks you questions about the task and produces a detailed plan. This agent has the file-writing tool disabled, since you're only producing a plan, not the implementation.
The plan agent was predefined by Kiro. Now you can build your own, which means you can take this to the next level. Or four levels beyond. Why four? Because with sub-agents you can have four agents running in parallel.
The agents I use most are the ones related to what I'm actually working on in the project.
For example, I have a TypeScript project, so I have an agent that's a TypeScript expert. It has a specific prompt for how to work like a TypeScript expert, and I enable its write and read tools.
Another example I have is for planning, architecture, and decision-making tasks.
Usually, if you ask the best model you have, it can give you a pretty good option, but ideally you want several options, since AI is non-deterministic and can hallucinate.
When it comes to planning, architecture, and decision-making (matters where the goal is a plan or a design rather than an implementation), in real life you get together with a team, brainstorm, and collect different opinions from the team.
How would we bring this into Kiro? You have customizable agents with a specific prompt, for planning, say. Since customizable agents let you choose which model they use, you can give the same type of agent a different model each time. For example: Opus, Sonnet, Haiku, and Auto, four models in total.
When it's time to make a plan, I ask Kiro to use my Council of Agents in parallel. Kiro launches those four models running in parallel, each producing an independent plan. Once they finish, the default agent compiles those four plans for me, taking the best of each.
This completely changes how planning is done. You brainstorm across several models and make sure you always keep the best of the best. One model might hallucinate, but four hallucinating at once is much less likely.
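As a purely illustrative sketch of the idea (the field names below are hypothetical, not Kiro's actual agent schema; check Kiro's docs for the real format), a custom agent essentially bundles a prompt, a tool allowlist, and a model choice:

```json
{
  "name": "typescript-expert",
  "description": "Expert TypeScript engineer for this repository",
  "prompt": "You are a senior TypeScript engineer. Prefer strict typing, avoid `any`, and follow the project's hexagonal architecture conventions.",
  "tools": ["read", "write"],
  "model": "auto"
}
```

The key design point is the same one the built-in plan agent demonstrates: constraining tools (e.g. read-only for a planner, read/write for an implementer) is what makes each agent predictable at its job.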
The part of Kiro that caught my attention most is the spec-driven development that was already built into the IDE.
For developers this isn't an entirely new concept; it's similar to defining a ticket, giving it requirements, producing a design, and then implementing the ticket through a set of sub-tasks.
Kiro's spec mode creates requirements, a design, and a task list, all in Markdown files that you can review and that give you context about your application or repository.
If you'd like me to write an article focused on Spec-Driven Development, leave a comment letting me know.
In this article we've looked at Steerings, Hooks, MCPs, Powers, and customizable agents, and we've touched briefly on spec-driven development.
All of these features inside Kiro make up what is defined as the AI's context. If you've heard of Context-Driven Development, this is the context it refers to.
This article is the result of what I've learned over the past months. And the learning never stops: I keep finding new features, such as skills, which I haven't covered here but which are also included in Kiro and have become very popular recently.
If you're not using Kiro, I'd recommend trying it, both the IDE and the command-line CLI, and giving spec-driven development a go.
And if you want to know more about Kiro, Context-Driven Development, or Spec-Driven Development, or you want to share your experience with Kiro, don't hesitate to contact me or write it in the comments.
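For orientation, those three artifacts land as plain Markdown in the workspace, along these lines (the feature name here is made up; Kiro writes requirements in the EARS style):

```
.kiro/specs/my-feature/
├── requirements.md   # user stories with EARS-style acceptance criteria
├── design.md         # technical design for the feature
└── tasks.md          # checklist of implementation tasks
```

Because they're just Markdown files in the repo, you can review and edit each stage before letting Kiro move on to the next one.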