A constructive and inclusive social network for software developers.
RSS preview of the blog of The Practical Developer

Threading Async Together

2026-03-17 06:39:11

Hello readers,

I built a proof-of-concept application I call TokenGate. It's a high-performance async/threaded event bus with control mechanisms designed to be extremely minimalist.

The core concept is to achieve parallelism in concurrent operations through async token gathering and coordinated worker threads.

Here's what "TokenGate" uses to thread an operation:

# -- Python 3.12 -- #
from token_system import task_token_guard
from operations_coordinator import OperationsCoordinator

# 1. Decorate a standard synchronous function for threading
@task_token_guard(operation_type='string_ops', tags={'weight': 'light'})
def string_operation_task(task_data):
    # This function is now threaded
    result = task_data  # placeholder for the actual work
    return result

# 2. Start the coordinator (through a running loop)
coordinator = OperationsCoordinator()
coordinator.start()

# 3. Stop the coordinator in a finally block or on exception
coordinator.stop()

Task tokens are generated by the wrapping decorator.
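The decorator pattern behind this can be sketched as follows. This is not TokenGate's actual implementation (the pool size, the metadata fields, and treating a `Future` as the "token" are all my assumptions); it's just a minimal sketch of a decorator that dispatches a synchronous function to a worker thread and hands back a gatherable token:

```python
from concurrent.futures import ThreadPoolExecutor
import functools

_pool = ThreadPoolExecutor(max_workers=8)  # assumed pool size

def task_token_guard(operation_type, tags=None):
    """Wrap a sync function so each call is dispatched to a worker thread.

    The returned Future plays the role of the 'task token' that a
    coordinator could later gather.
    """
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            token = _pool.submit(fn, *args, **kwargs)
            token.operation_type = operation_type  # metadata for a coordinator
            token.tags = tags or {}
            return token
        return wrapper
    return decorator

@task_token_guard(operation_type='string_ops', tags={'weight': 'light'})
def upper_task(s):
    return s.upper()

token = upper_task("hello")
assert token.result() == "HELLO"
assert token.operation_type == 'string_ops'
```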

Here are some test results for operations in a "release mechanism" that dispatches batches of mixed tasks incrementally:

CONCURRENCY BURST: Medium x8 | release 1464 (8 tasks)
======================================================================
  Submit spread (barrier jitter): 0.19ms
  Overall wall-clock:             0.009045s
  Min task duration:              0.007818s
  Max task duration:              0.008432s
  Mean task duration:             0.008148s
  Stdev (clustering indicator):   0.000218s

  Duration per task (tight clustering = true concurrency):
    Task 00: 0.007928s  
    Task 01: 0.008000s  
    Task 02: 0.008136s  
    Task 03: 0.008209s  
    Task 04: 0.008432s  
    Task 05: 0.008300s  
    Task 06: 0.008362s  
    Task 07: 0.007818s  

  Serial estimate (sum):  0.065186s
  Actual wall-clock:      0.009045s
  Concurrency ratio:      7.21x  (concurrent)

CONCURRENCY BURST [Medium x8 | release 1464] PASSED
======================================================================
CONCURRENCY WINDOW: Sustained mixed releases (30s)
======================================================================
  Releases:                       1484
  Total tasks:                    11872
  Overall wall-clock:             30.070291s
  Min task duration:              0.001157s
  Max task duration:              0.105874s
  Mean task duration:             0.014970s
  Stdev (clustering indicator):   0.025983s

  Serial estimate (sum):          177.728067s
  Actual wall-clock:              30.070291s
  Sustained concurrency ratio:    5.91x  (concurrent)

CONCURRENCY WINDOW [Sustained mixed releases (30s)] PASSED

CONCURRENCY SUITE COMPLETE.

(Concurrency ratios of up to 7.21x were observed on an 8-core CPU with ~32 dynamic workers under ideal conditions, which is roughly 90% of the 8x concurrency ceiling.)

I've tested a wide variety of normally threaded operations, with results delivered as expected.

It's still just a proof of concept, but I've used it in various side projects with good results.

For anyone interested, here's the project on GitHub (with proofs):

Repo link - https://github.com/TavariAgent/Py-TokenGate

How I built sandboxes that boot in 28ms using Firecracker snapshots

2026-03-17 06:38:05

A deep-dive into building a sandbox orchestrator that gives AI agents their own isolated machines. Firecracker microVMs, snapshot restore, and why 28ms matters.
tags: go, opensource, ai, devops

I've been building AI agents that generate and execute code. The agents write Python scripts, run data analysis, generate charts, process files. Standard stuff in 2026.

The problem I kept hitting: where does that code actually run?

I tried Docker. It works, but containers share the host kernel. When the runc CVEs dropped in 2024-2025 (CVE-2024-21626, then three more in 2025), I started thinking harder about what "isolation" actually means when an AI is writing arbitrary code on my machine.

I tried E2B. Great product, but my data was leaving my machine. For an internal tool processing company data, that was a non-starter.

So I built ForgeVM. A single Go binary that orchestrates isolated sandboxes. This article is about the hardest part: getting Firecracker microVMs to boot in 28ms.

What Firecracker actually is

Firecracker is AWS's microVM manager. It's what powers Lambda and Fargate. Open source, written in Rust, runs on KVM.

The key insight: Firecracker is not QEMU. QEMU emulates an entire PC with hundreds of devices. Firecracker emulates exactly 4 devices:

  • virtio-block (disk)
  • virtio-net (network)
  • serial console
  • 1-button keyboard (just to stop the VM)

That's it. No USB, no GPU, no sound card, no PCI bus. This minimal device model is why it's fast and why the attack surface is tiny.

Each Firecracker microVM gets:

  • Its own Linux kernel
  • Its own root filesystem
  • Its own network namespace
  • Communication with the host via vsock (virtio socket, not TCP)

A guest exploit can't reach the host because there's a hardware boundary (KVM) between them. Compare that to Docker where a kernel vulnerability affects every container on the host.

The cold boot problem

Here's the thing though. Booting a Firecracker microVM from scratch takes about 1 second. That includes:

  1. Firecracker process starts (~50ms)
  2. Load kernel into memory (~100ms)
  3. Kernel boots, init runs (~500ms)
  4. Guest agent starts and signals ready (~200ms)

1 second is fine for long-running workloads. It's not fine when your AI agent needs to run print(1+1) and return the result in a chat interface. Users notice 1 second of latency.

I needed sub-100ms. Ideally sub-50ms.

The snapshot trick

Firecracker supports snapshotting a running VM's complete state to disk. This includes:

  • Full memory contents (the entire RAM, written to a file)
  • CPU register state (instruction pointer, stack pointer, all registers)
  • Device state (virtio queues, serial port state)

When you restore from a snapshot, Firecracker doesn't boot a kernel. It doesn't run init. It doesn't start your agent. It memory-maps the snapshot file, loads the CPU state, and resumes execution from exactly where it left off.

The VM doesn't know it was ever stopped. From the guest's perspective, time just skipped forward.

Here's what this looks like in practice:

# First spawn (cold boot) - ~1 second
1. Start Firecracker process
2. Boot kernel + rootfs
3. Wait for guest agent to signal ready
4. Pause the VM
5. Snapshot memory + CPU + devices to disk
6. Resume the VM, hand it to the user

# Every subsequent spawn - ~28ms
1. Copy the snapshot files (copy-on-write, nearly instant)
2. Start new Firecracker process with --restore-from-snapshot
3. VM resumes exactly where the snapshot was taken
4. Guest agent is already running, already ready

The 28ms breaks down roughly as:

  • ~5ms: Firecracker process startup
  • ~8ms: mmap the memory snapshot file
  • ~10ms: restore CPU and device state
  • ~5ms: vsock reconnection and ready signal

How I implemented it in Go

ForgeVM's Firecracker provider manages the snapshot lifecycle. Here's the simplified flow:

func (f *FirecrackerProvider) Spawn(ctx context.Context, opts SpawnOptions) (string, error) {
    // Check if we have a snapshot for this image
    snap := f.getSnapshot(opts.Image)

    if snap != nil {
        // Fast path: restore from snapshot (~28ms)
        return f.restoreFromSnapshot(ctx, snap, opts)
    }

    // Slow path: cold boot + create snapshot (~1s)
    vm, err := f.coldBoot(ctx, opts)
    if err != nil {
        return "", err
    }

    // Wait for guest agent to be ready
    f.waitForAgent(ctx, vm)

    // Pause VM and snapshot
    f.pauseVM(ctx, vm)
    f.createSnapshot(ctx, vm)
    f.resumeVM(ctx, vm)

    return vm.ID, nil
}

The snapshot files are per-image. First time someone spawns python:3.12, it cold-boots, snapshots, and every subsequent python:3.12 spawn restores in 28ms. Different images get different snapshots.

The copy-on-write detail

You can't share a single snapshot file across multiple running VMs because each VM writes to memory. The solution is copy-on-write:

  1. The base snapshot is read-only
  2. Each new VM gets a CoW overlay for both the memory file and the rootfs
  3. Writes go to the overlay, reads fall through to the base
  4. On destroy, delete the overlay. Base snapshot stays pristine.

This means 50 running VMs from the same snapshot share most of their memory pages. Only the pages that each VM actually wrote are unique. Memory efficient.
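The overlay semantics can be illustrated with a toy page map. This is a conceptual sketch only, not ForgeVM's actual file-level CoW mechanism (which operates on snapshot and rootfs files, not Python dicts):

```python
class CowOverlay:
    """Copy-on-write view over a shared, read-only base mapping.

    Writes land in the per-VM overlay; reads fall through to the base,
    so the base snapshot stays pristine no matter what a VM does.
    """
    def __init__(self, base: dict):
        self.base = base      # shared across all VMs, never mutated
        self.overlay = {}     # private to this VM

    def read(self, page: int) -> bytes:
        return self.overlay.get(page, self.base[page])

    def write(self, page: int, data: bytes) -> None:
        self.overlay[page] = data  # only the written page is duplicated

base = {0: b"kernel", 1: b"agent"}
vm1, vm2 = CowOverlay(base), CowOverlay(base)
vm1.write(1, b"patched")
assert vm1.read(1) == b"patched"  # vm1 sees its own write
assert vm2.read(1) == b"agent"    # vm2 still sees the shared base
assert base[1] == b"agent"        # base untouched
```

Only the pages a VM writes occupy extra space, which is why many VMs restored from one snapshot stay memory-efficient.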

The guest agent

Each Firecracker VM runs a custom agent binary (forgevm-agent) as PID 1. The agent:

  • Listens on vsock for commands from the host
  • Executes commands via os/exec
  • Handles file read/write/list/delete operations
  • Streams stdout/stderr back to the host in real-time
  • Uses a length-prefixed JSON protocol over the vsock connection

The protocol is simple:

[4 bytes: message length][JSON payload]

Request:

{"type": "exec", "command": "python3 /app/main.py", "workdir": "/workspace"}

Response (streamed):

{"type": "stdout", "data": "hello world\n"}
{"type": "exit", "code": 0}
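For illustration, the framing can be sketched in a few lines of Python. The byte order (big-endian here) is my assumption; the article only specifies a 4-byte length prefix followed by a JSON payload:

```python
import json
import struct

def encode_frame(msg: dict) -> bytes:
    """Serialize a message as [4-byte length][JSON payload]."""
    payload = json.dumps(msg).encode("utf-8")
    return struct.pack(">I", len(payload)) + payload  # ">I" = big-endian uint32

def decode_frame(buf: bytes):
    """Decode one frame, returning (message, remaining bytes)."""
    (length,) = struct.unpack(">I", buf[:4])
    payload = buf[4:4 + length]
    return json.loads(payload), buf[4 + length:]

frame = encode_frame({"type": "exec", "command": "python3 /app/main.py"})
msg, rest = decode_frame(frame)
assert msg["type"] == "exec"
assert rest == b""
```

The length prefix is what lets the host pull complete JSON messages off the vsock byte stream without a delimiter-based parser.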

vsock is important here. It's a virtio socket, not TCP/IP. The guest has no network stack visible to the host. There's no IP address, no port, no routing. Just a direct kernel-to-kernel channel. This eliminates an entire class of network-based attacks.

Why not just Docker?

I actually built a Docker provider too. ForgeVM has a provider interface, and Docker is one of the backends. Here's the honest comparison:

Docker containers:

  • Boot: ~200-500ms
  • Isolation: Linux namespaces + cgroups + seccomp
  • Attack surface: Shared host kernel. Every syscall from the container hits the real kernel.
  • KVM needed: No
  • Runs on: Linux, Mac, Windows

Firecracker microVMs:

  • Boot: ~28ms (snapshot) / ~1s (cold)
  • Isolation: KVM hardware virtualization. Separate kernel per sandbox.
  • Attack surface: Minimal VMM with 4 devices. Guest kernel is a separate kernel.
  • KVM needed: Yes
  • Runs on: Linux with /dev/kvm

gVisor (via Docker provider with runsc runtime):

  • Boot: ~300-800ms
  • Isolation: User-space kernel intercepts syscalls. ~70 host syscalls exposed.
  • Attack surface: Much smaller than Docker, larger than Firecracker.
  • KVM needed: No
  • Runs on: Linux

In ForgeVM, you switch between these with one config change:

providers:
  default: "firecracker"  # or "docker"
  docker:
    runtime: "runc"        # or "runsc" for gVisor

Same API. Same SDKs. Same pool mode. Different isolation level.

For development, I use Docker (runs on my Mac). For production, Firecracker. The application code doesn't know or care which provider is active.

Pool mode: the resource trick

This is the part I'm most proud of and it has nothing to do with Firecracker specifically.

Traditional sandbox tools: 1 user = 1 VM (or container). If you have 100 concurrent users, you need 100 VMs. At 512MB each, that's 50GB of RAM just for sandboxes.

ForgeVM's pool mode: 1 VM serves up to N users. Each user gets a logical "sandbox" with its own workspace directory (/workspace/{sandbox-id}/). The orchestrator:

  1. Routes all exec calls to the shared VM but sets WorkDir to the user's workspace
  2. Rewrites all file paths through scopedPath() to prevent directory traversal
  3. Tracks user count per VM and creates new VMs when capacity is full
  4. Destroys VMs only when all users have left

// scopedPath prevents user A from accessing user B's workspace
func scopedPath(vmID, sandboxID, path string) string {
    if vmID == "" {
        return path  // 1:1 mode, no scoping
    }
    base := "/workspace/" + sandboxID
    cleaned := filepath.Clean(filepath.Join(base, path))
    if !strings.HasPrefix(cleaned, base+"/") && cleaned != base {
        return base  // traversal attempt, return base
    }
    return cleaned
}

100 users, 20 VMs instead of 100: 80% fewer VMs.

The security trade-off is real: pool mode gives you directory-level isolation, not kernel-level. Users in the same VM share a kernel. For internal tools where you trust the users but want to isolate the AI-generated code from the host, this is fine. For multi-tenant public platforms, you'd want the optional per-user UID and PID namespace hardening on top.

Numbers

Some benchmarks from my development machine (AMD Ryzen 7, 32GB RAM, NVMe SSD):

Operation                                    Time
Firecracker cold boot                        ~1.1s
Firecracker snapshot restore                 ~28ms
Docker container start (alpine)              ~180ms
Docker container start (python:3.12)         ~450ms
Exec "echo hello" (Firecracker)              ~3ms
Exec "echo hello" (Docker)                   ~8ms
Exec "python3 -c 'print(1)'" (Firecracker)   ~45ms
File write 1MB (Firecracker, vsock)          ~12ms
File write 1MB (Docker, tar copy)            ~25ms
Sandbox destroy (Firecracker)                ~15ms
Sandbox destroy (Docker)                     ~50ms

The Firecracker exec latency is lower because vsock is a direct kernel channel, while Docker exec creates a new exec instance and attaches via the Docker daemon.

What I'd do differently

Start with Docker, not Firecracker. I built the Firecracker provider first because I was excited about 28ms boots. But 80% of people trying ForgeVM don't have KVM available (Mac users, CI/CD, cloud VMs without nested virt). The Docker provider should have been day one.

The guest agent protocol should have been gRPC, not custom JSON. The length-prefixed JSON protocol works fine but I'm essentially maintaining a custom RPC framework. gRPC over vsock would have given me streaming, error codes, and code generation for free.

Pool mode security should have been built-in from the start. The directory-level isolation works, but per-user UIDs and PID namespace isolation should be default-on, not optional. I'm retrofitting this now.

Try it

git clone https://github.com/DohaerisAI/forgevm && cd forgevm
./scripts/setup.sh
./forgevm serve

Then, from the Python SDK:

from forgevm import Client

client = Client("http://localhost:7423")
with client.spawn(image="python:3.12") as sb:
    result = sb.exec("print('hello from a 28ms sandbox')")
    print(result.stdout)

MIT licensed. Single binary. No telemetry. No cloud.

GitHub: github.com/DohaerisAI/forgevm

If you made it this far and found this useful, a star on GitHub genuinely helps with discoverability. Happy to answer questions in the comments about the Firecracker internals, the provider architecture, or the pool mode design.

Building a Real-Time AI Tutor with Gemini Live

2026-03-17 06:37:10

The Idea
Most AI tutors are still built around one-way explanation. They deliver information, but they do not really force the learner to explain, defend, or retrieve what they know.

TeachBack flips that dynamic. It is a real-time voice learning app where the user chooses a study topic, a learning mode, and an AI persona, then enters a live conversation with the tutor. Instead of passively consuming answers, the user has to talk through concepts out loud while the AI listens, challenges weak reasoning, asks follow-up questions, and scores the session at the end.

The goal is to make learning active rather than passive.

The Stack
TeachBack is built around Google’s AI and cloud tooling.

For the live conversation layer, I used the Gemini Live API with gemini-2.5-flash-native-audio-preview-12-2025. That powers the real-time voice session: the user speaks, Gemini responds with audio, and both sides are transcribed live during the session. This is what gives the app its conversational feel.

For the non-live tasks, I used gemini-2.5-flash. That model handles things like topic generation, study material preparation, and fallback scoring when I need a structured evaluation outside the live loop.

The backend is built with FastAPI and the Google GenAI SDK (google-genai). The frontend is built with React + Vite, using the Web Audio API to capture microphone input and stream PCM audio over WebSocket to the backend. On the cloud side, the app is deployed on Google Cloud Run, preset study content is stored in Google Cloud Storage, and deployment is automated through Cloud Build and a small deploy.sh script.

How It Works
The app starts with a preset study trail. Each trail contains prepared grounding material plus the original source PDFs. When a user selects a trail, the backend prepares a session and the frontend opens a live WebSocket connection for the tutoring run.

From there, the browser captures mic audio, encodes it as PCM, and sends it to the backend. The backend acts as the bridge between the browser and Gemini Live, forwarding user audio, receiving model audio/transcripts, handling session events, and returning everything back to the UI in real time.
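Web Audio hands you Float32 samples, so the browser-side encode step usually amounts to clamping and scaling to 16-bit integers before the bytes go over the WebSocket. A minimal sketch of that conversion, shown in Python for brevity (the real conversion runs in the browser, and TeachBack's exact wire format is my assumption):

```python
import struct

def float_to_pcm16(samples):
    """Convert float samples in [-1.0, 1.0] to 16-bit little-endian PCM bytes."""
    ints = []
    for s in samples:
        s = max(-1.0, min(1.0, s))   # clamp out-of-range samples
        ints.append(int(s * 32767))  # scale to int16 range
    return struct.pack("<%dh" % len(ints), *ints)

pcm = float_to_pcm16([0.0, 0.5, -1.0, 1.0])
assert struct.unpack("<4h", pcm) == (0, 16383, -32767, 32767)
```

Getting this scaling and the sample rate consistent on both ends is exactly the kind of "audio transport" work the author describes as the hard part.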

I also built several tutoring behaviors on top of that live loop:

Four learning modes:

  • Explain Mode for Feynman-style explanation
  • Socratic Mode for guided questioning
  • Recall Mode for conversational retrieval practice
  • Teach Mode for interactive instruction

Three personas with distinct voices and behaviors:

  • Curious Kid
  • Skeptical Peer
  • Tough Professor

Each persona changes not just the wording of the tutor, but the tone, pacing, and conversational style of the session.

The session booth includes two types of grounding:

  • an expandable prepared-material panel showing the normalized source-of-truth text the tutor is using
  • an in-browser PDF viewer so the original source documents can also be inspected directly during the session

That second piece turned out to be important because it makes the system much easier to trust.

The Most Interesting Feature
One of the most interesting parts of the project is the interruption sidecar.

During a live session, the main tutoring flow can pause while a focused correction flow takes over. That correction path can gather clarification, resolve a misunderstanding, and then feed the result back into the main tutoring session so the lesson continues with the right context.

What made this technically interesting was not just showing a new UI window. The hard part was preserving state cleanly across the pause and resume boundary: stopping the main agent at the right moment, disabling the main mic path, collecting the correction context, relaying that information back into the active session, and then resuming without making the whole conversation feel broken.

That interruption-and-recovery behavior is a big part of what I think real-time AI tutors need in order to feel genuinely useful.

What Was Hard
The hardest part was making the real-time session architecture stable.

Once you move beyond static prompt/response interactions, the hard problems change. It becomes much more about:

  • audio transport
  • sample-rate handling
  • turn boundaries
  • transcript accuracy
  • playback coordination
  • session lifecycle
  • interruption state

One of the biggest lessons from this project was that the model is only part of the challenge. The surrounding systems are where most of the engineering complexity shows up.

I spent a lot of time making sure the app could survive real multi-turn conversation, keep the transcript faithful to what was actually being answered, and recover cleanly when the session needed to pause or redirect.

Why I Built It This Way
I wanted the project to feel like a real learning product rather than just a technical demo of voice AI.

That meant focusing on things like:

  • grounded study material instead of vague free-form chatting
  • learning modes based on real pedagogical techniques
  • personas that feel different to interact with
  • source visibility so the user can inspect what the tutor is grounded on
  • correction and interruption behavior so the session can adapt instead of continuing blindly

In other words, I wanted the app to show what a voice-native learning companion could actually feel like when built around understanding rather than just answer generation.

Try It
The app is live here:

https://teachback-ig3hrbcina-uc.a.run.app

Tested in Chrome and Safari.

Built for the Gemini Live Agent Challenge. #GeminiLiveAgentChallenge

Streaming Large Financial Transaction Exports Without Breaking Your API

2026-03-17 06:29:48

Large financial transaction exports can easily overwhelm traditional REST APIs.

When datasets reach hundreds of thousands or even millions of records, generating export files entirely in memory becomes inefficient and sometimes unstable.

Many financial platforms provide some version of an “Export transactions” feature for reconciliation, accounting, tax preparation, or compliance reporting.

At small scale, this is straightforward: query the transactions, generate a file, and return it to the client. Problems start appearing when those exports grow large.

This post explores a practical streaming pattern for handling those exports efficiently while keeping memory usage predictable.

The architectural approach discussed here is also described in more detail in my research paper:
Streaming REST APIs for Large Financial Transaction Exports from Relational Databases

Why Traditional Export APIs Struggle

A typical export endpoint might look something like this:

@GET
@Path("/transactions/export")
@Produces("text/csv")
public Response exportTransactions(@QueryParam("accountId") String accountId) {

    List<Transaction> transactions = transactionRepository.findAllTransactions(accountId);

    String csv = generateCsv(transactions);

    return Response.ok(csv)
            .header("Content-Disposition", "attachment; filename=transactions.csv")
            .build();
}

This pattern works well when datasets for the requested range are small. However, as the dataset grows, major problems begin to appear.

  • The API server needs to hold the entire dataset in memory while the export file is being generated.
  • The client must wait until the entire file has been generated before the download even begins.

For large exports, this can lead to:

  • Significant memory usage
  • Delayed response times
  • Poor scalability under concurrent requests

When exports reach hundreds of thousands or millions of records, these issues become noticeable very quickly.

A Simpler Way: Stream the Data

Instead of building the entire export file first, a more scalable approach is to stream the data from the database directly to the HTTP response.

The idea is simple:

  1. Fetch transactions incrementally from the database
  2. Process each record as it arrives
  3. Immediately write it to the HTTP response

Conceptually the pipeline looks like this:

Database → API → Format Encoder → HTTP Response → Browser

Each transaction flows through this pipeline and is delivered to the client immediately.

The server never needs to hold the full dataset in memory.
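The article's implementation examples are in Java, but the pipeline shape itself can be sketched language-agnostically with generators. Everything here (the fake cursor, the mock response writer) is illustrative, not the article's code:

```python
import csv
import io

def fetch_rows():
    """Stand-in for a DB cursor: yields rows one at a time, never a full list."""
    for i in range(3):
        yield (f"2026-03-{i + 1:02d}", f"txn {i}", 10.0 * (i + 1))

def encode_csv(rows):
    """Format each row as a CSV line the moment it arrives."""
    for row in rows:
        buf = io.StringIO()
        csv.writer(buf).writerow(row)
        yield buf.getvalue()

def stream_response(chunks, write):
    """Write each encoded chunk straight to the (mock) HTTP response stream."""
    for chunk in chunks:
        write(chunk)

out = io.StringIO()  # stands in for the HTTP response body
stream_response(encode_csv(fetch_rows()), out.write)
assert out.getvalue().splitlines()[0] == "2026-03-01,txn 0,10.0"
```

Because each stage is lazy, at no point does more than one row exist in the pipeline at once; swapping the fake cursor for a real database cursor preserves that property.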

Architecture Overview

At a high level, a streaming export pipeline avoids assembling the full export file in memory by moving records through a continuous response path.

Architecture

Each record flows through the pipeline independently.

  1. The database returns rows incrementally using a cursor.
  2. The API processes one record at a time.
  3. The encoder formats the record into the target export format.
  4. The formatted data is immediately written to the HTTP response stream.

Because records are processed sequentially, the server never needs to hold the full dataset in memory.

Streaming the Database Query

The first step is ensuring the database driver retrieves rows incrementally instead of loading the entire result set.

Using JDBC, this can be achieved with a forward-only cursor and a fetch size.

try (
    Connection connection = dataSource.getConnection();
    PreparedStatement stmt = connection.prepareStatement(
        "SELECT date, description, amount FROM transactions WHERE account_id = ?",
        ResultSet.TYPE_FORWARD_ONLY,
        ResultSet.CONCUR_READ_ONLY
    )
) {
    stmt.setString(1, accountId);
    stmt.setFetchSize(1000);

    try (ResultSet rs = stmt.executeQuery()) {
        while (rs.next()) {
            Date date = rs.getDate("date");
            String description = rs.getString("description");
            BigDecimal amount = rs.getBigDecimal("amount");

            processRow(date, description, amount);
        }
    }
}

This allows the application to process transactions one row at a time, keeping memory usage predictable even for very large datasets.

Streaming the HTTP Response

Once rows are processed incrementally, they can be written directly to the HTTP response stream.

JAX-RS provides a convenient mechanism for this using StreamingOutput.

Here is a simplified example of a streaming export endpoint:

@GET
@Path("/transactions/export")
@Produces("text/csv")
public Response exportTransactions(
        @QueryParam("accountId") String accountId,
        @QueryParam("startDate") String startDate,
        @QueryParam("endDate") String endDate) {

    StreamingOutput stream = output -> {
        try (
            Connection conn = dataSource.getConnection();
            PreparedStatement stmt = conn.prepareStatement(
                "SELECT date, description, amount " +
                "FROM transactions " +
                "WHERE account_id = ? " +
                "AND date BETWEEN ? AND ? " +
                "ORDER BY date",
                ResultSet.TYPE_FORWARD_ONLY,
                ResultSet.CONCUR_READ_ONLY
            )
        ) {

            stmt.setString(1, accountId);
            stmt.setDate(2, java.sql.Date.valueOf(startDate));
            stmt.setDate(3, java.sql.Date.valueOf(endDate));
            stmt.setFetchSize(1000);

            try (
                ResultSet rs = stmt.executeQuery();
                PrintWriter writer = new PrintWriter(output)
            ) {
                while (rs.next()) {
                    writer.print(rs.getDate("date"));
                    writer.print(",");
                    writer.print(rs.getString("description"));
                    writer.print(",");
                    writer.println(rs.getBigDecimal("amount"));
                    writer.flush();
                }
            }
        }
    };

    return Response.ok(stream)
            .header("Content-Disposition", "attachment; filename=transactions.csv")
            .build();
}

Once the response begins streaming, the browser starts downloading the file immediately.

There is no need to wait for the entire dataset to be processed.

Supporting Multiple Export Formats

Financial platforms often support multiple export formats depending on the client system being used.

Common examples include:

  • CSV
  • OFX
  • QFX
  • QBO

A clean way to support these formats is to separate the export pipeline from the encoding logic.

For example:

public interface ExportEncoder {

    void start(OutputStream outputStream) throws IOException;

    void writeTransaction(Transaction transaction) throws IOException;

    void finish() throws IOException;
}

Each format can then implement its own encoder while the streaming pipeline remains unchanged.

This makes the export system easy to extend as new formats are required. This separation also keeps transport logic independent from file-format concerns, which makes testing and maintenance simpler.
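A sketch of what one such encoder might look like, transliterated to Python (method names adapted to Python convention; the CSV header row and the empty trailer are my assumptions, not details from the article):

```python
import io

class CsvEncoder:
    """Sketch of the ExportEncoder contract: start / writeTransaction / finish."""

    def start(self, output):
        self.output = output
        self.output.write("date,description,amount\n")  # assumed header row

    def write_transaction(self, txn):
        date, description, amount = txn
        self.output.write(f"{date},{description},{amount}\n")

    def finish(self):
        pass  # CSV needs no trailer; OFX/QFX encoders would close their envelope here

enc = CsvEncoder()
buf = io.StringIO()
enc.start(buf)
enc.write_transaction(("2026-03-01", "coffee", "4.50"))
enc.finish()
assert buf.getvalue() == "date,description,amount\n2026-03-01,coffee,4.50\n"
```

An OFX or QBO encoder would implement the same three methods with different start/finish envelopes, and the streaming loop that drives them never changes.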

Memory Behavior: Buffered vs Streaming

To understand the impact of streaming, it helps to compare how memory behaves in the two approaches.

Buffered Export

Traditional implementations load the entire dataset first.

List<Transaction> transactions = repository.findAllTransactions();
generateCSV(transactions);

Memory usage grows with dataset size because the full collection of transactions must be held in memory.

Streaming Export

With streaming, rows are processed one at a time.

while (rs.next()) {
    encode(rs);
    writeToResponse();
}

Only a small working set is required for the current fetch batch and output buffer.

As a result:

  • Memory usage stays relatively constant
  • Large exports do not require large heap allocations
  • API servers remain stable under concurrent export requests

What This Approach Improves

Streaming exports provide several practical advantages.

Memory usage remains low because transactions are processed incrementally rather than stored in large collections.

Users also experience faster response times because the download begins immediately instead of waiting for the server to build the entire export file.

Most importantly, the API infrastructure remains stable even when multiple large exports are requested concurrently.

For platforms that frequently generate large transaction exports, this architecture can significantly improve reliability and scalability.

Production Considerations

While streaming exports solve many scalability issues, there are a few practical considerations when implementing them in production systems.

Database timeouts

Long-running exports may require increased query timeouts depending on the database configuration.

Connection management

Streaming queries keep database connections open while data is being processed. Connection pool sizes should account for this behavior. In systems with frequent exports, it may also be useful to isolate export traffic from latency-sensitive request paths.

Backpressure

If the client download speed is slow, the server may block while writing to the response stream. Proper thread management is important to avoid tying up request threads unnecessarily.

Export limits

Some platforms enforce export limits or pagination windows to prevent excessively large exports from overwhelming infrastructure.

Even with these considerations, streaming remains one of the most effective techniques for handling large dataset exports.

Final Thoughts

Exporting large datasets is one of those features that seems trivial at first but becomes challenging as systems scale.

Streaming the export path from database to HTTP response is a simple and effective way to handle large exports at scale.

By processing transactions incrementally and delivering them directly to the client, APIs can handle large exports efficiently without excessive memory consumption.

In systems where large exports are common, adopting a streaming architecture can often be the difference between an export feature that works occasionally and one that scales reliably.

Although this article focuses on financial transaction exports, the same streaming approach can be applied to any API that returns large datasets—such as reporting endpoints, audit logs, analytics exports, or bulk data downloads.

How I give my AI agents eyes with a single API call

2026-03-17 06:23:05

My AI agent was blind.

It could read text, write code, call APIs — but the moment I asked it to work with a webpage, it hit a wall. "Go check if this landing page looks broken." "Tell me what the pricing page says now." "Monitor this competitor's homepage for changes." All blocked.

The obvious fix: give it a browser. The actual experience: install Puppeteer, debug the Chrome binary path, hit memory limits in Lambda, watch it break on every third-party CDN that detects headless browsers. An afternoon of yak-shaving every time.

I built SnapAPI to fix this.

What SnapAPI is

A REST API that wraps a headless browser. You send a URL, you get back a screenshot, PDF, or structured page data. No Puppeteer, no containers, no Chrome binary management.

Three lines of Python vs. a weekend of DevOps:

import requests

resp = requests.get(
    "https://snapapi.tech/v1/analyze",
    params={"url": "https://example.com"},
    headers={"X-API-Key": "YOUR_KEY"}
)
data = resp.json()
print(data["title"])       # "Example Domain"
print(data["text_summary"]) # "This domain is for use in illustrative examples..."

The three calls I use constantly

1. Analyze — structured page intelligence

This is the workhorse for AI pipelines. Instead of dumping raw HTML into a context window, I use /v1/analyze to get a clean, token-efficient JSON summary:

curl "https://snapapi.tech/v1/analyze?url=https://news.ycombinator.com" \
  -H "X-API-Key: YOUR_KEY"

Response:

{
  "url": "https://news.ycombinator.com",
  "title": "Hacker News",
  "description": "Links to stuff",
  "headings": [
    { "level": 1, "text": "Hacker News" }
  ],
  "links": [
    { "text": "new", "href": "https://news.ycombinator.com/newest" },
    { "text": "past", "href": "https://news.ycombinator.com/front" },
    { "text": "comments", "href": "https://news.ycombinator.com/newcomments" }
  ],
  "text_summary": "Ask HN: What are you working on? | 312 comments\nShow HN: ...",
  "load_time_ms": 847
}

Feed that text_summary to GPT-4 instead of the raw HTML. You go from 150k tokens of angle brackets to 2k tokens of actual content.

2. Screenshot — visual verification

AI agents that can see screenshots can verify things that text parsing misses: broken layouts, missing images, visual regressions, forms that didn't render.

curl "https://snapapi.tech/v1/screenshot?url=https://snapapi.tech&width=1280&height=800&format=png" \
  -H "X-API-Key: YOUR_KEY" \
  --output page.png

Parameters worth knowing:

  • full_page=true — captures the entire scrollable page, not just the viewport
  • dark_mode=true — renders in dark mode (useful for testing)
  • block_ads=true — blocks ad scripts before capture
  • wait_for_selector=.main-content — waits for a specific element before shooting
  • delay=1000 — waits N milliseconds after load (for JS-heavy SPAs)

I pipe screenshots directly to GPT-4V: "Does this page look broken? What changed since yesterday?"
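The parameters above can be combined in a small stdlib-only helper. This is a sketch under the assumption that boolean options are sent as lowercase strings in the query, as the curl examples suggest; `capture` performs the real network call, so it sits behind a main guard:

```python
import urllib.parse
import urllib.request

API_KEY = "YOUR_KEY"  # placeholder

def screenshot_params(url, **options):
    """Assemble query parameters for /v1/screenshot.

    Option names follow the list above (full_page, dark_mode,
    block_ads, wait_for_selector, delay). Booleans are assumed to
    be sent as lowercase strings, matching the curl examples.
    """
    params = {"url": url}
    for key, value in options.items():
        params[key] = str(value).lower() if isinstance(value, bool) else str(value)
    return params

def capture(url, out_path="page.png", **options):
    """Fetch a screenshot and write it to out_path."""
    query = urllib.parse.urlencode(screenshot_params(url, **options))
    req = urllib.request.Request(
        f"https://snapapi.tech/v1/screenshot?{query}",
        headers={"X-API-Key": API_KEY},
    )
    with urllib.request.urlopen(req, timeout=60) as resp, open(out_path, "wb") as f:
        f.write(resp.read())

if __name__ == "__main__":
    capture("https://snapapi.tech", full_page=True, block_ads=True, delay=1000)
```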

3. Batch — process multiple URLs in one call

When you're monitoring a competitor's entire pricing page, checking 50 product pages for freshness, or building a dataset:

curl -X POST "https://snapapi.tech/v1/batch" \
  -H "X-API-Key: YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "urls": [
      "https://competitor-a.com/pricing",
      "https://competitor-b.com/pricing",
      "https://competitor-c.com/pricing"
    ],
    "endpoint": "analyze",
    "params": {}
  }'

Response:

{
  "total": 3,
  "succeeded": 3,
  "failed": 0,
  "duration_ms": 2841,
  "results": [
    { "url": "https://competitor-a.com/pricing", "title": "Pricing — CompA", "text_summary": "..." },
    { "url": "https://competitor-b.com/pricing", "title": "Plans — CompB", "text_summary": "..." },
    { "url": "https://competitor-c.com/pricing", "title": "Pricing — CompC", "text_summary": "..." }
  ]
}

One API call. Three pages. No rate limit juggling, no thread management.

Real use cases from my own pipelines

AI research assistant: Agent gets asked "what does Company X's product page say?" — calls /v1/analyze, feeds structured JSON to the LLM instead of raw HTML. Works reliably even on JS-heavy SPAs.

Automated visual regression: Cron job calls /v1/screenshot on a set of pages after every deploy. Screenshots stored in S3. If diff score exceeds threshold, Slack alert fires. Cost: ~$0.001/screenshot.
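The change-detection step can start out as simple exact comparison on the screenshot bytes. A minimal sketch; a production pipeline would use perceptual image diffing instead, since anti-aliasing and dynamic content make exact hashes noisy:

```python
import hashlib

def content_fingerprint(image_bytes: bytes) -> str:
    """Hash the raw screenshot bytes; any pixel change flips the digest."""
    return hashlib.sha256(image_bytes).hexdigest()

def has_changed(previous_fingerprint: str, image_bytes: bytes) -> bool:
    """True when the new capture differs from the stored fingerprint."""
    return content_fingerprint(image_bytes) != previous_fingerprint
```

Store the fingerprint alongside the screenshot in S3; the cron job only fires the alert when `has_changed` returns True.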

Competitive monitoring: Weekly job batches competitor pricing and feature pages. LLM diffs the extracted text against last week's version. Email alert on any meaningful change.
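The week-over-week comparison can be prefiltered with `difflib` before anything reaches the LLM. A sketch assuming the stored snapshots are the `text_summary` strings from `/v1/analyze`:

```python
import difflib

def meaningful_changes(old_text: str, new_text: str, min_lines: int = 1):
    """Return the added/removed lines between two text snapshots.

    A real pipeline would then hand these lines to an LLM to judge
    whether the change matters (a price change vs. a typo fix).
    """
    diff = difflib.unified_diff(
        old_text.splitlines(), new_text.splitlines(), lineterm=""
    )
    changes = [
        line for line in diff
        if line.startswith(("+", "-"))
        and not line.startswith(("+++", "---"))
    ]
    return changes if len(changes) >= min_lines else []
```

If nothing survives the filter, the weekly job skips the LLM call entirely, which keeps the monitoring loop cheap.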

OG image generation: /v1/render takes raw HTML and returns a screenshot. Feed it a styled HTML template, get back a 1200×630 social share image. No canvas, no serverless Chrome, no font loading headaches.

Other endpoints

/v1/pdf — generates a PDF from any URL. Useful for reports, invoices, archival. Supports custom margins, landscape mode, background printing.

/v1/metadata — lightweight, fast metadata pull (title, og:image, canonical, favicon) without a full render. Use when you just need basic page info without executing JavaScript.

Getting started

Free tier requires no credit card. API key in your dashboard within 30 seconds.

# 1. Sign up at https://snapapi.tech
# 2. Copy your API key from the dashboard
# 3. Try it:
curl "https://snapapi.tech/v1/analyze?url=https://example.com" \
  -H "X-API-Key: YOUR_KEY"

Full docs at snapapi.tech/docs.

If your AI pipeline currently handles URLs by fetching raw HTML and dumping it into the context window — this is a direct upgrade. Structured output, real rendering, lower token cost, higher reliability.

How to get the most out of Kiro

2026-03-17 06:20:04

I started using Kiro a few months ago, in October 2025, during a hackathon called Kiroween. That was essentially my starting point for trying Kiro as an IDE and understanding what AWS was attempting with it. My goal was to try Kiro and see how it differed from the other tools I was already using.

Back then there weren't many guides about Kiro either, so almost everything I did was exploratory: opening projects, trying out vibecoding, getting my head around spec-driven development, and seeing how the IDE responded.

As the months passed and I got more involved in the Kiro community, I began to discover everything that actually sits behind the IDE. It is not just an editor with AI. There is a whole system around managing the AI's context, configuring the IDE's behavior, and governing what it does.

That is where concepts like these start to appear:

  • steerings
  • hooks
  • MCPs
  • powers
  • agents
  • prompts
  • skills

If you are already using Kiro, or are thinking about using it, and you still don't know how all these pieces fit together, this article is for you.

TL;DR: installing Kiro

If you don't have Kiro installed yet, here are the links:

With that you should have the IDE up and running in a few minutes.
This is where the interesting part begins.

Steerings

Kiro Steerings

Steerings are simply Markdown files where you define instructions, rules, or context you want Kiro to take into account when working on your project.

They are not only for giving instructions. They can also contain structural information about your project.

For example:

  • The structure and organization of the repository
  • Code and naming conventions
  • The project's architecture
  • How to generate documentation

One of the most common use cases is asking Kiro to generate the Foundational Steerings.

When you do this, Kiro scans your project and generates several files inside Kiro's configuration directory. They normally include:

  • product
  • structure
  • tech

All of this is generated from the context Kiro detects in the repository. It is a good starting point because it gives you a documentation base the AI can use when working on the code.
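As a concrete illustration, a hypothetical `structure` steering might look like the Markdown below. The file name, layout, and content are illustrative, not Kiro's exact output:

```markdown
# Project Structure

- `src/`: application code, organized by feature
- `src/api/`: HTTP route handlers
- `docs/`: generated documentation (do not edit by hand)

## Conventions

- New modules go under `src/<feature>/`
- Tests mirror the source tree under `tests/`
```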

Steerings I recommend using

Besides the foundational steerings, there are several kinds that are well worth adding in real projects.

For example:

Project working rules

How the team works in that repository. This can include things like:

  • Naming conventions
  • Folder structure
  • The project's vision and mission

This goes a long way toward keeping Kiro from generating things that stray from the project's style.

Documentation style

You can define how code is documented, which sections must appear, or how READMEs are written.

Architecture conventions

If your project follows a specific pattern (hexagonal, event-driven, clean architecture, etc.), having that defined in a steering helps a lot with consistency.

The clearer you make it to Kiro how your project works, the better results you will get.

Hooks

Kiro Hooks

Hooks are another quite powerful piece of Kiro. They are essentially prompts that run automatically when an event occurs.

Some examples of those events:

  • When a file is created, modified, or deleted
  • When a task starts or finishes
  • Manually

A fairly simple example:
If you are modifying Markdown files, you can have a hook that:

  • Runs a linter
  • Formats the Markdown
  • Checks for broken links

Another example:
If you are modifying code, you could have a hook that automatically generates documentation for the changes made.

This opens the door to automating quite a few things in the development workflow.

How to get started with hooks

Hooks can be triggered in different ways: hook triggers

My recommendation is to always start with manual hooks.

First you test what they do, verify the result is what you expect, and make sure they are not doing anything odd.
Once you are happy with the result, you can start automating them.
Keep in mind that hooks launch a task in Kiro; if you have many automated hooks, that can add up to a hefty credit cost. That was one of the first lessons I learned with hooks.

MCPs

Kiro MCPs

MCPs are another important piece of Kiro, and of AI in general.

To put it very simply, an MCP is basically an API for the AI to use. MCPs connect the AI to external services, giving it access to information or tools that would normally not be part of the project's context.

Some fairly common examples:

  • AWS documentation
  • Context7 for technical documentation
  • Atlassian integration, such as Jira or Confluence

You can connect Kiro to documentation, external tools, or the team's internal systems, and let the AI work with all of it.
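Registering an MCP server typically comes down to a small JSON entry in your MCP configuration. The sketch below is hypothetical; the exact file location, server name, and launch command depend on your setup and the server you install:

```json
{
  "mcpServers": {
    "aws-docs": {
      "command": "uvx",
      "args": ["awslabs.aws-documentation-mcp-server@latest"],
      "disabled": false
    }
  }
}
```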

Powers

Kiro Powers

The not-so-good part of MCPs is that each one adds context to the AI session. The more MCPs you have enabled, the more context they consume, and the more your tasks cost.

Often you don't need every MCP active all the time. This is where Kiro Powers come in.

Powers let you encapsulate tasks or domains that come up repeatedly, including tasks that use MCPs.
Imagine you have an MCP for your Supabase database, but you know the current task won't touch the database. If you wrap it in a Power, Kiro invokes it only when you actually need the database, freeing the context that MCP would otherwise consume.

A while ago I wrote an article about building my first Power: Building my first Kiro Power - Posthog Observability.

Custom agents

Custom Agents

Earlier this year, Kiro announced custom agents, or sub-agents. Agents already existed inside Kiro; you will have seen them if you have done spec-driven development in the IDE or used plan mode in the CLI. Both rely on agents already defined inside Kiro to run a specific type of task.

An agent has its own structure and prompt that make it behave in a specified way: certain tools are enabled or disabled for it, and it can have a specific model selected.
For example, the plan agent behaves like a guide for planning a given task: it asks you questions about the task and produces a detailed plan. It has the file-writing tool disabled, since you are only producing a plan, not the implementation.

The plan agent was pre-defined by Kiro. Now you can build your own, which means you can take this to the next level. Or four levels further. Why four? Because with sub-agents you can run four agents in parallel.

Agents in parallel

How I use custom agents

The agents I always keep around are the ones tied to what you are actually doing in the project.

For example, I have a TypeScript project, so I have an agent that is a TypeScript expert. It has a specific prompt on how to work as a TypeScript expert, and I enable the write and read tools for it.

A council of LLMs for planning and architecture

Another setup I have is for planning, architecture, and decision-making tasks.

Normally, if you ask the best model you have, it can give you a fairly good option, but ideally you want several options, since AI is non-deterministic and can hallucinate.

For planning, architecture, and decision-making, work that is about producing a plan or a design rather than an implementation, in real life you would get the team together, brainstorm, and gather different opinions.

How do we bring that into Kiro? You create custom agents with a specific prompt, for example for planning. Since custom agents let you choose which model they use, you can give the same type of agent a different model each time: for example Opus, Sonnet, Haiku, and Auto, four models in total.

When it's time to make a plan, I ask Kiro to use my Council of Agents in parallel. Kiro launches those four models in parallel, each producing an independent plan. Once they finish, the default agent compiles the four plans, taking the best of each.

This completely changes how planning is done. You brainstorm across several models and make sure you always keep the best of the best. One model might hallucinate, but four hallucinating at once is far less likely.

Spec-driven development

The part of Kiro that caught my attention most is the spec-driven development already built into the IDE.

For developers this is not an entirely new concept; it is similar to defining a ticket, giving it requirements, producing a design, and then implementing the ticket as a series of sub-tasks.

Kiro's spec mode produces requirements, a design, and a task list, all as Markdown files you can review, giving you context on your application or repository.

If you would like me to write an article focused on spec-driven development, leave a comment and let me know.

Conclusion

In this article we covered what Steerings, Hooks, MCPs, Powers, and custom agents are, and we took a quick look at spec-driven development.

Together, all of these Kiro features make up what is called the AI's context. If you have heard of Context-Driven Development, this is the context it refers to.

This article is the result of what I have learned over the past months, and the learning never stops: I keep discovering new features, such as skills, which I haven't covered here but which are also part of Kiro and have become very popular recently.

If you are not using Kiro yet, I would recommend trying it, both the IDE and the CLI, and giving spec-driven development a go.

And if you want to know more about Kiro, Context-Driven Development, or Spec-Driven Development, or want to share your experience with Kiro, feel free to contact me or leave a comment.