2026-04-03 06:08:48
Most agentic systems that run on laptops and servers, like Claude Code, interact with files natively through bash. But building an agentic system that lets users upload and work with files comes with constraints of its own: you cannot store files on the server the agent runs on, and you cannot hand the agent a bash tool:
There are other solutions to these problems, but they each come with their own tradeoffs:
I ran into this problem recently while building a legal AI agentic system where users had to upload files for the agent to work with. I needed database-like storage: nothing to spin up and down like a server, native file operations that can be exposed as tools to the agent, and a guarantee that the agent cannot access anything outside its own scoped workspace.
Then I found AgentFS — a filesystem built specifically for AI agents, backed by Turso/SQLite. It provides scoped, isolated storage per user and session, with file operations that can be wired directly as agent tools.
Of the integration options — Python SDK, AgentFS + just-bash, AgentFS + FUSE — I went with the Python SDK. Unlike FUSE, which gives the agent a real mount but leaves the rest of the server exposed, the Python SDK puts you in full control. The agent can only do what you explicitly wire up as a tool. No shell escape, no arbitrary commands, no environment variable leaks. The isolation is in the design, not bolted on afterward.
The trade-off is that you're responsible for the tool surface. The SDK ships with the basics — read, write, list — but search operations were missing. No grep, no find, no wc. For an agent that needs to navigate files without dumping everything into context, those aren't optional. So I built them and raised a PR to have them integrated directly into the SDK.
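The idea behind a grep-style tool is simple enough to sketch. The snippet below is illustrative only, not the AgentFS SDK API: a plain dict stands in for a user's scoped workspace, and the function is the kind of callable you would wire up as an agent tool, returning structured matches instead of dumping whole files into context.

```python
import re

# Illustrative sketch only: `workspace` is a stand-in for a user's scoped
# AgentFS store (real SDK method names differ). The agent only gets the
# operations you explicitly expose as tools, nothing else.
def grep_tool(workspace: dict[str, str], pattern: str) -> list[dict]:
    """Return matching lines as structured results the agent can consume."""
    regex = re.compile(pattern)
    hits = []
    for path, content in workspace.items():
        for lineno, line in enumerate(content.splitlines(), start=1):
            if regex.search(line):
                hits.append({"path": path, "line": lineno, "text": line})
    return hits

workspace = {
    "contracts/nda.txt": "Term: 2 years\nGoverning law: Delaware",
    "notes.txt": "review governing law clause",
}
matches = grep_tool(workspace, r"[Gg]overning law")
# One match per file: contracts/nda.txt line 2, notes.txt line 1
```

The same shape extends naturally to find- and wc-style tools: each is a small pure function over the scoped store, registered as a tool.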
AgentFS relies on Turso DB for hosted production use. Locally, the pattern already works — one SQLite file per user, each opened independently with full read-write access. But on a production server, you can't manage hundreds of separate database files manually. You need a single server process that can route connections to the right user's database.
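The local pattern can be sketched with Python's standard library alone. The file layout and schema below are assumptions for illustration, not AgentFS's actual on-disk format; the point is that each user's workspace is an independent SQLite file with full read-write access and no shared state.

```python
import os
import sqlite3
import tempfile

# Hypothetical layout: one SQLite file per user under a base directory.
def open_user_db(base_dir: str, user_id: str) -> sqlite3.Connection:
    conn = sqlite3.connect(os.path.join(base_dir, f"{user_id}.db"))
    conn.execute(
        "CREATE TABLE IF NOT EXISTS files (path TEXT PRIMARY KEY, data BLOB)"
    )
    return conn

base = tempfile.mkdtemp()
alice = open_user_db(base, "alice")
alice.execute("INSERT INTO files VALUES (?, ?)", ("notes.txt", b"hello"))
alice.commit()

# A different user's workspace is a different file; writes don't cross over.
bob = open_user_db(base, "bob")
rows = bob.execute("SELECT COUNT(*) FROM files").fetchone()[0]
```

The production problem described next is exactly this routing step, `open_user_db`, at server scale: something has to map each connection to the right user's database.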
Turso Cloud solves part of this — it supports creating thousands of separate databases and even lets you query across them using ATTACH. But attached databases are currently read-only. You can read from multiple user databases in one session, but you can't write to them. For an agentic system where the agent needs to create, modify, and delete files in a user's scoped workspace, read-only access isn't enough.
Turso has confirmed that full read-write ATTACH support is on their roadmap. On the AgentFS side, the open() call goes through a connect() function that can be pointed at a Turso-managed database instead of a local file — so the SDK integration path is straightforward once Turso ships the write support. Until then, full production multi-user AgentFS is blocked on this upstream feature.
2026-04-03 06:07:40
Before talking about solutions, the challenges need to be named clearly, because this is where the effort is most underestimated. The first is choosing the right technology: not every workload needs multi-region, and not every AWS service is equally available in every region. The second is handling failures at scale: having resources in two regions is not enough if you have not thought through how each component behaves under failure. The third is proximity to users, which is not always purely technical; there are laws, regulations, and data sovereignty requirements that dictate where your data is allowed to live.
Ignoring any of these points at the start guarantees a much harder conversation later.
The key concept here is the fault domain. Every component in your architecture belongs to a domain that defines its failure policy: it can be redundant (it is replicated), ignorable (its failure does not affect the system), or cascading (if it goes down, it drags down everything that depends on it, the dreaded SPOF).
The classic problem is an architecture where the database is a cascading domain inside a single AZ, in a single region. If that AZ has problems, you go down completely. A multi-region strategy solves this by adding one more level to the domain hierarchy, but it also introduces new questions about consistency and replication latency that must be answered explicitly.
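To make the fault-domain idea concrete, here is a minimal sketch with a made-up component inventory; classifying each component by its failure policy makes the SPOFs fall out mechanically.

```python
from enum import Enum

class FaultPolicy(Enum):
    REDUNDANT = "replicated; survives the loss of one copy"
    IGNORABLE = "its failure does not affect the system"
    CASCADING = "its failure drags down its dependents (a SPOF)"

# Hypothetical inventory: component names and policies are illustrative.
architecture = {
    "route53": FaultPolicy.REDUNDANT,
    "api-servers": FaultPolicy.REDUNDANT,
    "metrics-sidecar": FaultPolicy.IGNORABLE,
    "primary-db-single-az": FaultPolicy.CASCADING,
}

spofs = [name for name, policy in architecture.items()
         if policy is FaultPolicy.CASCADING]
# -> ['primary-db-single-az']: the classic single-AZ database problem
```

Keeping an inventory like this per region makes the "add one more level to the hierarchy" step auditable instead of implicit.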
Thinking in layers helps you stay oriented. Each layer has its own decisions and its own services.
Network layer. The CDN delivers content globally with fast, secure access; CloudFront is the natural component here on AWS. DNS, specifically Route 53, is what actually orchestrates traffic between regions: you can route by latency, by failover, by geolocation, or with weighted policies. A good DNS strategy makes more of a difference than people expect; it is literally the first decision point every user request touches. Internal networks between regions should be interconnected and planned from the start, not as an afterthought.
Compute layer. Services should be modular, organized by business domain, and able to scale on demand. The choice between Lambda, EC2, ECS, or Kubernetes depends on the use case; there is no generic answer. What always applies is that the compute layer must be able to replicate or come up in another region without manual friction.
Application layer. There is one principle here that makes the difference: the application must be region-agnostic. That means externalized configuration, stateless processes, and manageable secrets. A concrete example: read the region_name from a variable instead of hardcoding it in the code. It sounds basic, and yet it is where most multi-region architectures break in practice.
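A minimal sketch of that principle, assuming an AWS_REGION environment variable (the variable name and the default are illustrative, not a prescribed convention):

```python
import os

# Externalized region config: nothing region-specific is hardcoded,
# so the same artifact runs unchanged in any region.
def current_region() -> str:
    return os.environ.get("AWS_REGION", "us-east-1")

# Deploying to another region is just a different environment, not a rebuild:
os.environ["AWS_REGION"] = "eu-west-1"
region = current_region()
```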
Data layer. This is the most complex one. Before choosing a service you need to identify the access patterns, the storage type (block, file, or object), the replication cost, and where the users are. AWS supports cross-region replication in DynamoDB, RDS Aurora, standard RDS, S3, ElastiCache, and DocumentDB. Each has its own implications around eventual versus strong consistency that must be understood before deciding.
Security, identity, and access layer. IAM is global, which simplifies managing users, roles, and groups. KMS can create multi-region keys. Secrets Manager can replicate secrets into secondary regions, and there is an important Terraform detail here: when you configure an aws_secretsmanager_secret with a replica block, the secondary region syncs automatically. It seems trivial until you need it in a real failover.
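The same replication idea is also reachable from the SDK: boto3's Secrets Manager create_secret call accepts an AddReplicaRegions parameter. The sketch below only builds the argument dict (the secret name and regions are made up, and the actual call, shown commented out, needs AWS credentials):

```python
# Hedged sketch: construct the arguments for
# secretsmanager.create_secret with replica regions.
def replicated_secret_args(name: str, value: str,
                           replica_regions: list[str]) -> dict:
    return {
        "Name": name,
        "SecretString": value,
        "AddReplicaRegions": [{"Region": r} for r in replica_regions],
    }

args = replicated_secret_args(
    "prod/db-password", "s3cr3t", ["us-west-2", "eu-west-1"]
)
# import boto3
# boto3.client("secretsmanager").create_secret(**args)  # requires credentials
```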
A multi-region architecture without centralized observability is basically a distributed black box. CloudWatch, Config, GuardDuty, and CloudTrail are regional services, but services like Security Hub and CloudTrail support multi-region aggregation, which gives you a unified view of security events without checking console by console.
There is an important point here: a monitoring strategy takes several iterations. It does not come out perfect on the first try. Tools like Amazon DevOps Guru help identify anomalous behavior, suggest configuration improvements, and alert on critical failures; they complement the base observability stack well.
In multi-region architectures, manual deployment is not a viable option long term. Infrastructure as code (Terraform, CDK, CloudFormation) is not just a best practice; it is what lets you recreate an entire environment in another region in minutes instead of days. Change control should be granular: per account, per environment, and per region. IAM should follow the principle of least privilege, and failures must be contained; that is, an error deploying one region must not take down the others.
A practical tip: new regions also work very well as a sandbox for validating new functionality or for simulating disasters before they arrive on their own.
Multi-region is not free, in cost or in operational complexity. The operational overhead is real: every resource that exists in one region now exists in two or more, with everything that implies for maintenance, monitoring, and upgrades. Cross-region data transfer costs also add up fast if they are not modeled from the start.
Before starting, it is worth running a planning exercise with a matrix of priority, effort, complexity, and dependencies, similar to the Eisenhower Method. Not everything has to be regionalized at the same time or with the same urgency. There are always components that are natural candidates to regionalize first (typically the most critical ones with the lowest replication complexity) and others that can wait.
A Well-Architected Review is a good starting point for doing that inventory with a structured methodology.
The end state of a multi-region architecture on AWS looks something like this: the user hits Route 53, which routes to the nearest CloudFront, which in turn directs traffic to the corresponding region, where the API Gateway, the Lambdas, and the replicated Aurora database live. All managed with certificates in ACM and with traffic distributed by latency or failover policies in DNS.
You do not get there overnight. You get there through iterations, with IaC as the backbone and a DNS strategy designed to scale from day one.
Multi-region is not a services problem; it is a design problem. The AWS services are ready. The question is whether your architecture, your code, and your processes are too.
2026-04-03 06:05:40
I shipped 19 tools across 2 npm packages, got them reviewed, fixed 10 bugs, and published, all in one evening. I did not do it by typing faster. I did it by orchestrating multiple AI models the same way I would coordinate a small development team.
That shift changed how I use AI for software work. Instead of asking one model to do everything, I assign roles: one model plans, another researches, another writes code, another reviews, and another handles large-scale analysis when the codebase is too broad for everyone else.
Most developers start with a simple pattern: open one chat, paste some code, and keep asking the same model to help with everything. That works for small tasks. It breaks down on real projects.
The first problem is context pressure. As the conversation grows, the model’s context window fills with stale details, exploratory dead ends, copied logs, and half-finished code. Even when the window is technically large enough, quality often degrades because the model is trying to juggle too many concerns at once.
The second problem is that modern codebases are not tidy, single-language systems. The projects I work on often span TypeScript, Python, C#, shell scripts, README docs, test suites, CI config, and package metadata. The mental model required to review a TypeScript AST transform is not the same as the one required to inspect Unity C# editor code or write reliable Python tests.
The third problem is that software development is not one task. It is a bundle of different tasks:
Using one model for all of that is like asking one engineer to do product design, coding, testing, documentation, DevOps, and code review at the same time.
I now use a multi-model setup where each model has a clear job.
| Model | Role | Why This Model |
|---|---|---|
| Claude Opus (Orchestrator) | Decision-making, planning, user communication, coordination | Strongest reasoning, sees the big picture |
| Claude Sonnet (Subagent) | Codebase research, file reading, build/test, pattern finding | Fast, cheap, parallelizable |
| Codex MCP | Code writing in sandbox, counter-analysis, code review | Independent context, can debate with Opus |
| Gemini 2.5 Pro | Large-scale analysis (10+ files), cross-cutting research | 1M token context for massive codebases |
This is the important constraint: Opus almost never reads more than three files directly, and it never writes code spanning more than two files.
Opus is my scarce resource. I want its context window reserved for decisions, tradeoffs, and coordination. If I let it spend tokens reading ten implementation files, parsing test fixtures, or editing code across half the repo, I am wasting the most valuable reasoning surface in the system.
So I deliberately make Opus act more like a tech lead than a hands-on individual contributor:
The best model should not be your file reader, log parser, or bulk code generator.
If I need to answer questions like these:
I do not spend Opus on that. I send Sonnet agents to inspect the codebase and return structured findings. If the question spans a huge number of files, I use Gemini for the broad scan and have it summarize patterns, architectural seams, and constraints.
Then Opus makes the decision with clean inputs instead of raw noise.
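The delegate-then-decide loop can be sketched generically. The research function below is a stub standing in for a cheap subagent call (the questions and return shape are made up); the point is the shape of the fan-out and that the orchestrator receives structured findings, not raw files.

```python
import concurrent.futures

# Illustrative only: `research` stands in for a call to a cheap subagent
# (e.g. Sonnet). The orchestrator never reads the files itself.
def research(question: str) -> dict:
    return {"question": question, "finding": f"summary for: {question}"}

questions = [
    "Which module owns color conversion?",
    "What test framework does the repo use?",
    "Where are the platform mappers registered?",
]

# Fan out the independent questions in parallel, gather structured findings.
with concurrent.futures.ThreadPoolExecutor() as pool:
    findings = list(pool.map(research, questions))

# The orchestrator now decides with three compact findings as input.
```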
One of the clearest examples was figma-spec-mcp, an open source MCP server that bridges Figma designs to code platforms. The package already had a React mapper, and I wanted to expand it with React Native, Flutter, and SwiftUI support while preserving shared conventions and reusing the normalized UI AST.
Instead, I split the work.
- Package checks with publint.

The review surfaced bugs that were not obvious from a green-looking implementation:
- shadowOffset represented as a string instead of an object.

I ended that session with four platform mappers, reviewed, fixed, lint-clean, and production-ready in about two hours. The speed came from specialization and parallelism, not from asking one model to "be smarter."
CoplayDev/unity-mcp
The second example was a series of open source contributions to CoplayDev/unity-mcp, a Unity MCP server with over 1,000 stars. The most significant was adding an execute_code tool that lets AI agents run arbitrary C# code directly inside the Unity Editor, with in-memory compilation via Roslyn, safety checks, execution history, and replay support.
The interesting part is how the feature gap was identified. I was already using a different Unity MCP server (AnkleBreaker) for my own projects, and I noticed it had capabilities that CoplayDev lacked. Rather than manually comparing 78 tools against 34, I had AI agents do the comparison systematically.
- The gap analysis identified execute_code as the highest-impact contribution: it unlocks an entire class of workflows where AI agents can inspect live Unity state, run editor automation, and validate assumptions without requiring manual steps.
- The tool shipped with four operations (execute, get_history, replay, clear_history), safety checks for dangerous patterns, a Roslyn/CSharpCodeProvider fallback, and execution history management.
- The change spanned ExecuteCode.cs (a C# Unity handler with in-memory compilation), execute_code.py (the Python MCP tool), and test_execute_code.py (unit tests). Over 1,600 lines of additions.
- The safety checks flag dangerous patterns such as System.IO and Process usage.

The execute_code tool became one of the more significant contributions to the project, enabling AI agents to do things like inspect scene hierarchies at runtime, validate component references programmatically, and run editor automation scripts. The contribution was grounded in a real gap analysis rather than guesswork, and the multi-model workflow ensured the implementation matched the project's conventions across two languages.
roblox-shipcheck Shooter Audit Expansion
The third example was roblox-shipcheck, an open source Roblox game audit tool. I wanted to add six shooter-genre-specific tools and expand the package around them with tests, documentation, examples, and release notes.
- Updates to the CHANGELOG, usage examples, and unit tests.

The first review wave found:
The automated reviewer then found:
The package ended with 49 tools total, 124 passing tests, a cleaner README, updated examples, release notes, and green CI across TypeScript, ESLint, Prettier, and SonarCloud. That is the difference between “I added some code” and “I shipped a maintainable release.”
The most important lesson in all of this is simple: your orchestrator’s context window is the scarcest resource in the system.
These are the rules I follow now:
Here is the mental model I use:
Opus = scarce strategic bandwidth
Sonnet = cheap parallel investigation
Codex = isolated implementation and review
Gemini = massive-context research pass
Once I started treating context like a budget instead of an infinite buffer, my sessions became dramatically more reliable.
One of the most effective techniques in this setup is what I call the debate pattern.
Instead of asking one model for a solution and immediately implementing it, I force a disagreement phase.
This works because disagreement exposes hidden assumptions.
In one session, that debate caught:
- Color formatting confusion between 0xRRGGBBAA and 0xAARRGGBB
- mode used where variant was correct
- a problem with the Color(hex:) initializer

None of those issues were broad architectural failures. They were the kind of platform-specific correctness bugs that burn time after merge if you do not catch them early.
The debate pattern turns AI assistance from “fast autocomplete” into “adversarial design review plus implementation.”
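A minimal sketch of the debate pattern, with stub reviewer functions standing in for two independent model sessions; findings they agree on are treated as high-confidence, and only the disagreements get escalated for a decision.

```python
# Illustrative stubs: each reviewer represents an independent context.
# Real findings would come from separate model sessions, not hardcoded sets.
def reviewer_a(change: str) -> set[str]:
    return {"hex color order looks wrong", "missing null check"}

def reviewer_b(change: str) -> set[str]:
    return {"hex color order looks wrong", "API naming drift"}

def debate(change: str) -> dict:
    a, b = reviewer_a(change), reviewer_b(change)
    return {
        "agreed": sorted(a & b),    # both sessions flagged it: high confidence
        "disputed": sorted(a ^ b),  # hidden assumptions to resolve explicitly
    }

result = debate("swiftui color mapper diff")
```

The disputed set is where the hidden assumptions live; forcing a resolution on each item is the "disagreement phase" described above.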
The performance difference is large enough that I now think in terms of orchestration by default.
| Metric | Single Model | Multi-Model Orchestration |
|---|---|---|
| Tools shipped per session | 2-3 | 10-15 |
| Bugs caught before publish | ~60% | ~95% (Codex review) |
| Parallel workstreams | 1 | 6+ simultaneous |
| Context preservation | Degrades after 3-4 files | Stays sharp (delegated) |
| Convention compliance | Often drifts | Exact match (research first) |
If you want to try this workflow, start simple. You do not need a huge automation stack on day one. You just need role separation and a few clear rules.
- Codex MCP (npm: codex) for implementation, sandboxed code changes, and review
- Gemini (npm: gemini-mcp-tool) for large-scale repo analysis and broad research across many files

The most important operational detail is to write your rules down in CLAUDE.md. If the orchestrator has to rediscover your preferences every session, you lose consistency and waste tokens.
My CLAUDE.md contains rules like:
- Opus reads <= 3 files directly
- Opus writes <= 2 files directly
- Delegate codebase exploration to Sonnet
- Use Codex for implementation spanning multiple files
- Always run a separate review pass before publish
- Prefer parallel subagents for independent tasks
That single file turns ad hoc prompting into a repeatable operating model.
If you want a low-friction way to start, try this:
Three habits made the biggest difference for me.
First, I stopped treating AI output as a finished artifact and started treating it as a managed workstream. Every meaningful code change has research, implementation, review, and verification phases. Different models are better at different phases.
Second, I learned that independent context is a feature, not a limitation. When Codex reviews code from a separate session, it does not inherit all the assumptions of the implementation pass. That distance is exactly why it catches bugs.
Third, I stopped optimizing for “best prompt” and started optimizing for “best routing.” The better question is: which model should spend tokens on this specific task?
The future of AI-assisted development is not a single omniscient model sitting in one giant chat. It is orchestration: using the right model for the right task, preserving your strongest model’s context for decisions, and letting specialized agents handle research, implementation, review, and verification.
If you are already using AI in development, my practical advice is simple: stop asking one model to do everything. Give each model a role, protect your orchestrator’s context window, and add a real review pass. That is where the 10x improvement comes from.
2026-04-03 06:05:00
A practical guide to migrating a federated remote to Vite, based on lessons from a real migration.
I was tasked with updating a legacy React application that did not support Module Federation. That integration was added first so the app could run as a remote inside a larger host application. Later, the remote needed to migrate from Create React App (CRA) to Vite. By that point, the host already depended on the remote's loading behavior. The tricky part was not replacing CRA with Vite. It was preserving the runtime contract while only the remote changed bundlers.
If you own a CRA or webpack-era remote that still has to load cleanly inside an existing host, this post covers the cleanup work beforehand, the core CRA-to-Vite swap, the federation-specific deployment fixes, and a local dev harness for debugging the full host loading sequence without redeploying every change.
Terms for reference
- CRA: Create React App. For years it was the default easy on-ramp for React apps before being deprecated in 2025.
- CRACO: Create React App Configuration Override
- Module Federation: A way for one application to load code from another at runtime instead of bundling everything together up front.
- Host: The application that loads another app at runtime.
- Remote: The application that exposes code for the host to load.
- Runtime contract: The files and exported APIs the host already expects.
Dependabot alerts. The biggest issue was that the CRA dependency tree had kept accumulating a number of high-risk Dependabot alerts, and patching around them was getting harder to justify.
Slow builds. CRA and webpack took over a minute for a cold-start build.
Too many config layers. CRACO was overriding CRA's webpack config, plus custom build scripts for module federation.
Stale tooling. ESLint was still on the legacy .eslintrc format. Jest had its own separate config.
Dependency rot. Years of Dependabot patches left dozens of manual resolutions in the dependency manifest that nobody fully understood anymore.
The goal was not just "swap the build tool." It was to reduce dependency risk, simplify the toolchain, and leave the project in a state that another engineer could pick up. Vite had already earned a strong reputation. What was different now was that there was finally enough maintenance pressure to justify spending sprint time on the migration.
Before touching the build tool, everything that would conflict with Vite or had become dead weight needed to go.
Some dependencies weren't really "dependencies" so much as assumptions about the old toolchain:

- A preval.macro that ran at compile time. Vite doesn't run your app through the same pipeline that a CRA stack does.
- The CRA toolchain itself: react-scripts, craco, react-app-rewired.
- Packages like jsonwebtoken that were built for Node.js and rely on polyfills that webpack injected automatically. Vite does not do this, so if anything in the browser code imports Node.js built-ins like crypto or Buffer, it will break.

The package dependencies were audited and around a dozen were removed. Then the pile of old manual resolutions that had accumulated from years of Dependabot fixes was cleared out. Most of those overrides were for transitive deps of packages that were already gone.
Worth checking early: a shared design system was still using deprecated Sass @import patterns, and it had to be updated before the new toolchain would build cleanly.
With the codebase cleaned up, the core migration came down to a few straightforward steps:

- Add a vite.config.ts
- Move index.html from public/ to the project root and point it at the module entry
- Rename REACT_APP_* env vars to VITE_*; in application code, replace process.env usage with import.meta.env
- Update ReactDOM.render calls to createRoot
- Switch the npm scripts to vite, vite build, vite preview, and vitest
Once Vite was the build tool, Vitest was the obvious test runner. It shares the same config file, understands the same path aliases, and removed a lot of separate config glue.
Add the test config directly to vite.config.ts:
import { defineConfig } from 'vite';
export default defineConfig({
// ...build config above...
test: {
globals: true,
environment: 'jsdom',
setupFiles: './src/test/setup.ts',
coverage: {
reporter: ['text', 'html'],
include: ['src/**/*.{ts,tsx}'],
},
},
});
No separate jest.config.js. No babel-jest transform. No moduleNameMapper to keep in sync with path aliases.
This is where the migration stopped being a normal bundler swap. The host still ran webpack and expected all of this to keep working:
host -> fetch asset-manifest.json
host -> load remoteEntry.js
host -> init shared scope
host -> get exposed module
host -> call inject(container, props)
host -> later call unmount()
Install @module-federation/vite and add it to your Vite config:
import react from '@vitejs/plugin-react';
import { federation } from '@module-federation/vite';
import { defineConfig } from 'vite';
export default defineConfig({
plugins: [
react(),
federation({
name: 'remoteApp',
filename: 'remoteEntry.js',
exposes: {
'./RemoteModule': './src/remote/entry.ts',
},
}),
],
// ...
});
The exposed entry file should export the lifecycle functions the host expects:
// src/remote/entry.ts
export { inject, unmount } from './RemoteModule';
export { default } from './RemoteModule';
import { MemoryRouter } from 'react-router-dom';
import { createRoot, type Root } from 'react-dom/client';
import App from '../App';
let root: Root | null = null;
export const inject = (
container: string | HTMLElement,
_props?: Record<string, unknown>
): void => {
const element =
typeof container === 'string'
? document.getElementById(container)
: container;
if (!element) return;
// Guard against duplicate roots if the host mounts twice.
root?.unmount();
root = createRoot(element);
root.render(
<MemoryRouter>
<App />
</MemoryRouter>
);
};
export const unmount = (): void => {
if (root) {
root.unmount();
root = null;
}
};
Note: The
inject(container, props)andunmount()API here is host-specific.MemoryRoutermade sense because the embedded remote needed internal navigation but not deep-linkable standalone URLs. Standalone development usedBrowserRouterinstead.
The host fetched asset-manifest.json and expected specific keys for remoteEntry.js and main.css. Vite produced a different file (manifest.json) with a different shape, so even after renaming the file, the host couldn't parse it.
The fix was a small Vite plugin that generates a compatible manifest after the build:
import { promises as fs } from 'node:fs';
import path from 'node:path';
import type { Plugin } from 'vite';
export const rewriteHostManifest = (): Plugin => ({
name: 'rewrite-host-manifest',
async writeBundle(options, bundle) {
const outputDir = options.dir || 'dist';
const files = Object.keys(bundle);
const remoteEntry = files.find((file) => file.endsWith('remoteEntry.js'));
const mainCss = files.find((file) => file.endsWith('.css'));
if (!remoteEntry || !mainCss) {
throw new Error('remoteEntry.js or main.css not found in bundle output');
}
const manifest = {
files: {
'remoteEntry.js': `/${remoteEntry}`,
'main.css': `/${mainCss}`,
},
};
await fs.writeFile(
path.join(outputDir, 'asset-manifest.json'),
JSON.stringify(manifest, null, 2)
);
},
});
Add it to the plugins:
plugins: [
react(),
federation({ /* ... */ }),
rewriteHostManifest(),
],
Adapt the manifest shape to whatever the host actually reads. This was specific to this setup.
If the built assets are served from a CDN or cloud storage bucket, you need to tell Vite:
export default defineConfig({
base: process.env.ASSET_BASE_PATH || '/',
// ...
});
Without this, Vite generates root-relative paths like /assets/chunk-abc123.js. The host resolves those relative to its own origin, which in this case served index.html instead of the JS file, producing MIME type errors. Setting base to the bucket or CDN path fixed it.
The module bundled custom fonts, but the host already loaded the same fonts globally. The fix was to move the @font-face declarations into a separate SCSS file and only import it in standalone mode, not in the federated entry.
This was the biggest QOL improvement, and probably the most reusable part of the migration. Testing a federated module usually means deploying to a test environment and loading it through the host. That's a slow feedback loop. Instead, a local dev harness was built to replicate the host's loading sequence.
The harness used vite build --watch plus vite preview instead of the normal dev server because the goal was to validate the real emitted artifacts: asset-manifest.json, remoteEntry.js, built CSS, and chunk URLs. The standard dev server is great for app development, but it doesn't produce the same output the host will actually fetch in production.
The harness did the following:

- Run an initial vite build
- Keep rebuilding with vite build --watch
- Serve the built output with vite preview
- Fetch asset-manifest.json from the local preview server
- Load remoteEntry.js the way the host does
- Call container.init() and container.get()
- Call inject() with configurable props and verify unmount() cleanup

That made it possible to test the full federation lifecycle locally, including script loading, module init, prop injection, CSS loading, auth handling, and unmount cleanup, without deploying anything.
The entry point ended up with three runtime modes:
// src/main.tsx
if (import.meta.env.VITE_USE_FEDERATION_HARNESS === 'true') {
const { FederationHarness } = await import('./dev/FederationHarness');
root.render(<FederationHarness />);
} else if (import.meta.env.VITE_EMBEDDED_MODE === 'true') {
const { FederatedEntry } = await import('./remote/FederatedEntry');
root.render(<FederatedEntry />);
} else {
const { StandaloneEntry } = await import('./standalone/StandaloneEntry');
root.render(<StandaloneEntry />);
}
- start runs standalone app development
- dev runs federation development against a local preview server
- build produces the production remote for the real host

base matters for remote hosting. Forget it and every chunk import will 404 or return HTML instead of JavaScript.

These were the checks that mattered more than "the build passed":
- The host can fetch asset-manifest.json, load remoteEntry.js, and mount the module
- The legacy config files are gone: .babelrc, craco.config.js, jest.config.js, and custom webpack overrides
- The remaining build configuration lives in a single vite.config.ts
If you're maintaining a federated micro frontend on CRA, the path to Vite is worth the effort. Just remember to analyze the host's loading contract and build yourself a local harness that exercises the real federation lifecycle.
A note on Vite 8: Vite 8 shipped recently, after this migration was already complete. Its release notes mention Module Federation support as one of the capabilities unlocked by the new Rolldown-based architecture, which looks promising. If I were starting today, I would look into this first.
2026-04-03 06:00:00
GitHub Copilot code review is an AI-powered feature that analyzes pull requests directly within the GitHub interface and posts inline comments on potential bugs, security issues, performance problems, and code quality concerns. Instead of waiting hours or days for a human reviewer to look at your PR, you can assign Copilot as a reviewer and receive automated feedback within minutes.
This feature is part of GitHub's broader strategy to embed AI into every stage of the software development lifecycle. Copilot started as an inline code completion tool in 2022, expanded to include chat in 2023, added code review in 2024, and launched an autonomous coding agent in late 2025. Code review fits naturally into this trajectory - if Copilot can help you write code, it should also be able to help you review it.
The March 2026 agentic architecture overhaul was the turning point. Before this update, Copilot's code review was limited to shallow, line-by-line diff analysis that often produced generic comments. The new agentic system uses tool-calling to actively explore your repository, read related files, trace cross-file dependencies, and build broader context before generating review comments. This is a fundamental architectural shift from "look at the diff and comment" to "understand the change in context and then comment."
GitHub reports that Copilot has processed over 60 million code reviews since the feature launched, and adoption has accelerated significantly after the agentic update. For teams already paying for Copilot Business or Enterprise, code review is included at no additional cost, which makes it the path of least resistance for organizations looking to add AI review to their workflow.
That said, Copilot code review is one feature within a generalist AI coding platform. It competes against dedicated review tools like CodeRabbit, CodeAnt AI, and PR-Agent that do nothing but code review and have optimized their entire architecture for that single use case. Whether Copilot's code review is sufficient for your team depends on your review standards, your git platform, and how much customization you need.
Understanding the underlying mechanics helps set realistic expectations for what Copilot can and cannot catch. The system works in three stages: context gathering, LLM-based analysis, and comment generation.
When you request a review from Copilot on a pull request, the agentic architecture begins by collecting context about the change. This goes beyond simply reading the diff: the system actively explores the repository, reads files related to the change, and traces cross-file dependencies before analysis begins.
This context-gathering step is what distinguishes the post-March 2026 version from the earlier line-level analysis. However, the amount of context Copilot can gather is constrained by the model's context window and the time budget allocated per review. For very large PRs or monorepos with deep dependency chains, the system may not trace every relevant file.
With context assembled, Copilot feeds the information to a large language model for analysis. The model evaluates the code changes across several dimensions: correctness, security, performance, and code quality.
Copilot supports multiple underlying models (GPT-5.4, Claude Opus 4, Gemini 3 Pro), and the model used for code review may vary. The analysis is purely static - Copilot does not execute the code, run tests, or perform dynamic analysis. Everything it identifies comes from pattern recognition and reasoning over the code text.
After analysis, Copilot generates inline review comments attached to specific lines in the PR diff. Each comment typically includes a description of the issue, an explanation of why it matters, and, where possible, a suggested fix.
Comments are posted as a standard GitHub review, appearing in the same conversation thread as human reviews. Developers can reply to Copilot's comments, dismiss them, or apply the suggested fixes directly. The experience is seamless within the GitHub UI - there is no separate dashboard or interface to learn.
Copilot can also read custom instructions from a copilot-instructions.md file in your repository. This file lets you specify review guidelines, coding conventions, or areas of focus. However, the file is limited to 4,000 characters, which constrains how detailed your instructions can be.
Setting up Copilot code review is straightforward for teams already using GitHub and Copilot, but there are specific requirements and configuration steps depending on your plan.
Before you can use Copilot code review, you need:
A GitHub Copilot plan that includes code review. Copilot Pro ($10/month) includes 300 premium requests per month, and each code review consumes premium requests. Copilot Business ($19/user/month) and Enterprise ($39/user/month) include code review with their respective premium request allocations. The free tier includes only 50 premium requests per month, which is too limited for regular review usage.
A GitHub repository. Copilot code review works exclusively on GitHub. It does not support GitLab, Bitbucket, or Azure DevOps. If your team uses any other git platform, Copilot code review is not an option.
Copilot enabled for your organization (for Business and Enterprise plans). Individual Pro subscribers can use code review on their personal repositories without additional setup.
For Copilot Business and Enterprise plans, an organization administrator needs to enable code review in the org settings under Copilot > Policies.
Organization admins can also configure which repositories Copilot is allowed to review and set policies for how review comments are displayed.
At the repository level, you can further customize Copilot's behavior:
Create a .github/copilot-instructions.md file in your repository root to provide custom review guidelines. For example:
```markdown
## Code Review Instructions

- Always check for null/undefined before accessing object properties
- Flag any database queries that don't use parameterized inputs
- Ensure all API endpoints have proper error handling
- Warn about functions exceeding 50 lines
```
Remember the 4,000-character limit on the instructions file. Prioritize your most important review criteria rather than trying to be exhaustive.
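A simple way to keep the instructions file inside the budget is to check its length before committing. A minimal Python sketch (the 4,000-character figure comes from the section above; the helper name is my own):

```python
INSTRUCTIONS_CHAR_LIMIT = 4_000  # limit on copilot-instructions.md described above

def fits_copilot_budget(text: str, limit: int = INSTRUCTIONS_CHAR_LIMIT) -> bool:
    """Return True if the instructions file content fits the character budget."""
    return len(text) <= limit

guidelines = "\n".join([
    "## Code Review Instructions",
    "- Flag any database queries that don't use parameterized inputs",
    "- Ensure all API endpoints have proper error handling",
])
within_budget = fits_copilot_budget(guidelines)
```

A check like this can run as a pre-commit hook so an over-long file never reaches the repository.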
Once Copilot is enabled, requesting a review is simple: open the pull request, click 'Reviewers', and select 'Copilot' from the list.
You can also trigger a review by commenting @copilot review on the pull request. This is useful when you want Copilot to re-review after pushing additional commits.
Copilot posts its review as a standard GitHub PR review with inline comments. You can interact with these comments just as you would with human review comments - reply, resolve, or apply suggested fixes.
Copilot's agentic code review catches a meaningful range of issues across several categories. Here are concrete examples from real-world usage patterns.
Copilot is reasonably effective at catching common bug patterns, particularly null reference errors, off-by-one mistakes, and incorrect logic flow.
Example: Missing null check
```typescript
async function getUser(userId: string) {
  const user = await db.users.findOne({ id: userId });
  return user.name; // Copilot flags: user could be null
}
```
Copilot would comment something like: "The result of findOne could be null if no user matches the given ID. Accessing .name without a null check will throw a TypeError at runtime. Consider adding a null check or using optional chaining (user?.name)."
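The same defensive pattern applies in any language. A Python sketch with a dict standing in for the database (names are illustrative):

```python
users = {"u1": {"name": "Ada"}}  # stand-in for db.users

def get_user_name(user_id: str):
    user = users.get(user_id)              # may return None, like findOne
    return user["name"] if user else None  # guard instead of blind attribute access
```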
Example: Off-by-one in loop boundary
```python
def process_items(items):
    for i in range(1, len(items)):  # Copilot flags: starts at 1, skips first item
        transform(items[i])
```
Copilot would note that the loop starts at index 1, which skips the first element. Depending on the intent, this could be a bug or deliberate - but Copilot flags it for the developer to confirm.
Copilot identifies common security anti-patterns, though its coverage is narrower than dedicated SAST tools.
Example: SQL injection risk
```python
def get_orders(user_id):
    query = f"SELECT * FROM orders WHERE user_id = '{user_id}'"
    return db.execute(query)
```
Copilot flags this as a SQL injection vulnerability and suggests using parameterized queries instead:
```python
def get_orders(user_id):
    query = "SELECT * FROM orders WHERE user_id = %s"
    return db.execute(query, (user_id,))
```
Example: Hardcoded credentials
```javascript
const client = new S3Client({
  credentials: {
    accessKeyId: "AKIAIOSFODNN7EXAMPLE",
    secretAccessKey: "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY",
  },
});
```
Copilot identifies hardcoded AWS credentials and recommends using environment variables or a secrets manager. This is a pattern that most AI review tools catch reliably.
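The fix Copilot recommends — reading credentials from the environment rather than the source — can be sketched in Python. The standard AWS environment variable names are real; the helper function is my own:

```python
import os

def load_s3_credentials():
    """Read AWS credentials from the environment; fail loudly if they are absent."""
    key_id = os.environ.get("AWS_ACCESS_KEY_ID")
    secret = os.environ.get("AWS_SECRET_ACCESS_KEY")
    if not key_id or not secret:
        raise RuntimeError("AWS credentials are not configured")
    return {"accessKeyId": key_id, "secretAccessKey": secret}
```

In production, a secrets manager fills the environment (or is queried directly); the important property is that no credential string ever appears in the repository.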
Copilot flags certain performance anti-patterns, particularly around database queries and algorithmic inefficiency.
Example: N+1 query pattern
```typescript
async function getOrdersWithProducts(orderIds: string[]) {
  const orders = await db.orders.findMany({ where: { id: { in: orderIds } } });
  for (const order of orders) {
    order.products = await db.products.findMany({
      where: { orderId: order.id },
    });
  }
  return orders;
}
```
Copilot identifies the N+1 query pattern - one query for orders, then one additional query per order for products - and suggests batching the product lookup into a single query with a WHERE orderId IN (...) clause.
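The batched version the comment suggests — one query for all products, then grouping in memory — can be sketched in Python with lists standing in for the database tables:

```python
from collections import defaultdict

# Stand-ins for the orders and products tables
orders = [{"id": "o1"}, {"id": "o2"}]
products = [
    {"id": "p1", "orderId": "o1"},
    {"id": "p2", "orderId": "o1"},
    {"id": "p3", "orderId": "o2"},
]

def get_orders_with_products(order_ids):
    matched = [o for o in orders if o["id"] in order_ids]      # query 1: the orders
    rows = [p for p in products if p["orderId"] in order_ids]  # query 2: WHERE orderId IN (...)
    by_order = defaultdict(list)
    for p in rows:
        by_order[p["orderId"]].append(p)
    for o in matched:
        o["products"] = by_order[o["id"]]  # group in memory, no per-order query
    return matched
```

Two queries total, regardless of how many orders are fetched — which is exactly the property the N+1 version lacks.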
Copilot comments on code clarity, naming conventions, and maintainability concerns. These tend to be the most subjective comments and the source of most false positives.
Examples include flagging overly long functions, suggesting more descriptive variable names, recommending extraction of duplicated logic into shared utilities, and noting inconsistencies with the surrounding code style.
Being honest about limitations is important for setting the right expectations. Copilot code review has several meaningful gaps that teams should understand before relying on it as their primary review mechanism.
Copilot does not support custom deterministic rules. You cannot tell it "every API endpoint must call auditLog() before returning" or "all database models must include a createdAt field" and have it enforce those rules consistently across every PR. The copilot-instructions.md file provides soft guidance to the LLM, but compliance is probabilistic rather than guaranteed. Some PRs will catch the violation; others will miss it entirely.
Dedicated tools like CodeRabbit support natural language review instructions without character limits, and tools like CodeAnt AI include deterministic rule engines that enforce custom patterns with zero false negatives on defined rules.
Copilot does not bundle deterministic linters like ESLint, Pylint, Golint, or RuboCop. It relies entirely on LLM-based analysis, which means it can catch the spirit of style violations but may miss specific rule violations that a deterministic linter would always flag. CodeRabbit includes 40+ built-in linters that run alongside its AI analysis, creating a dual-layer approach that catches both subtle semantic issues and concrete rule violations.
Despite the agentic architecture, Copilot's ability to gather context is bounded by the model's context window and the time budget per review. For large PRs (500+ lines changed across dozens of files) or monorepos with deep dependency chains, Copilot may not trace every relevant relationship. Users report that on very large PRs, the review quality degrades noticeably, with Copilot sometimes commenting only on a subset of changed files.
Copilot code review does not integrate into CI/CD pipelines. It operates exclusively within the GitHub PR interface. You cannot run Copilot's review as a step in a GitHub Actions workflow, gate merges based on Copilot's findings, or pipe review results into other tools. Dedicated review tools like PR-Agent and CodeAnt AI offer CI/CD integration that allows you to incorporate AI review into your automated pipeline and enforce review gates.
Copilot does not learn from your team's review patterns. If your team consistently dismisses a certain type of comment, Copilot will continue making that same comment on future PRs. There is no feedback loop that adapts the review to your team's preferences over time. CodeRabbit's learnable preferences system explicitly addresses this - the more your team interacts with its reviews, the more accurately it aligns with your standards.
This is a hard constraint. Copilot code review works on GitHub and only GitHub. Teams using GitLab, Bitbucket, or Azure DevOps cannot use this feature at all. For organizations with repositories spread across multiple git platforms, Copilot code review covers only a portion of their workflow.
In practice, Copilot's false positive rate on code review is noticeable. Users report that roughly 15-25% of Copilot's review comments are either incorrect, irrelevant, or so vague as to be unhelpful. This is higher than specialist tools - CodeRabbit's false positive rate is approximately 8% in testing, and DeepSource claims sub-5%. A high false positive rate erodes developer trust and can lead teams to ignore Copilot's comments entirely, defeating the purpose of automated review.
Copilot does not pull context from external project management tools like Jira or Linear. It cannot verify that a PR's implementation matches the requirements described in a linked ticket. CodeRabbit integrates with Jira and Linear, pulling issue context into its review analysis to verify that the code changes align with the stated requirements.
Copilot code review competes directly with tools built specifically for AI-powered PR review. Here is how it stacks up against the three most prominent alternatives.
CodeRabbit is the most widely used dedicated AI code review tool, having reviewed over 13 million pull requests across more than 2 million repositories. The comparison between CodeRabbit and Copilot comes down to specialist depth versus generalist convenience.
Where CodeRabbit wins: review depth (LLM analysis layered with 40+ built-in linters), learnable preferences that adapt to your team's feedback, support for GitLab, Bitbucket, and Azure DevOps, and a free tier with unlimited repos.
Where Copilot wins: zero additional setup for teams already paying for Copilot, bundled pricing alongside code completion, chat, and the coding agent, and a native review experience inside the GitHub UI.
PR-Agent (by Qodo, formerly CodiumAI) is an open-source AI code review tool that can be self-hosted for free or used as a hosted service.
Where PR-Agent wins: it is open source and free to self-host, supports GitLab, Bitbucket, and Azure DevOps, allows fully custom prompts, and integrates into CI/CD pipelines.
Where Copilot wins: a managed, zero-maintenance experience, tighter GitHub integration, and the bundled completion, chat, and agent features.
CodeAnt AI combines AI code review with static analysis, security scanning, and secrets detection in a single platform.
Where CodeAnt AI wins: it layers 300K+ static analysis rules, security scanning, and secrets detection on top of AI review, supports all four major git platforms, and offers CI/CD integration with enforceable gates.
Where Copilot wins: simpler setup for GitHub-only teams and the broader coding-platform bundle.
| Feature | GitHub Copilot | CodeRabbit | PR-Agent | CodeAnt AI |
|---|---|---|---|---|
| Primary focus | AI coding platform | AI PR review | AI PR review (OSS) | AI review + SAST |
| Review approach | Agentic LLM | LLM + 40 linters | Configurable LLM | LLM + static analysis |
| Free tier (review) | 50 premium requests/mo | Unlimited repos | Free (self-hosted) | Yes (Basic plan) |
| Paid pricing | $10-39/user/mo | $24/user/mo | $30/user/mo (hosted) | $24/user/mo |
| GitHub | Yes | Yes | Yes | Yes |
| GitLab | No | Yes | Yes | Yes |
| Bitbucket | No | Yes | Yes | Yes |
| Azure DevOps | No | Yes | Yes | Yes |
| Custom rules | copilot-instructions.md (4K chars) | Unlimited natural language | Custom prompts | 300K+ static rules |
| Built-in linters | None | 40+ | None | Yes |
| CI/CD integration | No | N/A | Yes | Yes |
| Learnable preferences | No | Yes | No | No |
| Self-hosted option | No | Enterprise only | Yes (free) | Enterprise only |
| Code completion | Yes | No | No | No |
| Chat assistant | Yes | No | No | No |
| Coding agent | Yes | No | No | No |
Understanding the true cost of Copilot code review requires looking beyond the headline prices because code review is bundled with other features and consumed through the premium request system.
| Plan | Price | Premium Requests/Month | Code Review | Best For |
|---|---|---|---|---|
| Free | $0 | 50 | Limited | Trying out the feature |
| Pro | $10/month | 300 | Yes | Individual developers |
| Pro+ | $39/month | 1,500 | Yes | Power users |
| Business | $19/user/month | Per-policy | Yes | Teams and organizations |
| Enterprise | $39/user/month | 1,000/user | Yes | Large organizations |
Each code review consumes premium requests. A typical review of a medium-sized PR (100-300 lines changed) uses 1-3 premium requests. For a developer opening 3-5 PRs per week, that translates to roughly 12-60 premium requests per month just for code review. On the Pro plan with 300 premium requests, this is manageable alongside chat and other features. On the free tier with 50 requests, code review competes with chat for a tiny budget.
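That arithmetic is easy to sanity-check for your own team. A small sketch using the figures from this section (the function name and 4-week month are my own simplifications):

```python
def monthly_review_requests(prs_per_week: int, requests_per_review: int, weeks: int = 4) -> int:
    """Estimate premium requests consumed by code review per month."""
    return prs_per_week * requests_per_review * weeks

# 3-5 PRs/week at 1-3 premium requests each, as described above
low = monthly_review_requests(prs_per_week=3, requests_per_review=1)
high = monthly_review_requests(prs_per_week=5, requests_per_review=3)
```

Even the high estimate fits comfortably inside the Pro plan's 300 monthly premium requests, but would exhaust the free tier's 50 on its own.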
For a team evaluating Copilot code review specifically:
| Team Size | Copilot Business | CodeRabbit Pro | PR-Agent (hosted) | CodeAnt AI |
|---|---|---|---|---|
| 5 devs | $95/month | $120/month | $150/month | $120/month |
| 10 devs | $190/month | $240/month | $300/month | $240/month |
| 25 devs | $475/month | $600/month | $750/month | $600/month |
| 50 devs | $950/month | $1,200/month | $1,500/month | $1,200/month |
Copilot appears cheapest per seat because it bundles code review with code completion, chat, and the coding agent. If your team uses all these features, the per-feature cost is extremely competitive. However, if your team only needs code review and already has other tools for completion and chat, you are paying for features you do not use.
The key question is whether the price difference justifies the review quality gap.
If your team already pays for Copilot Business or Enterprise: Code review is included at no additional cost. Using it is a no-brainer as a baseline. The question becomes whether to add a dedicated tool like CodeRabbit ($24/user/month additional) for deeper reviews. For teams with strict review standards or multi-platform requirements, the additional cost is justified. For teams with moderate review needs on GitHub only, Copilot's included review may be sufficient.
If your team does not currently use Copilot: Paying $19/user/month for Copilot Business solely for code review is harder to justify when CodeRabbit's free tier provides unlimited AI review at zero cost, or when PR-Agent can be self-hosted for free. In this scenario, choosing a dedicated review tool makes more economic sense unless you also want Copilot's completion and chat features.
If your team uses GitLab, Bitbucket, or Azure DevOps: Copilot code review is not available. This eliminates it from consideration entirely, regardless of pricing.
To ground this discussion in concrete reality, here are three scenarios that illustrate what Copilot code review produces in practice.
A developer opens a PR that adds a new API endpoint for processing payments:
```javascript
app.post("/api/payments", async (req, res) => {
  const { amount, customerId } = req.body;
  const customer = await stripe.customers.retrieve(customerId);
  const charge = await stripe.charges.create({
    amount,
    currency: "usd",
    customer: customer.id,
  });
  res.json({ success: true, chargeId: charge.id });
});
```
Copilot's review comment: "This endpoint has no error handling. If stripe.customers.retrieve fails (e.g., invalid customer ID, network error, or Stripe API outage), the unhandled promise rejection will crash the process or return a 500 error with no useful message. Wrap the Stripe calls in a try-catch block and return appropriate error responses."
Copilot provides a suggested fix:
```javascript
app.post("/api/payments", async (req, res) => {
  try {
    const { amount, customerId } = req.body;
    if (!amount || !customerId) {
      return res.status(400).json({ error: "Missing required fields" });
    }
    const customer = await stripe.customers.retrieve(customerId);
    const charge = await stripe.charges.create({
      amount,
      currency: "usd",
      customer: customer.id,
    });
    res.json({ success: true, chargeId: charge.id });
  } catch (error) {
    console.error("Payment processing failed:", error);
    res.status(500).json({ error: "Payment processing failed" });
  }
});
```
This is a genuine, helpful catch. The comment is specific, the explanation is clear, and the suggested fix is correct. This represents Copilot at its best.
A PR adds a file download endpoint:
```python
@app.route("/download/<filename>")
def download_file(filename):
    file_path = os.path.join("/uploads", filename)
    return send_file(file_path)
```
Copilot's review comment: "This endpoint is vulnerable to path traversal. An attacker could request /download/../../etc/passwd to access arbitrary files on the server. The os.path.join function does not prevent directory traversal when the filename contains .. sequences. Validate the filename and ensure the resolved path stays within the intended upload directory."
Copilot suggests:
```python
from werkzeug.utils import secure_filename

@app.route("/download/<filename>")
def download_file(filename):
    safe_filename = secure_filename(filename)
    file_path = os.path.join("/uploads", safe_filename)
    if not os.path.abspath(file_path).startswith(os.path.abspath("/uploads")):
        abort(403)
    return send_file(file_path)
```
This is another strong catch. Path traversal is a common vulnerability that Copilot identifies reliably because it matches well-known patterns the LLM was trained on.
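One subtlety worth knowing: a bare startswith check on the absolute path would also accept a sibling directory like /uploads_evil, so comparing against the base path plus a trailing separator is safer. A standalone Python sketch of a hardened containment check (function name is my own):

```python
import os

def is_within_directory(base: str, filename: str) -> bool:
    """True if base/filename resolves to a path inside base."""
    base_abs = os.path.abspath(base)
    target = os.path.abspath(os.path.join(base_abs, filename))
    # Append os.sep so "/uploads_evil" is not treated as inside "/uploads"
    return target == base_abs or target.startswith(base_abs + os.sep)
```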
A PR refactors a utility function:
```go
func formatUserName(first, last string) string {
	return strings.TrimSpace(first) + " " + strings.TrimSpace(last)
}
```
Copilot's review comment: "Consider using fmt.Sprintf instead of string concatenation for better readability: return fmt.Sprintf(\"%s %s\", strings.TrimSpace(first), strings.TrimSpace(last))"
This is a false positive in most teams' judgment. String concatenation with + is perfectly idiomatic in Go for simple cases, and fmt.Sprintf is not inherently more readable for a two-string join. The suggestion adds no meaningful value and could be actively confusing if a junior developer takes it as a required change. This type of stylistic bikeshedding is where Copilot's review adds noise rather than signal.
False positives like this are not catastrophic - developers learn to dismiss them. But they consume attention and erode trust. When 15-25% of comments are in this category, the cognitive overhead of triaging review comments becomes a real cost.
Getting the most value from Copilot code review requires understanding where it fits in your workflow and setting appropriate expectations.
Use Copilot review as a first pass, not a final review. Copilot is best positioned as a fast, automated first pass that catches obvious issues before a human reviewer looks at the PR. It catches null reference bugs, missing error handling, common security anti-patterns, and performance issues quickly. Think of it as a safety net that reduces the burden on human reviewers rather than replacing them.
Use it for all PRs, not just large ones. Even small PRs can contain security vulnerabilities or logic errors. Since Copilot review takes only 2-5 minutes and requires no effort from the PR author, there is little downside to making it a standard part of every PR.
Do not gate merges on Copilot review alone. Copilot's false positive rate and limited context awareness mean it should not be the sole gatekeeper for code quality. Always require human review for critical code paths, security-sensitive changes, and architectural decisions.
Many teams get the best results by combining Copilot with a dedicated review tool: Copilot handles IDE assistance and a fast automated first pass, while the dedicated tool provides the deeper, more customizable PR review.
Be transparent with your team about what Copilot code review can and cannot do. Soft guidance belongs in the copilot-instructions.md file, but compliance with those instructions is probabilistic; for hard requirements, use deterministic linters in your CI pipeline. To get the most from the 4,000-character copilot-instructions.md budget, prioritize a short list of high-impact criteria over an exhaustive style guide.
Teams already on Copilot Business or Enterprise. If you are already paying for Copilot, code review is included. Turn it on, assign Copilot as a reviewer on your PRs, and let it catch what it can. There is no additional cost and minimal setup effort.
GitHub-only teams with moderate review needs. If your entire workflow lives on GitHub and your review standards are "catch obvious bugs and security issues," Copilot's included review is likely sufficient without adding a separate tool.
Solo developers and small teams. For individual developers or teams of 2-3, Copilot Pro at $10/month provides code completion, chat, and review in one affordable package. Adding a separate review tool may not be worth the additional complexity or cost.
When review quality is your top priority. If your team has high review standards, ships security-critical software, or operates in regulated industries, the deeper analysis from dedicated tools like CodeRabbit or CodeAnt AI is worth the additional cost.
When you use GitLab, Bitbucket, or Azure DevOps. Copilot code review does not work on these platforms. Full stop. Use CodeRabbit, PR-Agent, or CodeAnt AI instead.
When you need custom enforcement rules. If your team has specific coding standards that must be enforced consistently, Copilot's probabilistic approach with a 4,000-character instruction limit is insufficient. Tools with deterministic rule engines or unlimited custom instructions provide more reliable enforcement.
When you need CI/CD integration. If code review needs to be a gate in your deployment pipeline, Copilot cannot do this. PR-Agent and CodeAnt AI offer CI/CD integration that blocks merges based on findings.
When you need learning and adaptation. If you want your review tool to get smarter over time based on your team's feedback, CodeRabbit's learnable preferences provide this capability while Copilot does not.
GitHub Copilot code review is a competent, convenient feature that provides real value for teams already in the GitHub and Copilot ecosystem. The March 2026 agentic architecture was a genuine improvement that moved it from "barely useful" to "meaningfully helpful." For teams already paying for Copilot, it is an easy addition to the review workflow.
But it is not the best AI code review tool available. Dedicated review tools catch more issues, produce fewer false positives, offer more customization, support more platforms, and provide deeper integration with development workflows. The gap between Copilot's generalist review and CodeRabbit or CodeAnt AI's specialist review is real and significant for teams with serious review requirements.
The pragmatic approach for most teams is to start with Copilot's included review, evaluate whether it catches enough of what matters to your team, and add a dedicated tool if you find the gaps unacceptable. Many organizations end up running both - Copilot for IDE assistance and a dedicated tool for PR review - because the tools solve genuinely different problems at different stages of the development workflow.
Yes. GitHub Copilot can review pull requests directly in the GitHub UI. You can request a review from 'Copilot' as a reviewer on any PR, and it will analyze the changes and leave comments on potential issues. This feature is available on paid Copilot plans.
Enable Copilot code review in your organization settings under Copilot > Policies. Then on any pull request, click 'Reviewers' and select 'Copilot' from the list. Copilot will automatically analyze the PR and post review comments within a few minutes.
Copilot code review requires a paid GitHub Copilot plan. Copilot Pro ($10/month) includes code review within its 300 monthly premium requests. Copilot Business ($19/user/month) and Enterprise ($39/user/month) include full code review capabilities with organization-wide policies.
Copilot offers tighter GitHub integration and is convenient if you already pay for Copilot. CodeRabbit provides more comprehensive reviews, supports custom review instructions, integrates with GitLab and Bitbucket, and offers a free tier. CodeRabbit typically catches more issues per PR but requires a separate tool setup.
Copilot code review checks for bugs, security vulnerabilities, performance issues, code style problems, and logic errors. It analyzes the full PR context including the diff and surrounding code. However, it does not run code or perform dynamic analysis — it's purely LLM-based static review.
Originally published at aicodereview.cc
2026-04-03 05:59:47
The concept of Multi-Stage CD is simple: you move code to prod in several iterations and through different environments — dev, staging, prod — with well-defined phases: build, prepare, deploy, test, notify, rollback. It sounds clean. And on paper, it is.
The problem is reality. According to the State of DevOps Report 2020, 95% of the time goes into pipeline maintenance, 80% into manual tasks, and 90% into manual remediation as well. Nobody writes those metrics into their README, but we all live them.
The concrete challenges are three, and they are the usual ones: environment availability (the classic "don't touch dev, I'm testing something"), correctly satisfying external dependencies — JS, Python, AWS, whatever it may be — and locked-down environments whenever a prod bug freezes everything. Add to that slow time to production, more than seven tools involved in the process, and separate pipelines for web, API, and mobile that everyone customized their own way. The result is a Frankenstein that is hard for anyone on the team to maintain.
What is actually needed is not magic: the ability to quarantine environments, dependencies that are always available and secure, configuration that actually works, and deployments validated with tests, performance metrics, and well-defined SLOs/SLIs.
The solution I propose is Keptn — and the title of this section is intentional. Keptn is an open source orchestration platform that automates configuration and provides, in a single control plane, everything that is normally scattered: monitoring, deployment, remediation, and resilience.
What makes it different is its declarative, GitOps-oriented approach. You define your environments and strategies in a shipyard.yaml file, and Keptn takes care of event-based orchestration. You don't need to write the coordination logic between tools — that is already solved.
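To make that concrete, here is a sketch of what a shipyard.yaml can look like, based on Keptn's shipyard spec (stage, sequence, and task names are illustrative; exact fields depend on your Keptn version):

```yaml
apiVersion: "spec.keptn.sh/0.2.2"
kind: "Shipyard"
metadata:
  name: "shipyard-demo"
spec:
  stages:
    - name: "dev"
      sequences:
        - name: "delivery"
          tasks:
            - name: "deployment"
              properties:
                deploymentstrategy: "direct"
            - name: "test"
            - name: "evaluation"
    - name: "production"
      sequences:
        - name: "delivery"
          triggeredOn:
            - event: "dev.delivery.finished"  # promote only after dev succeeds
          tasks:
            - name: "deployment"
              properties:
                deploymentstrategy: "blue_green_service"
```

The file declares *what* the environments and strategies are; Keptn derives the event sequence from it.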
From a platform perspective, Keptn delivers progressive delivery, SRE automation, auto-remediation and rollback, and configuration that is codifiable and tool-independent. But the most important part: it stays connected to the tools you already have — JMeter, Argo, Jenkins, Helm, whatever is already running in your stack.
One benefit that isn't obvious at first glance: traditional pipelines stop being necessary. Keptn replaces that need with dedicated phases and event-driven orchestration. You get out-of-the-box strategies like Blue/Green and Canary, plus observability built into the process with full auditability and traceability.
The mental model is as follows: Keptn exposes services that tools subscribe to through integrations. Keptn events are translated into API calls to and from those tools.
In practice: Keptn creates an event and distributes it to any service that is listening — for example, sh.keptn.event.hello-world.triggered. The Job Executor Service (JES) detects the event, looks up the configuration in the corresponding YAML, and runs the container. Once it finishes, the JES sends back a pair of .started and .finished events. Keptn receives them, knows the task is complete, and moves on in the sequence. Simple, traceable, predictable.
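The triggered/started/finished handshake is easy to model. A Python sketch with plain dicts standing in for Keptn's events (the event type comes from the example above; the handler and field names are my own simplification):

```python
def handle_triggered(event: dict) -> list[dict]:
    """Simulate a JES-style service: consume a .triggered event, reply with .started and .finished."""
    base = event["type"].removesuffix(".triggered")
    started = {"type": f"{base}.started", "triggeredid": event["id"]}
    # ... a real service would look up the task's YAML config and run the container here ...
    finished = {"type": f"{base}.finished", "triggeredid": event["id"], "result": "pass"}
    return [started, finished]

events = handle_triggered({"type": "sh.keptn.event.hello-world.triggered", "id": "evt-1"})
```

The orchestrator never calls the tool directly; it only emits and consumes events, which is what makes swapping tools cheap.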
The integration ecosystem is broad. For deployment: Argo, Jenkins, CircleCI. For observability: Prometheus, Grafana, Splunk. For testing: JMeter, Selenium, Artillery. For notifications: Slack, webhooks, Tekton. For automation: Ansible, webhooks, AWS Lambda. The idea is clear — Keptn handles the orchestration, tasks, and execution; we choose the tools.
The comparison is direct. Traditional pipelines suffer from a lack of separation of concerns, code riddled with dependencies and ad hoc customizations, and difficulty incorporating specific tools without breaking everything. Keptn solves this with dedicated phases and event-based orchestration, interoperability through well-defined abstractions, and real flexibility to swap tools without rewriting the delivery logic.
Once the basic flow is running, the advanced use cases are what really change the game. SLI/SLO-based Quality Gates let a deployment advance only if it meets measurable criteria — for example, a probe success rate above 95%, or a response duration under 200ms. The total score determines whether the pipeline passes or emits a warning.
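A quality-gate evaluation like that can be sketched as a simple scoring function. The thresholds come from the example above; the pass/warning split at 50% is an illustrative assumption, not Keptn's actual scoring model:

```python
def evaluate_quality_gate(slis: dict) -> str:
    """Score SLI values against SLO criteria and return 'pass', 'warning', or 'fail'."""
    objectives = {
        "probe_success_rate": lambda v: v > 95.0,    # percent, from the example above
        "response_duration_ms": lambda v: v < 200.0,
    }
    met = sum(1 for name, ok in objectives.items() if ok(slis[name]))
    score = 100 * met / len(objectives)
    if score == 100:
        return "pass"
    if score >= 50:   # illustrative warning threshold
        return "warning"
    return "fail"
```

The deployment sequence then promotes on "pass", may continue with an alert on "warning", and stops (or rolls back) on "fail".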
Progressive Delivery takes this a step further: you define a flow that goes from dev to hardening to production, with blue/green strategies in the most critical environments and automated remediation in prod. Keptn evaluates quality gates between each stage and only promotes if the numbers justify it.
The point of all this is not to adopt yet another tool for its own sake. It is to recognize that monolithic pipelines have a low ceiling, and that an event-oriented model with a clear separation of concerns scales much better — in technical complexity and in team size alike.
If you want to dig deeper, the starting point is keptn.sh and the community resources at keptn.sh/resources/slides.