Blog of The Practical Developer

Filesystem for AI Agents: What I Learned Building One

2026-04-03 06:08:48

AI Systems

Most agentic systems that run on laptops and servers, such as Claude Code, interact with files natively through bash. But building an agentic system that lets users upload and work with files comes with constraints that rule out storing files on the server the agent runs on and simply handing the agent a bash tool:

  1. The system is exposed to users everywhere — bad actors can get it to run commands that crash the server or exploit other resources, so you want to allow only file operations
  2. Even if you allow only file operations, you can't store every user's files on the server due to storage limits, so you'll have to store files in remote storage like S3 or Azure — but mounting them will make native commands like grep slow, as it has to download the full file first
  3. Even if you had unlimited storage and didn't need mounting, you still need isolation — where the agent cannot access files uploaded by another user, or by the same user in another session

There are other solutions to these problems, but they each come with their own tradeoffs:

  • VM/sandbox platforms (E2B, Northflank) — spin up an isolated environment per conversation, which solves security and isolation. But they have cold start latency, operational overhead, and cost that compound at scale. You're managing servers again, just indirectly.
  • S3 mounting (mountpoint-s3, JuiceFS, s3fs) — mount remote object storage as a local filesystem. Grep and similar commands work, but inefficiently — each scan triggers sequential HTTP range requests that essentially download the whole file in chunks. Too slow for agent workloads.
  • just-bash (Vercel Labs) — a TypeScript reimplementation of bash with a pluggable filesystem backend. Closest to what I wanted, but TypeScript only. My pipeline is Python.
  • Localsandbox (CoPlane) — Python wrapper around just-bash, which would have solved the language problem. But it bridges Python to just-bash via a Deno runtime, adding a deployment dependency I didn't want in a Celery environment.

I ran into this problem recently while building a legal AI agentic system where users had to upload files for the agent to work with. The solution I needed had to be database-like storage that doesn't need to be spun up and down like a server, but supports native file operations that can be exposed as tools to the agent, with the agent unable to access anything outside its own scoped workspace.

Then I found AgentFS — a filesystem built specifically for AI agents, backed by Turso/SQLite. It provides scoped, isolated storage per user and session, with file operations that can be wired directly as agent tools.

Of the integration options — Python SDK, AgentFS + just-bash, AgentFS + FUSE — I went with the Python SDK. Unlike FUSE, which gives the agent a real mount but leaves the rest of the server exposed, the Python SDK puts you in full control. The agent can only do what you explicitly wire up as a tool. No shell escape, no arbitrary commands, no environment variable leaks. The isolation is in the design, not bolted on afterward.

The trade-off is that you're responsible for the tool surface. The SDK ships with the basics — read, write, list — but search operations were missing. No grep, no find, no wc. For an agent that needs to navigate files without dumping everything into context, those aren't optional. So I built them and raised a PR to have them integrated directly into the SDK.
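To make that concrete, here is a minimal sketch of what a grep-style tool over read/list primitives looks like. The `fs` interface (`list_files`/`read_file`) is a stand-in for illustration, not the actual AgentFS SDK surface:

```python
import re


def grep(fs, pattern: str, path_prefix: str = "/"):
    """Return (path, line_number, line) for every line matching pattern.

    `fs` is any object exposing list_files(prefix) and read_file(path);
    these method names are assumptions for the sketch.
    """
    rx = re.compile(pattern)
    hits = []
    for path in fs.list_files(path_prefix):
        for lineno, line in enumerate(fs.read_file(path).splitlines(), start=1):
            if rx.search(line):
                hits.append((path, lineno, line))
    return hits
```

Wired up as an agent tool, something like this lets the model search a workspace without dumping whole files into context.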

AgentFS relies on Turso DB for hosted production use. Locally, the pattern already works — one SQLite file per user, each opened independently with full read-write access. But on a production server, you can't manage hundreds of separate database files manually. You need a single server process that can route connections to the right user's database.
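The local pattern is simple enough to sketch with nothing but the standard library — the path layout and schema below are illustrative, not AgentFS's actual on-disk format:

```python
import sqlite3
from pathlib import Path


def open_user_db(root: str, user_id: str) -> sqlite3.Connection:
    """Open (or create) one isolated SQLite database per user."""
    db_path = Path(root) / f"{user_id}.db"
    conn = sqlite3.connect(db_path)
    # One table standing in for the file store; the real schema is AgentFS's own.
    conn.execute("CREATE TABLE IF NOT EXISTS files (path TEXT PRIMARY KEY, data BLOB)")
    return conn
```

Isolation falls out of never opening anyone else's file — which is exactly the pattern that stops scaling once a single server has to route hundreds of these databases.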

Turso Cloud solves part of this — it supports creating thousands of separate databases and even lets you query across them using ATTACH. But attached databases are currently read-only. You can read from multiple user databases in one session, but you can't write to them. For an agentic system where the agent needs to create, modify, and delete files in a user's scoped workspace, read-only access isn't enough.

Turso has confirmed that full read-write ATTACH support is on their roadmap. On the AgentFS side, the open() call goes through a connect() function that can be pointed at a Turso-managed database instead of a local file — so the SDK integration path is straightforward once Turso ships the write support. Until then, full production multi-user AgentFS is blocked on this upstream feature.

Pros and Cons of Multi-Region Architectures

2026-04-03 06:07:40

The Real Challenges of Going Multi-Region

Before talking about solutions, the challenges need to be named clearly, because this is where the effort is most underestimated. The first is choosing the right technology: not every workload needs multi-region, and not every AWS service is equally available in every region. The second is handling failure at scale: having resources in two regions is not enough if you have not thought through how each component behaves during an outage. The third is proximity to users, which is not always purely technical — laws, regulations, and data-sovereignty requirements dictate where your data can live.

Ignoring any of these points at the start guarantees a much harder conversation later.

Fault Tolerance: The Mental Model That Governs Everything

The key concept here is the fault domain. Every component of your architecture belongs to a domain that defines its failure policy: it can be redundant (it is replicated), ignorable (its failure does not affect the system), or cascading (if it goes down, it drags down everything that depends on it — the dreaded SPOF).

The classic problem is an architecture where the database is a cascading domain inside a single AZ, in a single region. If that AZ has problems, you go down completely. A multi-region strategy solves this by adding one more level to the domain hierarchy, but it also introduces new questions about consistency and replication latency that have to be answered explicitly.

The Layers of a Multi-Region Architecture

Thinking in layers helps you stay oriented. Each layer has its own decisions and its own services.

Network layer. The CDN delivers global content with fast, secure access — CloudFront is the natural component here on AWS. DNS, specifically Route 53, is what actually orchestrates traffic between regions: you can route by latency, by failover, by geolocation, or with weighted policies. A good DNS strategy makes more of a difference than people expect — it is literally the first decision point every user request touches. Internal networks between regions should be interconnected and planned from the start, not as an afterthought.

Compute layer. Services should be modular, organized by business domain, and able to scale on demand. The choice between Lambda, EC2, ECS, or Kubernetes depends on the use case — there is no generic answer. What does always apply is that the compute layer must be able to replicate or spin up in another region without manual friction.

Application layer. There is one principle here that makes the difference: the application must be region-agnostic. That means externalized configuration, stateless processes, and manageable secrets. A concrete example: read the region_name from a variable instead of hardcoding it in the code. It sounds basic, and yet it is where most multi-region architectures break in practice.
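A minimal sketch of that principle — the environment-variable names and the boto3 usage note are illustrative assumptions, not a prescribed convention:

```python
import os


def resolve_region(default: str = "us-east-1") -> str:
    """Resolve the AWS region from the environment instead of hardcoding it.

    AWS_REGION is set automatically in Lambda and most managed runtimes;
    the fallback default is only for local development.
    """
    return os.environ.get("AWS_REGION") or default

# The resolved value is then passed to the SDK rather than baked in, e.g.:
#   boto3.client("s3", region_name=resolve_region())
```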

Data layer. This is the most complex one. Before choosing a service you have to identify the access patterns, the storage type (block, file, or object), the replication cost, and where the users are. AWS supports cross-region replication in DynamoDB, RDS Aurora, standard RDS, S3, ElastiCache, and DocumentDB. Each has its own eventual-vs-strong consistency implications that must be understood before deciding.

Security, identity, and access layer. IAM is global, which simplifies managing users, roles, and groups. KMS can create multi-region keys. Secrets Manager can replicate secrets into secondary regions — and there is an important Terraform detail here: when you configure an aws_secretsmanager_secret with a replica block, the secondary region syncs automatically. It seems trivial until you need it in a real failover.
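The SDK equivalent of that Terraform replica block can be sketched like this. The helper is hypothetical, though `AddReplicaRegions` is a real parameter of the Secrets Manager create_secret API:

```python
def replica_regions(regions):
    """Build the AddReplicaRegions argument for Secrets Manager."""
    return [{"Region": r} for r in regions]

# With boto3, creating a secret replicated into secondary regions is one call:
#   client = boto3.client("secretsmanager")
#   client.create_secret(
#       Name="app/db-password",
#       SecretString=password,
#       AddReplicaRegions=replica_regions(["us-west-2", "eu-west-1"]),
#   )
```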

Monitoring: Not Optional, Part of the Architecture

A multi-region architecture without centralized observability is basically a distributed black box. CloudWatch, Config, GuardDuty, and CloudTrail are regional services, but services like Security Hub and CloudTrail support multi-region aggregation, which gives you a unified view of security events without checking console by console.

One important point here: a monitoring strategy takes several iterations. It does not come out perfect the first time. Tools like Amazon DevOps Guru help identify anomalous behavior, suggest configuration improvements, and alert on critical failures — they complement the base observability stack well.

Deployment: IaC or It Does Not Scale

In multi-region architectures, manual deployment is not a viable long-term option. Infrastructure as code (Terraform, CDK, CloudFormation) is not just a good practice — it is what lets you recreate a complete environment in another region in minutes instead of days. Change control must be granular: per account, per environment, and per region. IAM must follow the principle of least privilege, and failures must be contained — that is, a deployment error in one region must not take down the others.

A practical tip: new regions also work very well as a sandbox for validating new features or for simulating disasters before they show up on their own.

What You Really Need to Consider Before Starting

Multi-region is not free, in either cost or operational complexity. The operational overhead is real: every resource that exists in one region now exists in two or more, with everything that implies for maintenance, monitoring, and upgrades. Cross-region data transfer costs also add up quickly if they are not modeled from the start.

Before starting, it is worth doing a planning exercise with a matrix of priority, effort, complexity, and dependencies — something similar to the Eisenhower Method. Not everything has to be regionalized at the same time or with the same urgency. There are always components that are natural candidates to regionalize first (typically the most critical ones with the lowest replication complexity) and others that can wait.

A Well-Architected Review is a good starting point for building that inventory with a structured methodology.
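The prioritization exercise can be sketched as a toy scoring pass — the component names, weights, and scoring rule are entirely illustrative:

```python
components = [
    {"name": "auth-service", "criticality": 5, "effort": 2},
    {"name": "reporting", "criticality": 2, "effort": 4},
    {"name": "payments-db", "criticality": 5, "effort": 5},
]


def score(component: dict) -> int:
    # Most critical, least replication effort first — the "natural candidates".
    return component["criticality"] - component["effort"]


rollout_order = [c["name"] for c in sorted(components, key=score, reverse=True)]
print(rollout_order)  # ['auth-service', 'payments-db', 'reporting']
```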

The Architecture Evolves

The end state of a multi-region architecture on AWS looks something like this: the user hits Route 53, which routes to the nearest CloudFront, which in turn directs traffic to the corresponding region — where the API Gateway, the Lambdas, and the replicated Aurora database live. All managed with ACM certificates, with traffic distributed by latency or failover policies in DNS.

You do not get there overnight. You get there through iterations, with IaC as the backbone and with a DNS strategy designed to scale from day one.

Multi-region is not a services problem; it is a design problem. The AWS services are ready. The question is whether your architecture, your code, and your processes are too.

Multi-Model AI Orchestration for Software Development: How I Ship 10x Faster with Claude, Codex, and Gemini

2026-04-03 06:05:40

I shipped 19 tools across 2 npm packages, got them reviewed, fixed 10 bugs, and published, all in one evening. I did not do it by typing faster. I did it by orchestrating multiple AI models the same way I would coordinate a small development team.

That shift changed how I use AI for software work. Instead of asking one model to do everything, I assign roles: one model plans, another researches, another writes code, another reviews, and another handles large-scale analysis when the codebase is too broad for everyone else.

The Problem

Most developers start with a simple pattern: open one chat, paste some code, and keep asking the same model to help with everything. That works for small tasks. It breaks down on real projects.

The first problem is context pressure. As the conversation grows, the model’s context window fills with stale details, exploratory dead ends, copied logs, and half-finished code. Even when the window is technically large enough, quality often degrades because the model is trying to juggle too many concerns at once.

The second problem is that modern codebases are not tidy, single-language systems. The projects I work on often span TypeScript, Python, C#, shell scripts, README docs, test suites, CI config, and package metadata. The mental model required to review a TypeScript AST transform is not the same as the one required to inspect Unity C# editor code or write reliable Python tests.

The third problem is that software development is not one task. It is a bundle of different tasks:

  • writing implementation code
  • researching project conventions
  • reviewing for defects
  • running builds and tests
  • comparing architectures
  • doing large-scale cross-file analysis
  • answering quick lookup questions

Using one model for all of that is like asking one engineer to do product design, coding, testing, documentation, DevOps, and code review at the same time.

The Architecture: Each Model Has a Role

I now use a multi-model setup where each model has a clear job.

| Model | Role | Why This Model |
| --- | --- | --- |
| Claude Opus (Orchestrator) | Decision-making, planning, user communication, coordination | Strongest reasoning, sees the big picture |
| Claude Sonnet (Subagent) | Codebase research, file reading, build/test, pattern finding | Fast, cheap, parallelizable |
| Codex MCP | Code writing in sandbox, counter-analysis, code review | Independent context, can debate with Opus |
| Gemini 2.5 Pro | Large-scale analysis (10+ files), cross-cutting research | 1M token context for massive codebases |

This is the important constraint: Opus almost never reads more than three files directly, and it never writes code spanning more than two files.

Opus is my scarce resource. I want its context window reserved for decisions, tradeoffs, and coordination. If I let it spend tokens reading ten implementation files, parsing test fixtures, or editing code across half the repo, I am wasting the most valuable reasoning surface in the system.

So I deliberately make Opus act more like a tech lead than a hands-on individual contributor:

  • It decides what needs to be built.
  • It asks subagents to gather evidence.
  • It synthesizes findings into an implementation spec.
  • It asks Codex to challenge that spec.
  • It resolves disagreements.
  • It sends implementation to the right execution agent.

The Core Principle: Preserve the Orchestrator

The best model should not be your file reader, log parser, or bulk code generator.

If I need to answer questions like these:

  • What conventions does this repo use for new tools?
  • Which helper utilities are already available?
  • How do existing tests structure edge cases?
  • Where does platform-specific formatting happen?

I do not spend Opus on that. I send Sonnet agents to inspect the codebase and return structured findings. If the question spans a huge number of files, I use Gemini for the broad scan and have it summarize patterns, architectural seams, and constraints.

Then Opus makes the decision with clean inputs instead of raw noise.

Real-World Example 1: Building 4 Platform Mappers in One Session

One of the clearest examples was figma-spec-mcp, an open source MCP server that bridges Figma designs to code platforms. The package already had a React mapper, and I wanted to expand it with React Native, Flutter, and SwiftUI support while preserving shared conventions and reusing the normalized UI AST.

Instead, I split the work.

Workflow

  1. A Sonnet subagent researched the codebase: tool conventions, type patterns, existing React mapper design, shared helpers, and how the normalized AST flowed through the system.
  2. Opus synthesized those findings into a detailed implementation spec.
  3. I sent a single Codex prompt: create all three new mappers by reusing the normalized UI AST and following the discovered conventions.
  4. Codex wrote more than 2,000 lines across the new mapper surfaces.
  5. In a separate Codex review session, I asked it to review the output like a skeptical senior engineer, not like the original author.
  6. That review found ten platform-specific bugs.
  7. Three Sonnet subagents fixed those bugs in parallel.
  8. The full toolset passed TypeScript, ESLint, Prettier, and publint.

What the review caught

The review surfaced bugs that were not obvious from a green-looking implementation:

  • Flutter color output used the wrong byte ordering.
  • React Native had shadowOffset represented as a string instead of an object.
  • SwiftUI output relied on a missing color initializer.
  • A few generated platform props matched one framework’s conventions but not the actual target platform’s API.
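The Flutter bug is worth a concrete illustration: web-style hex is RRGGBB(AA), while Flutter's Color constructor expects a 0xAARRGGBB literal. A hedged sketch of the conversion — the class of bug, not the actual mapper code:

```python
def rgba_to_flutter(hex_rgba: str) -> str:
    """Convert an 'RRGGBBAA' web hex string to Flutter's 0xAARRGGBB form."""
    rr, gg, bb, aa = (hex_rgba[i:i + 2] for i in (0, 2, 4, 6))
    # Alpha moves from the last byte to the first; emitting aa+rr+gg+bb
    # in the wrong order is exactly the byte-ordering bug the review caught.
    return "0x" + (aa + rr + gg + bb).upper()


print(rgba_to_flutter("ff000080"))  # 0x80FF0000 — red at ~50% opacity
```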

Result

I ended that session with four platform mappers, reviewed, fixed, lint-clean, and production-ready in about two hours. The speed came from specialization and parallelism, not from asking one model to “be smarter.”

Real-World Example 2: Contributing to CoplayDev/unity-mcp

The second example was a series of open source contributions to CoplayDev/unity-mcp, a Unity MCP server with over 1,000 stars. The most significant was adding an execute_code tool that lets AI agents run arbitrary C# code directly inside the Unity Editor, with in-memory compilation via Roslyn, safety checks, execution history, and replay support.

The interesting part is how the feature gap was identified. I was already using a different Unity MCP server (AnkleBreaker) for my own projects, and I noticed it had capabilities that CoplayDev lacked. Rather than manually comparing 78 tools against 34, I had AI agents do the comparison systematically.

Workflow

  1. I identified the gap myself by working with both MCP servers daily, then used a Sonnet exploration agent to systematically map all tools from AnkleBreaker’s 78-tool set against CoplayDev’s 34 tools. The agent returned a structured comparison table showing exactly which features were missing.
  2. From that gap analysis, I picked execute_code as the highest-impact contribution: it unlocks an entire class of workflows where AI agents can inspect live Unity state, run editor automation, and validate assumptions without requiring manual steps.
  3. A Sonnet agent deep-dived CoplayDev’s dual-codebase conventions (Python MCP server + C# Unity plugin), studying the tool registration pattern, parameter handling, response envelope format, and test structure.
  4. Opus synthesized the research into a detailed implementation spec covering four actions (execute, get_history, replay, clear_history), safety checks for dangerous patterns, Roslyn/CSharpCodeProvider fallback, and execution history management.
  5. Codex wrote the full implementation: ExecuteCode.cs (C# Unity handler with in-memory compilation), execute_code.py (Python MCP tool), and test_execute_code.py (unit tests). Over 1,600 lines of additions.
  6. Opus reviewed the output and caught issues before the PR went out.
  7. The PR was merged after reviewer feedback was addressed.

What the review caught

  • Safety check patterns needed tightening for edge cases around System.IO and Process usage
  • Error line number normalization had to account for the wrapper class offset
  • Compiler selection logic needed a cleaner fallback path

Result

The execute_code tool became one of the more significant contributions to the project, enabling AI agents to do things like inspect scene hierarchies at runtime, validate component references programmatically, and run editor automation scripts. The contribution was grounded in a real gap analysis rather than guesswork, and the multi-model workflow ensured the implementation matched the project’s conventions across two languages.

Real-World Example 3: roblox-shipcheck Shooter Audit Expansion

The third example was roblox-shipcheck, an open source Roblox game audit tool. I wanted to add six shooter-genre-specific tools and expand the package around them with tests, documentation, examples, and release notes.

Workflow

  1. Background Sonnet agents worked in parallel on the README rewrite, CHANGELOG, usage examples, and unit tests.
  2. Codex wrote all six shooter tools: weapon config audit, hitbox audit, scope UI audit, mobile HUD audit, team infrastructure audit, and anti-cheat surface audit.
  3. In a separate review session, Codex reviewed the generated implementation and found eight issues.
  4. A Sonnet agent fixed those issues and got 124 tests passing.
  5. Sourcery AI, acting as an automated reviewer, found three additional issues.
  6. Another Sonnet agent addressed the review feedback and tightened the remaining edge cases.

What the review caught

The first review wave found:

  • ESLint violations
  • heuristics that were too strict for real-world projects
  • false positives for free-for-all game modes

The automated reviewer then found:

  • opportunities to consolidate shared test helpers
  • missing edge cases in the audit suite
  • rough spots in the implementation details around reuse and consistency

Result

The package ended with 49 tools total, 124 passing tests, a cleaner README, updated examples, release notes, and green CI across TypeScript, ESLint, Prettier, and SonarCloud. That is the difference between “I added some code” and “I shipped a maintainable release.”

Token Budget Rules: The Key Insight

The most important lesson in all of this is simple: your orchestrator’s context window is the scarcest resource in the system.

These are the rules I follow now:

  1. Opus reads three files or fewer per task. If I need more than that, I delegate the reading to Sonnet or Gemini and ask for a structured summary.
  2. Opus writes code in two files or fewer. If the task spans more than two files, I send it to Codex with a detailed spec.
  3. Before starting any task, I ask: “Can a subagent do this?” If the answer is yes, I stop and delegate.
  4. Codex reviews everything. Even code Codex wrote itself. The review happens in a separate session so it can challenge its own assumptions.
  5. Independent work gets parallel agents. If docs, tests, examples, and changelog updates do not depend on each other, they should run at the same time.

Here is the mental model I use:

Opus = scarce strategic bandwidth
Sonnet = cheap parallel investigation
Codex = isolated implementation and review
Gemini = massive-context research pass

Once I started treating context like a budget instead of an infinite buffer, my sessions became dramatically more reliable.
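Those budget rules are mechanical enough to express as a routing function — a toy sketch where the names are roles from the table above, not real API identifiers:

```python
def route(task: dict) -> str:
    """Pick a model role for a task, following the budget rules above."""
    if task.get("broad_scan"):            # 10+ files, cross-cutting research
        return "gemini"
    if task.get("files_to_read", 0) > 3:  # rule 1: delegate bulk reading
        return "sonnet"
    if task.get("files_to_write", 0) > 2 or task.get("kind") == "review":
        return "codex"                    # rules 2 and 4
    return "opus"                         # scarce strategic bandwidth


print(route({"files_to_write": 5}))  # codex
```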

The Debate Pattern

One of the most effective techniques in this setup is what I call the debate pattern.

Instead of asking one model for a solution and immediately implementing it, I force a disagreement phase.

The process

  1. Opus analyzes the problem and proposes a solution.
  2. Codex receives that analysis and produces counter-analysis: where it agrees, where it disagrees, and what it would change.
  3. If there are conflicts, I do one follow-up round to resolve them.
  4. Once there is consensus, I convert that into an implementation plan.
  5. Codex implements.
  6. A separate Codex session reviews the result.

This works because disagreement exposes hidden assumptions.

In one session, that debate caught:

  • Flutter Color formatting confusion between 0xRRGGBBAA and 0xAARRGGBB
  • React Native Paper prop mismatch using mode where variant was correct
  • a non-existent SwiftUI Color(hex:) initializer

None of those issues were broad architectural failures. They were the kind of platform-specific correctness bugs that burn time after merge if you do not catch them early.

The debate pattern turns AI assistance from “fast autocomplete” into “adversarial design review plus implementation.”

Results

The performance difference is large enough that I now think in terms of orchestration by default.

| Metric | Single Model | Multi-Model Orchestration |
| --- | --- | --- |
| Tools shipped per session | 2-3 | 10-15 |
| Bugs caught before publish | ~60% | ~95% (Codex review) |
| Parallel workstreams | 1 | 6+ simultaneous |
| Context preservation | Degrades after 3-4 files | Stays sharp (delegated) |
| Convention compliance | Often drifts | Exact match (research first) |

Getting Started

If you want to try this workflow, start simple. You do not need a huge automation stack on day one. You just need role separation and a few clear rules.

My practical setup

  • Claude Code CLI with Opus as orchestrator for planning, decisions, and user-facing coordination
  • Codex MCP server (npm: codex) for implementation, sandboxed code changes, and review
  • Gemini MCP (npm: gemini-mcp-tool) for large-scale repo analysis and broad research across many files
  • Sonnet subagents via Claude Code’s Agent tool for codebase research, builds, tests, pattern extraction, docs, and support work

The most important operational detail is to write your rules down in CLAUDE.md. If the orchestrator has to rediscover your preferences every session, you lose consistency and waste tokens.

My CLAUDE.md contains rules like:

- Opus reads <= 3 files directly
- Opus writes <= 2 files directly
- Delegate codebase exploration to Sonnet
- Use Codex for implementation spanning multiple files
- Always run a separate review pass before publish
- Prefer parallel subagents for independent tasks

That single file turns ad hoc prompting into a repeatable operating model.

A good first workflow

If you want a low-friction way to start, try this:

  1. Use Sonnet to inspect the repo and summarize conventions.
  2. Use Opus to write a short implementation spec.
  3. Use Codex to implement across the affected files.
  4. Use a fresh Codex session to review for defects.
  5. Use Sonnet to fix issues and run tests.

Practical Lessons

Three habits made the biggest difference for me.

First, I stopped treating AI output as a finished artifact and started treating it as a managed workstream. Every meaningful code change has research, implementation, review, and verification phases. Different models are better at different phases.

Second, I learned that independent context is a feature, not a limitation. When Codex reviews code from a separate session, it does not inherit all the assumptions of the implementation pass. That distance is exactly why it catches bugs.

Third, I stopped optimizing for “best prompt” and started optimizing for “best routing.” The better question is: which model should spend tokens on this specific task?

Conclusion

The future of AI-assisted development is not a single omniscient model sitting in one giant chat. It is orchestration: using the right model for the right task, preserving your strongest model’s context for decisions, and letting specialized agents handle research, implementation, review, and verification.

If you are already using AI in development, my practical advice is simple: stop asking one model to do everything. Give each model a role, protect your orchestrator’s context window, and add a real review pass. That is where the 10x improvement comes from.

Migrating a Webpack-Era Federated Module to Vite Without Breaking the Host Contract

2026-04-03 06:05:00

A practical guide to migrating a federated remote to Vite, based on lessons from a real migration.

I was tasked with updating a legacy React application that did not support Module Federation. That integration was added first so the app could run as a remote inside a larger host application. Later, the remote needed to migrate from Create React App (CRA) to Vite. By that point, the host already depended on the remote's loading behavior. The tricky part was not replacing CRA with Vite. It was preserving the runtime contract while only the remote changed bundlers.

If you own a CRA or webpack-era remote that still has to load cleanly inside an existing host, this post covers the cleanup work beforehand, the core CRA-to-Vite swap, the federation-specific deployment fixes, and a local dev harness for debugging the full host loading sequence without redeploying every change.

Terms for reference

  • CRA: Create React App. For years it was the default easy on-ramp for React apps before being deprecated in 2025.
  • CRACO: Create React App Configuration Override
  • Module Federation: A way for one application to load code from another at runtime instead of bundling everything together up front.
  • Host: The application that loads another app at runtime.
  • Remote: The application that exposes code for the host to load.
  • Runtime contract: The files and exported APIs the host already expects.

Why migrate?

  1. Dependabot alerts. The biggest issue was that the CRA dependency tree had kept accumulating a number of high-risk Dependabot alerts, and patching around them was getting harder to justify.

  2. Slow builds. CRA and webpack took over a minute for a cold-start build.

  3. Too many config layers. CRACO was overriding CRA's webpack config, plus custom build scripts for module federation.

  4. Stale tooling. ESLint was still on the legacy .eslintrc format. Jest had its own separate config.

  5. Dependency rot. Years of Dependabot patches left dozens of manual resolutions in the dependency manifest that nobody fully understood anymore.

The goal was not just "swap the build tool." It was to reduce dependency risk, simplify the toolchain, and leave the project in a state that another engineer could pick up. Vite had already earned a strong reputation. What was different now was that there was finally enough maintenance pressure to justify spending sprint time on the migration.

Step 1: Remove dead weight

Before touching the build tool, everything that would conflict with Vite or had become dead weight needed to go.

Remove webpack and Babel dependencies

Some dependencies weren't really "dependencies" so much as assumptions about the old toolchain:

  • Babel macros like preval.macro that ran at compile time. Vite doesn't run your app through the same pipeline that a CRA stack does.
  • CRA-specific packages like react-scripts, craco, react-app-rewired
  • Packages like jsonwebtoken that were built for Node.js and rely on polyfills that webpack injected automatically. Vite does not do this, so if anything in the browser code imports Node.js built-ins like crypto or Buffer, it will break.

Remove stale deps and manual resolutions

The package dependencies were audited and around a dozen were removed. Then the pile of old manual resolutions that had accumulated from years of Dependabot fixes was cleared out. Most of those overrides were for transitive deps of packages that were already gone.

Check for Sass compatibility

Worth checking early: a shared design system was still using deprecated Sass @import patterns, and it had to be updated before the new toolchain would build cleanly.

Step 2: The CRA-to-Vite swap

With the codebase cleaned up, the core migration came down to a few straightforward steps:

  1. Replace CRA/CRACO config with a single vite.config.ts
  2. Move index.html from public/ to the project root and point it at the module entry
  3. Rename REACT_APP_* env vars to VITE_*; in application code, replace process.env usage with import.meta.env
  4. Update any legacy ReactDOM.render calls to createRoot
  5. Modernize surrounding tooling where it made sense, like moving ESLint to flat config
  6. Update scripts for vite, vite build, vite preview, and vitest
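As a small illustration of step 3, the env-var rename rule can be captured in a one-off helper like this (hypothetical, not part of the actual migration):

```typescript
// Hypothetical helper for porting CRA-style env keys to Vite-style keys
// when rewriting a .env file. Only the prefix changes.
function renameCraEnvKey(key: string): string {
  return key.startsWith('REACT_APP_')
    ? key.replace(/^REACT_APP_/, 'VITE_')
    : key; // non-CRA keys (e.g. NODE_ENV) pass through untouched
}
```

In application code, reads then become `import.meta.env.VITE_API_URL` instead of `process.env.REACT_APP_API_URL`.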

Replace Jest with Vitest

Once Vite was the build tool, Vitest was the obvious test runner. It shares the same config file, understands the same path aliases, and removed a lot of separate config glue.

Add the test config directly to vite.config.ts:

import { defineConfig } from 'vite';

export default defineConfig({
  // ...build config above...
  test: {
    globals: true,
    environment: 'jsdom',
    setupFiles: './src/test/setup.ts',
    coverage: {
      reporter: ['text', 'html'],
      include: ['src/**/*.{ts,tsx}'],
    },
  },
});

No separate jest.config.js. No babel-jest transform. No moduleNameMapper to keep in sync with path aliases.

Step 3: Module Federation with Vite

This is where the migration stopped being a normal bundler swap. The host still ran webpack and expected all of this to keep working:

host -> fetch asset-manifest.json
host -> load remoteEntry.js
host -> init shared scope
host -> get exposed module
host -> call inject(container, props)
host -> later call unmount()
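The init/get handshake in that sequence follows the webpack container interface. A minimal sketch, with a mock container standing in for the real remoteEntry.js global (this is illustrative, not the host's actual code):

```typescript
// Sketch of the container handshake a federation host performs.
type ModuleFactory = () => unknown;

interface RemoteContainer {
  init(shareScope: Record<string, unknown>): void | Promise<void>;
  get(moduleName: string): Promise<ModuleFactory>;
}

async function loadRemoteModule<T>(
  container: RemoteContainer,
  moduleName: string
): Promise<T> {
  await container.init({}); // shared scope; empty in this sketch
  const factory = await container.get(moduleName); // e.g. './RemoteModule'
  return factory() as T;
}

// Mock container mimicking what remoteEntry.js would expose:
const mockContainer: RemoteContainer = {
  init: () => {},
  get: async (name) => {
    if (name !== './RemoteModule') throw new Error(`unknown module: ${name}`);
    return () => ({ inject: () => {}, unmount: () => {} });
  },
};
```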

Configuring the federation plugin

Install @module-federation/vite and add it to your Vite config:

import react from '@vitejs/plugin-react';
import { federation } from '@module-federation/vite';
import { defineConfig } from 'vite';

export default defineConfig({
  plugins: [
    react(),
    federation({
      name: 'remoteApp',
      filename: 'remoteEntry.js',
      exposes: {
        './RemoteModule': './src/remote/entry.ts',
      },
    }),
  ],
  // ...
});

The exposed entry file should export the lifecycle functions the host expects:

// src/remote/entry.ts
export { inject, unmount } from './RemoteModule';

// src/remote/RemoteModule.tsx
import { MemoryRouter } from 'react-router-dom';
import { createRoot, type Root } from 'react-dom/client';
import App from '../App';

let root: Root | null = null;

export const inject = (
  container: string | HTMLElement,
  _props?: Record<string, unknown>
): void => {
  const element =
    typeof container === 'string'
      ? document.getElementById(container)
      : container;
  if (!element) return;

  // Guard against duplicate roots if the host mounts twice.
  root?.unmount();

  root = createRoot(element);
  root.render(
    <MemoryRouter>
      <App />
    </MemoryRouter>
  );
};

export const unmount = (): void => {
  if (root) {
    root.unmount();
    root = null;
  }
};

Note: The inject(container, props) and unmount() API here is host-specific. MemoryRouter made sense because the embedded remote needed internal navigation but not deep-linkable standalone URLs. Standalone development used BrowserRouter instead.

Generating a host-compatible asset manifest

The host fetched asset-manifest.json and expected specific keys for remoteEntry.js and main.css. Vite produced a different file (manifest.json) with a different shape, so even after renaming the file, the host couldn't parse it.
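For reference, the two shapes differ roughly like this (illustrative; Vite's manifest keys depend on your actual entry file names and hashes):

```
// Vite's build manifest (manifest.json), keyed by source path:
{
  "src/main.tsx": {
    "file": "assets/main-abc123.js",
    "css": ["assets/main-def456.css"]
  }
}

// What the host expected (asset-manifest.json):
{
  "files": {
    "remoteEntry.js": "/remoteEntry.js",
    "main.css": "/assets/main-def456.css"
  }
}
```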

The fix was a small Vite plugin that generates a compatible manifest after the build:

import { promises as fs } from 'node:fs';
import path from 'node:path';
import type { Plugin } from 'vite';

export const rewriteHostManifest = (): Plugin => ({
  name: 'rewrite-host-manifest',
  async writeBundle(options, bundle) {
    const outputDir = options.dir || 'dist';
    const files = Object.keys(bundle);
    const remoteEntry = files.find((file) => file.endsWith('remoteEntry.js'));
    const mainCss = files.find((file) => file.endsWith('.css'));
    if (!remoteEntry || !mainCss) {
      throw new Error('remoteEntry.js or main.css not found in bundle output');
    }
    const manifest = {
      files: {
        'remoteEntry.js': `/${remoteEntry}`,
        'main.css': `/${mainCss}`,
      },
    };
    await fs.writeFile(
      path.join(outputDir, 'asset-manifest.json'),
      JSON.stringify(manifest, null, 2)
    );
  },
});

Add it to the plugins:

plugins: [
  react(),
  federation({ /* ... */ }),
  rewriteHostManifest(),
],

Adapt the manifest shape to whatever the host actually reads. This was specific to this setup.

Confirm the base path

If the built assets are served from a CDN or cloud storage bucket, you need to tell Vite:

export default defineConfig({
  base: process.env.ASSET_BASE_PATH || '/',
  // ...
});

Without this, Vite generates root-relative paths like /assets/chunk-abc123.js. The host resolves those relative to its own origin, which in this case served index.html instead of the JS file, producing MIME type errors. Setting base to the bucket or CDN path fixed it.

Split fonts from the main bundle (if applicable)

The module bundled custom fonts, but the host already loaded the same fonts globally. The fix was to move the @font-face declarations into a separate SCSS file and only import it in standalone mode, not in the federated entry.

Step 4: Local federation dev harness

This was the biggest quality-of-life improvement, and probably the most reusable part of the migration. Testing a federated module usually means deploying to a test environment and loading it through the host. That's a slow feedback loop. Instead, a local dev harness was built to replicate the host's loading sequence.

The harness used vite build --watch plus vite preview instead of the normal dev server because the goal was to validate the real emitted artifacts: asset-manifest.json, remoteEntry.js, built CSS, and chunk URLs. The standard dev server is great for app development, but it doesn't produce the same output the host will actually fetch in production.

The harness did the following:

  1. Build the module in development mode with vite build
  2. Keep rebuilding with vite build --watch
  3. Serve the output with vite preview
  4. Use a simple intermediary UI to collect runtime props (locale, auth token, environment details)
  5. Fetch asset-manifest.json from the local preview server
  6. Load remoteEntry.js
  7. Call container.init() and container.get()
  8. Call inject() with configurable props and verify unmount() cleanup

That made it possible to test the full federation lifecycle locally, including script loading, module init, prop injection, CSS loading, auth handling, and unmount cleanup, without deploying anything.

The entry point ended up with three runtime modes:

// src/main.tsx
import { createRoot } from 'react-dom/client';

const root = createRoot(document.getElementById('root')!);

if (import.meta.env.VITE_USE_FEDERATION_HARNESS === 'true') {
  const { FederationHarness } = await import('./dev/FederationHarness');
  root.render(<FederationHarness />);
} else if (import.meta.env.VITE_EMBEDDED_MODE === 'true') {
  const { FederatedEntry } = await import('./remote/FederatedEntry');
  root.render(<FederatedEntry />);
} else {
  const { StandaloneEntry } = await import('./standalone/StandaloneEntry');
  root.render(<StandaloneEntry />);
}

  • start runs standalone app development
  • dev runs federation development against a local preview server
  • build produces the production remote for the real host
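In package.json terms, those three modes map to scripts roughly like this (a sketch; the exact flags and the use of a process runner like concurrently are assumptions, not the project's actual scripts):

```
{
  "scripts": {
    "start": "vite",
    "dev": "concurrently \"vite build --watch --mode development\" \"vite preview\"",
    "build": "vite build"
  }
}
```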

Pitfalls to watch out for

  • Vite's manifest is not webpack's manifest. Don't assume the formats will match.
  • base matters for remote hosting. Forget it and every chunk import will 404 or return HTML instead of JavaScript.
  • Shared dependencies are not automatic wins. They are one of the biggest selling points of Module Federation, but cross-bundler setups and older integration contracts can make them risky to use.
  • Suppress lint rules temporarily. A build tool migration will surface new lint errors from updated configs. Add temporary warn overrides and fix them in separate PRs to keep momentum.
  • Fix things at the source. For example, don't patch CI when the build config is wrong :)

Verification

These were the checks that mattered more than "the build passed":

  • Standalone development still worked with the app's normal router and env vars
  • The local federation harness could fetch asset-manifest.json, load remoteEntry.js, and mount the module
  • CSS loaded correctly from the built output
  • Production hosting used the correct base path and chunk URLs all resolved correctly
  • Full regression test of all features

Results

  • Resolved all the open Dependabot alerts
  • Removed .babelrc, craco.config.js, jest.config.js, and custom webpack overrides
  • Consolidated build, dev, preview, and test config into vite.config.ts
  • Cold-start build time went from 63.4s in CRA/webpack to 9.3s in Vite
  • The lockfile shrank by ~10k lines

If you're maintaining a federated micro frontend on CRA, the path to Vite is worth the effort. Just remember to analyze the host's loading contract and build yourself a local harness that exercises the real federation lifecycle.

A note on Vite 8: Vite 8 shipped recently, after this migration was already complete. Its release notes mention Module Federation support as one of the capabilities unlocked by the new Rolldown-based architecture, which looks promising. If I were starting today, I would look into this first.


GitHub Copilot Code Review: Complete Guide (2026)

2026-04-03 06:00:00

What Is GitHub Copilot Code Review?

GitHub Copilot Code Review screenshot

GitHub Copilot code review is an AI-powered feature that analyzes pull requests directly within the GitHub interface and posts inline comments on potential bugs, security issues, performance problems, and code quality concerns. Instead of waiting hours or days for a human reviewer to look at your PR, you can assign Copilot as a reviewer and receive automated feedback within minutes.

This feature is part of GitHub's broader strategy to embed AI into every stage of the software development lifecycle. Copilot started as an inline code completion tool in 2022, expanded to include chat in 2023, added code review in 2024, and launched an autonomous coding agent in late 2025. Code review fits naturally into this trajectory - if Copilot can help you write code, it should also be able to help you review it.

The March 2026 agentic architecture overhaul was the turning point. Before this update, Copilot's code review was limited to shallow, line-by-line diff analysis that often produced generic comments. The new agentic system uses tool-calling to actively explore your repository, read related files, trace cross-file dependencies, and build broader context before generating review comments. This is a fundamental architectural shift from "look at the diff and comment" to "understand the change in context and then comment."

GitHub reports that Copilot has processed over 60 million code reviews since the feature launched, and adoption has accelerated significantly after the agentic update. For teams already paying for Copilot Business or Enterprise, code review is included at no additional cost, which makes it the path of least resistance for organizations looking to add AI review to their workflow.

That said, Copilot code review is one feature within a generalist AI coding platform. It competes against dedicated review tools like CodeRabbit, CodeAnt AI, and PR-Agent that do nothing but code review and have optimized their entire architecture for that single use case. Whether Copilot's code review is sufficient for your team depends on your review standards, your git platform, and how much customization you need.

How GitHub Copilot Code Review Works

Understanding the underlying mechanics helps set realistic expectations for what Copilot can and cannot catch. The system works in three stages: context gathering, LLM-based analysis, and comment generation.

Context Gathering

When you request a review from Copilot on a pull request, the agentic architecture begins by collecting context about the change. This goes beyond simply reading the diff. The system:

  1. Reads the full diff of all changed files, including additions, deletions, and modifications.
  2. Examines surrounding code in the changed files, reading the full file content rather than just the modified lines. This allows Copilot to understand how the changes fit within the broader file structure.
  3. Traces imports and dependencies by following import statements and function calls to related files. If your change modifies a function that is called from three other modules, Copilot attempts to read those modules to understand downstream impact.
  4. Reads the PR description and commit messages to understand the developer's stated intent. This helps Copilot evaluate whether the implementation matches the described goal.
  5. Examines directory structure to understand project organization and conventions.

This context-gathering step is what distinguishes the post-March 2026 version from the earlier line-level analysis. However, the amount of context Copilot can gather is constrained by the model's context window and the time budget allocated per review. For very large PRs or monorepos with deep dependency chains, the system may not trace every relevant file.

LLM-Based Analysis

With context assembled, Copilot feeds the information to a large language model for analysis. The model evaluates the code changes against several dimensions:

  • Correctness: Does the code do what it is supposed to do? Are there logic errors, off-by-one mistakes, null reference risks, or unhandled edge cases?
  • Security: Are there potential vulnerabilities like SQL injection, cross-site scripting, hardcoded credentials, insecure deserialization, or path traversal?
  • Performance: Are there obvious performance anti-patterns like unnecessary database queries inside loops, missing pagination, or blocking operations on the main thread?
  • Readability: Is the code clear and maintainable? Are variable names descriptive? Are functions appropriately sized?
  • Best practices: Does the code follow common patterns for the language and framework in use?

Copilot supports multiple underlying models (GPT-5.4, Claude Opus 4, Gemini 3 Pro), and the model used for code review may vary. The analysis is purely static - Copilot does not execute the code, run tests, or perform dynamic analysis. Everything it identifies comes from pattern recognition and reasoning over the code text.

Comment Generation

After analysis, Copilot generates inline review comments attached to specific lines in the PR diff. Each comment typically includes:

  • A description of the identified issue
  • An explanation of why it matters (potential impact)
  • A suggested fix, often presented as a code suggestion that the developer can apply with one click

Comments are posted as a standard GitHub review, appearing in the same conversation thread as human reviews. Developers can reply to Copilot's comments, dismiss them, or apply the suggested fixes directly. The experience is seamless within the GitHub UI - there is no separate dashboard or interface to learn.

Copilot can also read custom instructions from a copilot-instructions.md file in your repository. This file lets you specify review guidelines, coding conventions, or areas of focus. However, the file is limited to 4,000 characters, which constrains how detailed your instructions can be.

Setup Guide

Setting up Copilot code review is straightforward for teams already using GitHub and Copilot, but there are specific requirements and configuration steps depending on your plan.

Prerequisites

Before you can use Copilot code review, you need:

  1. A GitHub Copilot plan that includes code review. Copilot Pro ($10/month) includes 300 premium requests per month, and each code review consumes premium requests. Copilot Business ($19/user/month) and Enterprise ($39/user/month) include code review with their respective premium request allocations. The free tier includes only 50 premium requests per month, which is too limited for regular review usage.

  2. A GitHub repository. Copilot code review works exclusively on GitHub. It does not support GitLab, Bitbucket, or Azure DevOps. If your team uses any other git platform, Copilot code review is not an option.

  3. Copilot enabled for your organization (for Business and Enterprise plans). Individual Pro subscribers can use code review on their personal repositories without additional setup.

Enabling for Your Organization

For Copilot Business and Enterprise plans, an organization administrator needs to enable code review in the org settings:

  1. Navigate to your GitHub organization's Settings page.
  2. Go to Copilot in the left sidebar, then select Policies.
  3. Under the Code review section, set the policy to Enabled for all members, or configure it for specific teams.
  4. Optionally, set a premium request spending limit to control costs from code review usage.

Organization admins can also configure which repositories Copilot is allowed to review and set policies for how review comments are displayed.

Enabling for Your Repository

At the repository level, you can further customize Copilot's behavior:

  1. Create a .github/copilot-instructions.md file in your repository root to provide custom review guidelines. For example:
## Code Review Instructions

- Always check for null/undefined before accessing object properties
- Flag any database queries that don't use parameterized inputs
- Ensure all API endpoints have proper error handling
- Warn about functions exceeding 50 lines
  2. Optionally, configure review scope in the repository settings under Copilot to exclude certain file paths or patterns from review (such as generated files or vendor directories).

Remember the 4,000-character limit on the instructions file. Prioritize your most important review criteria rather than trying to be exhaustive.

Requesting a Review on a PR

Once Copilot is enabled, requesting a review is simple:

  1. Open a pull request on GitHub (or navigate to an existing one).
  2. Click the Reviewers gear icon in the right sidebar.
  3. Select Copilot from the reviewer list. It appears alongside human team members.
  4. Copilot will begin analyzing the PR and typically posts its review within 2-5 minutes.

You can also trigger a review by commenting @copilot review on the pull request. This is useful when you want Copilot to re-review after pushing additional commits.

Copilot posts its review as a standard GitHub PR review with inline comments. You can interact with these comments just as you would with human review comments - reply, resolve, or apply suggested fixes.

What Copilot Catches

Copilot's agentic code review catches a meaningful range of issues across several categories. Here are concrete examples from real-world usage patterns.

Bug Detection

Copilot is reasonably effective at catching common bug patterns, particularly null reference errors, off-by-one mistakes, and incorrect logic flow.

Example: Missing null check

async function getUser(userId: string) {
  const user = await db.users.findOne({ id: userId });
  return user.name; // Copilot flags: user could be null
}

Copilot would comment something like: "The result of findOne could be null if no user matches the given ID. Accessing .name without a null check will throw a TypeError at runtime. Consider adding a null check or using optional chaining (user?.name)."
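Applying that suggestion, the guarded version looks like this (the db object below is a mock, included only so the sketch is self-contained):

```typescript
// Null-safe version of the example above, with a mock db standing in
// for the real database client.
type User = { id: string; name: string };

const db = {
  users: {
    async findOne(query: { id: string }): Promise<User | null> {
      const rows: User[] = [{ id: 'u1', name: 'Ada' }];
      return rows.find((u) => u.id === query.id) ?? null;
    },
  },
};

async function getUser(userId: string): Promise<string | null> {
  const user = await db.users.findOne({ id: userId });
  return user?.name ?? null; // optional chaining: no TypeError when user is null
}
```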

Example: Off-by-one in loop boundary

def process_items(items):
    for i in range(1, len(items)):  # Copilot flags: starts at 1, skips first item
        transform(items[i])

Copilot would note that the loop starts at index 1, which skips the first element. Depending on the intent, this could be a bug or deliberate - but Copilot flags it for the developer to confirm.

Security Vulnerability Detection

Copilot identifies common security anti-patterns, though its coverage is narrower than dedicated SAST tools.

Example: SQL injection risk

def get_orders(user_id):
    query = f"SELECT * FROM orders WHERE user_id = '{user_id}'"
    return db.execute(query)

Copilot flags this as a SQL injection vulnerability and suggests using parameterized queries instead:

def get_orders(user_id):
    query = "SELECT * FROM orders WHERE user_id = %s"
    return db.execute(query, (user_id,))

Example: Hardcoded credentials

const client = new S3Client({
  credentials: {
    accessKeyId: "AKIAIOSFODNN7EXAMPLE",
    secretAccessKey: "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY",
  },
});

Copilot identifies hardcoded AWS credentials and recommends using environment variables or a secrets manager. This is a pattern that most AI review tools catch reliably.

Performance Issues

Copilot flags certain performance anti-patterns, particularly around database queries and algorithmic inefficiency.

Example: N+1 query pattern

async function getOrdersWithProducts(orderIds: string[]) {
  const orders = await db.orders.findMany({ where: { id: { in: orderIds } } });
  for (const order of orders) {
    order.products = await db.products.findMany({
      where: { orderId: order.id },
    });
  }
  return orders;
}

Copilot identifies the N+1 query pattern - one query for orders, then one additional query per order for products - and suggests batching the product lookup into a single query with a WHERE orderId IN (...) clause.
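The batched shape it suggests can be sketched as two queries plus an in-memory join (the types and the attachProducts helper below are illustrative, not Copilot's literal output):

```typescript
// Two queries total instead of 1 + N, then a single-pass in-memory join.
type Product = { id: string; orderId: string };
type Order = { id: string; products?: Product[] };

function attachProducts(orders: Order[], products: Product[]): Order[] {
  const byOrder = new Map<string, Product[]>();
  for (const p of products) {
    const bucket = byOrder.get(p.orderId) ?? [];
    bucket.push(p);
    byOrder.set(p.orderId, bucket);
  }
  return orders.map((o) => ({ ...o, products: byOrder.get(o.id) ?? [] }));
}

// Usage with the ORM from the example (sketch):
// const orders = await db.orders.findMany({ where: { id: { in: orderIds } } });
// const products = await db.products.findMany({ where: { orderId: { in: orderIds } } });
// return attachProducts(orders, products);
```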

Code Style Issues

Copilot comments on code clarity, naming conventions, and maintainability concerns. These tend to be the most subjective comments and the source of most false positives.

Examples include flagging overly long functions, suggesting more descriptive variable names, recommending extraction of duplicated logic into shared utilities, and noting inconsistencies with the surrounding code style.

What Copilot Misses

Being honest about limitations is important for setting the right expectations. Copilot code review has several meaningful gaps that teams should understand before relying on it as their primary review mechanism.

No Custom Rule Engine

Copilot does not support custom deterministic rules. You cannot tell it "every API endpoint must call auditLog() before returning" or "all database models must include a createdAt field" and have it enforce those rules consistently across every PR. The copilot-instructions.md file provides soft guidance to the LLM, but compliance is probabilistic rather than guaranteed. Some PRs will catch the violation; others will miss it entirely.

Dedicated tools like CodeRabbit support natural language review instructions without character limits, and tools like CodeAnt AI include deterministic rule engines that enforce custom patterns with zero false negatives on defined rules.

No Built-In Linters

Copilot does not bundle deterministic linters like ESLint, Pylint, Golint, or RuboCop. It relies entirely on LLM-based analysis, which means it can catch the spirit of style violations but may miss specific rule violations that a deterministic linter would always flag. CodeRabbit includes 40+ built-in linters that run alongside its AI analysis, creating a dual-layer approach that catches both subtle semantic issues and concrete rule violations.

Limited Context Window

Despite the agentic architecture, Copilot's ability to gather context is bounded by the model's context window and the time budget per review. For large PRs (500+ lines changed across dozens of files) or monorepos with deep dependency chains, Copilot may not trace every relevant relationship. Users report that on very large PRs, the review quality degrades noticeably, with Copilot sometimes commenting only on a subset of changed files.

No CI/CD Integration

Copilot code review does not integrate into CI/CD pipelines. It operates exclusively within the GitHub PR interface. You cannot run Copilot's review as a step in a GitHub Actions workflow, gate merges based on Copilot's findings, or pipe review results into other tools. Dedicated review tools like PR-Agent and CodeAnt AI offer CI/CD integration that allows you to incorporate AI review into your automated pipeline and enforce review gates.

No Learning from Team Feedback

Copilot does not learn from your team's review patterns. If your team consistently dismisses a certain type of comment, Copilot will continue making that same comment on future PRs. There is no feedback loop that adapts the review to your team's preferences over time. CodeRabbit's learnable preferences system explicitly addresses this - the more your team interacts with its reviews, the more accurately it aligns with your standards.

No Cross-Platform Support

This is a hard constraint. Copilot code review works on GitHub and only GitHub. Teams using GitLab, Bitbucket, or Azure DevOps cannot use this feature at all. For organizations with repositories spread across multiple git platforms, Copilot code review covers only a portion of their workflow.

False Positive Rate

In practice, Copilot's false positive rate on code review is noticeable. Users report that roughly 15-25% of Copilot's review comments are either incorrect, irrelevant, or so vague as to be unhelpful. This is higher than specialist tools - CodeRabbit's false positive rate is approximately 8% in testing, and DeepSource claims sub-5%. A high false positive rate erodes developer trust and can lead teams to ignore Copilot's comments entirely, defeating the purpose of automated review.

No Project Management Integration

Copilot does not pull context from external project management tools like Jira or Linear. It cannot verify that a PR's implementation matches the requirements described in a linked ticket. CodeRabbit integrates with Jira and Linear, pulling issue context into its review analysis to verify that the code changes align with the stated requirements.

Comparison with Dedicated Review Tools

Copilot code review competes directly with tools built specifically for AI-powered PR review. Here is how it stacks up against the three most prominent alternatives.

GitHub Copilot vs CodeRabbit

CodeRabbit is the most widely used dedicated AI code review tool, having reviewed over 13 million pull requests across more than 2 million repositories. The comparison between CodeRabbit and Copilot comes down to specialist depth versus generalist convenience.

Where CodeRabbit wins:

  • Deeper reviews. CodeRabbit's entire architecture is optimized for PR review. It caught 87% of intentionally planted issues in testing, compared to Copilot's estimated 60-70%.
  • 40+ built-in linters. Deterministic rules from ESLint, Pylint, Golint, RuboCop, and others run alongside AI analysis. Copilot has no built-in linters.
  • Learnable preferences. CodeRabbit adapts to your team's review patterns over time. Copilot does not learn from feedback.
  • Multi-platform support. GitHub, GitLab, Azure DevOps, and Bitbucket. Copilot is GitHub-only.
  • Unlimited custom instructions. No character limit on natural language review guidelines, versus Copilot's 4,000-character cap.
  • Free tier for review. CodeRabbit's free plan offers unlimited repos with AI review. Copilot's free tier limits you to 50 premium requests per month.
  • Project management integration. Jira and Linear integration for requirement-aware reviews.

Where Copilot wins:

  • Zero setup for GitHub teams. If you already have Copilot Business, code review works instantly with no additional tool installation.
  • All-in-one platform. Code completion, chat, coding agent, and review in one subscription. CodeRabbit does only review.
  • Multi-model selection. Choose GPT-5.4, Claude Opus 4, or Gemini 3 Pro. CodeRabbit uses a proprietary model pipeline.
  • Lower incremental cost. If you already pay for Copilot, review is included. Adding CodeRabbit is $24/user/month on top.

GitHub Copilot vs PR-Agent

PR-Agent (by Qodo, formerly CodiumAI) is an open-source AI code review tool that can be self-hosted for free or used as a hosted service.

Where PR-Agent wins:

  • Open source and self-hosted. Full source code available; run it on your own infrastructure with no data leaving your environment.
  • CI/CD integration. Can run as a GitHub Action, GitLab CI step, or Jenkins plugin. Copilot cannot integrate into CI pipelines.
  • Configurable prompts. You control the exact prompts sent to the LLM, allowing deep customization of review behavior.
  • Multi-platform support. GitHub, GitLab, Bitbucket, and Azure DevOps.
  • Cost control. Self-hosting with your own LLM API keys means you pay only for API usage, not per-seat SaaS fees.

Where Copilot wins:

  • No infrastructure to manage. PR-Agent self-hosting requires server setup, API key management, and ongoing maintenance.
  • Broader feature set. Copilot includes code completion, chat, and agent capabilities alongside review.
  • Smoother UX. Native GitHub integration versus PR-Agent's bot-based approach.
  • No LLM API costs to manage. Copilot's pricing is predictable per-seat; PR-Agent's self-hosted costs depend on LLM token usage.

GitHub Copilot vs CodeAnt AI

CodeAnt AI combines AI code review with static analysis, security scanning, and secrets detection in a single platform.

Where CodeAnt AI wins:

  • Integrated SAST. Built-in static analysis with 300,000+ rules, not just LLM-based review. Copilot has no deterministic static analysis.
  • Secrets detection. Dedicated scanning for hardcoded secrets, API keys, and credentials across the codebase history. Copilot may catch obvious hardcoded strings but lacks a systematic secrets scanner.
  • Multi-platform. GitHub, GitLab, Bitbucket, and Azure DevOps.
  • CI/CD gating. Can block merges based on findings. Copilot cannot enforce merge gates.
  • Free tier. Basic plan at no cost for small teams.

Where Copilot wins:

  • Broader AI capabilities. Code completion, chat, and agent mode are not part of CodeAnt AI's scope.
  • Larger ecosystem. Copilot benefits from GitHub's massive developer ecosystem and continuous investment.
  • IDE integration. Full IDE support for coding assistance, whereas CodeAnt AI focuses on the PR review workflow.

Comparison Table

| Feature | GitHub Copilot | CodeRabbit | PR-Agent | CodeAnt AI |
|---|---|---|---|---|
| Primary focus | AI coding platform | AI PR review | AI PR review (OSS) | AI review + SAST |
| Review approach | Agentic LLM | LLM + 40 linters | Configurable LLM | LLM + static analysis |
| Free tier (review) | 50 premium requests/mo | Unlimited repos | Free (self-hosted) | Yes (Basic plan) |
| Paid pricing | $10-39/user/mo | $24/user/mo | $30/user/mo (hosted) | $24/user/mo |
| GitHub | Yes | Yes | Yes | Yes |
| GitLab | No | Yes | Yes | Yes |
| Bitbucket | No | Yes | Yes | Yes |
| Azure DevOps | No | Yes | Yes | Yes |
| Custom rules | copilot-instructions.md (4K chars) | Unlimited natural language | Custom prompts | 300K+ static rules |
| Built-in linters | None | 40+ | None | Yes |
| CI/CD integration | No | N/A | Yes | Yes |
| Learnable preferences | No | Yes | No | No |
| Self-hosted option | No | Enterprise only | Yes (free) | Enterprise only |
| Code completion | Yes | No | No | No |
| Chat assistant | Yes | No | No | No |
| Coding agent | Yes | No | No | No |

Pricing Analysis

Understanding the true cost of Copilot code review requires looking beyond the headline prices because code review is bundled with other features and consumed through the premium request system.

Copilot Plans That Include Code Review

| Plan | Price | Premium Requests/Month | Code Review | Best For |
|---|---|---|---|---|
| Free | $0 | 50 | Limited | Trying out the feature |
| Pro | $10/month | 300 | Yes | Individual developers |
| Pro+ | $39/month | 1,500 | Yes | Power users |
| Business | $19/user/month | Per-policy | Yes | Teams and organizations |
| Enterprise | $39/user/month | 1,000/user | Yes | Large organizations |

Each code review consumes premium requests. A typical review of a medium-sized PR (100-300 lines changed) uses 1-3 premium requests. For a developer opening 3-5 PRs per week, that translates to roughly 12-60 premium requests per month just for code review. On the Pro plan with 300 premium requests, this is manageable alongside chat and other features. On the free tier with 50 requests, code review competes with chat for a tiny budget.
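The arithmetic behind that 12-60 range, as a quick sketch (assuming 4 weeks per month):

```typescript
// Monthly premium-request estimate for code review alone.
const reviewRequestsPerMonth = (prsPerWeek: number, requestsPerReview: number): number =>
  prsPerWeek * requestsPerReview * 4; // ~4 weeks/month

// Low end: 3 PRs/week at 1 request each; high end: 5 PRs/week at 3 each.
const low = reviewRequestsPerMonth(3, 1);  // 12
const high = reviewRequestsPerMonth(5, 3); // 60
```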

Cost Per Developer Per Month

For a team evaluating Copilot code review specifically:

| Team Size | Copilot Business | CodeRabbit Pro | PR-Agent (hosted) | CodeAnt AI |
| --- | --- | --- | --- | --- |
| 5 devs | $95/month | $120/month | $150/month | $120/month |
| 10 devs | $190/month | $240/month | $300/month | $240/month |
| 25 devs | $475/month | $600/month | $750/month | $600/month |
| 50 devs | $950/month | $1,200/month | $1,500/month | $1,200/month |

Copilot appears cheapest per seat because it bundles code review with code completion, chat, and the coding agent. If your team uses all these features, the per-feature cost is extremely competitive. However, if your team only needs code review and already has other tools for completion and chat, you are paying for features you do not use.

Value Comparison with Standalone Tools

The key question is whether the price difference justifies the review quality gap.

If your team already pays for Copilot Business or Enterprise: Code review is included at no additional cost. Using it is a no-brainer as a baseline. The question becomes whether to add a dedicated tool like CodeRabbit ($24/user/month additional) for deeper reviews. For teams with strict review standards or multi-platform requirements, the additional cost is justified. For teams with moderate review needs on GitHub only, Copilot's included review may be sufficient.

If your team does not currently use Copilot: Paying $19/user/month for Copilot Business solely for code review is harder to justify when CodeRabbit's free tier provides unlimited AI review at zero cost, or when PR-Agent can be self-hosted for free. In this scenario, choosing a dedicated review tool makes more economic sense unless you also want Copilot's completion and chat features.

If your team uses GitLab, Bitbucket, or Azure DevOps: Copilot code review is not available. This eliminates it from consideration entirely, regardless of pricing.

Real PR Examples

To ground this discussion, here are three scenarios that illustrate what Copilot code review produces in practice.

Example 1: Bug Caught - Unhandled Promise Rejection

A developer opens a PR that adds a new API endpoint for processing payments:

```javascript
app.post("/api/payments", async (req, res) => {
  const { amount, customerId } = req.body;
  const customer = await stripe.customers.retrieve(customerId);
  const charge = await stripe.charges.create({
    amount,
    currency: "usd",
    customer: customer.id,
  });
  res.json({ success: true, chargeId: charge.id });
});
```

Copilot's review comment: "This endpoint has no error handling. If stripe.customers.retrieve fails (e.g., invalid customer ID, network error, or Stripe API outage), the unhandled promise rejection will crash the process or return a 500 error with no useful message. Wrap the Stripe calls in a try-catch block and return appropriate error responses."

Copilot provides a suggested fix:

```javascript
app.post("/api/payments", async (req, res) => {
  try {
    const { amount, customerId } = req.body;
    if (!amount || !customerId) {
      return res.status(400).json({ error: "Missing required fields" });
    }
    const customer = await stripe.customers.retrieve(customerId);
    const charge = await stripe.charges.create({
      amount,
      currency: "usd",
      customer: customer.id,
    });
    res.json({ success: true, chargeId: charge.id });
  } catch (error) {
    console.error("Payment processing failed:", error);
    res.status(500).json({ error: "Payment processing failed" });
  }
});
```

This is a genuine, helpful catch. The comment is specific, the explanation is clear, and the suggested fix is correct. This represents Copilot at its best.

Example 2: Security Issue - Path Traversal

A PR adds a file download endpoint:

```python
@app.route("/download/<filename>")
def download_file(filename):
    file_path = os.path.join("/uploads", filename)
    return send_file(file_path)
```

Copilot's review comment: "This endpoint is vulnerable to path traversal. An attacker could request /download/../../etc/passwd to access arbitrary files on the server. The os.path.join function does not prevent directory traversal when the filename contains .. sequences. Validate the filename and ensure the resolved path stays within the intended upload directory."

Copilot suggests:

```python
@app.route("/download/<filename>")
def download_file(filename):
    safe_filename = secure_filename(filename)
    file_path = os.path.join("/uploads", safe_filename)
    if not os.path.abspath(file_path).startswith(os.path.abspath("/uploads")):
        abort(403)
    return send_file(file_path)
```

This is another strong catch. Path traversal is a common vulnerability that Copilot identifies reliably because it matches well-known patterns the LLM was trained on.

Example 3: False Positive - Unnecessary Suggestion

A PR refactors a utility function:

```go
func formatUserName(first, last string) string {
    return strings.TrimSpace(first) + " " + strings.TrimSpace(last)
}
```

Copilot's review comment: "Consider using fmt.Sprintf instead of string concatenation for better readability: return fmt.Sprintf(\"%s %s\", strings.TrimSpace(first), strings.TrimSpace(last))"

This is a false positive in most teams' judgment. String concatenation with + is perfectly idiomatic in Go for simple cases, and fmt.Sprintf is not inherently more readable for a two-string join. The suggestion adds no meaningful value and could be actively confusing if a junior developer takes it as a required change. This type of stylistic bikeshedding is where Copilot's review adds noise rather than signal.

False positives like this are not catastrophic - developers learn to dismiss them. But they consume attention and erode trust. When 15-25% of comments are in this category, the cognitive overhead of triaging review comments becomes a real cost.

Best Practices

Getting the most value from Copilot code review requires understanding where it fits in your workflow and setting appropriate expectations.

When to Use Copilot Review

Use Copilot review as a first pass, not a final review. Copilot is best positioned as a fast, automated first pass that catches obvious issues before a human reviewer looks at the PR. It catches null reference bugs, missing error handling, common security anti-patterns, and performance issues quickly. Think of it as a safety net that reduces the burden on human reviewers rather than replacing them.

Use it for all PRs, not just large ones. Even small PRs can contain security vulnerabilities or logic errors. Since Copilot review takes only 2-5 minutes and requires no effort from the PR author, there is little downside to making it a standard part of every PR.

Do not gate merges on Copilot review alone. Copilot's false positive rate and limited context awareness mean it should not be the sole gatekeeper for code quality. Always require human review for critical code paths, security-sensitive changes, and architectural decisions.

Combining with Other Tools

Many teams get the best results by combining Copilot with a dedicated review tool:

  • Copilot for IDE assistance + CodeRabbit for PR review. Use Copilot's code completion and chat while writing code, then let CodeRabbit handle the deep PR review. This gives you the breadth of Copilot's IDE features and the depth of CodeRabbit's review specialization.
  • Copilot for review + a SAST tool for security. If security scanning is critical, pair Copilot's general review with Semgrep or Snyk Code for dedicated security analysis. Copilot catches common security patterns but is not a replacement for taint analysis and CVE database matching.
  • Copilot for GitHub repos + PR-Agent for other platforms. If your organization uses multiple git platforms, use Copilot for GitHub repositories and self-hosted PR-Agent for GitLab or Bitbucket repos to get AI review coverage everywhere.

Setting Expectations

Be transparent with your team about what Copilot code review can and cannot do:

  • It will miss things. No AI review tool catches everything. Copilot will miss some bugs, some security issues, and most architectural concerns. It is a supplement to human review, not a replacement.
  • It will flag non-issues. The 15-25% false positive rate means developers will need to exercise judgment about which comments to act on. Establish a team norm that dismissing a Copilot comment is perfectly acceptable.
  • It does not learn. Unlike tools with learnable preferences, Copilot will keep making the same types of suggestions regardless of how your team responds. Manage this expectation upfront.
  • Custom instructions help but have limits. Invest time in writing a good copilot-instructions.md file, but understand that compliance with those instructions is probabilistic. For hard requirements, use deterministic linters in your CI pipeline.
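
To illustrate the last point, here is a sketch of what that deterministic backstop might look like as a GitHub Actions workflow. The workflow name, Node version, and lint command are assumptions for illustration; the key idea is that a required status check fails every time a rule is violated, unlike Copilot's probabilistic instructions:

```yaml
# .github/workflows/lint.yml (hypothetical example)
# When marked as a required status check, a failing lint step
# blocks the merge deterministically.
name: lint
on: [pull_request]
jobs:
  eslint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
      - run: npm ci
      - run: npx eslint . --max-warnings 0  # any violation fails the PR
```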

Writing Effective Custom Instructions

To maximize the value of the 4,000-character copilot-instructions.md budget:

  1. Prioritize your top 5-10 rules. Do not try to cover everything. Focus on the issues that matter most to your team.
  2. Be specific. "Check for security issues" is too vague. "Flag any SQL query constructed with string concatenation or f-strings" is actionable.
  3. Include examples. Show what bad code looks like and what the fix should be.
  4. Update regularly. As your team encounters new patterns, update the instructions file to address them.
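
Putting those four guidelines together, an instructions file might look something like this. The rules below are illustrative, not a real team's standards:

```markdown
<!-- .github/copilot-instructions.md (illustrative example) -->
## Review priorities
1. Flag any SQL query built with string concatenation or f-strings.
   Bad: f"SELECT * FROM users WHERE id = {user_id}"
   Good: parameterized queries using the driver's placeholder syntax.
2. Flag async functions that call external services without error handling.
3. Flag hardcoded secrets, tokens, or API keys in code or config.

## Style
- Do not comment on formatting; Prettier enforces it in CI.
```

Note that even with specific rules like these, compliance remains probabilistic, which is why the hard requirements above belong in CI.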

Verdict

Who Should Use Copilot Code Review

Teams already on Copilot Business or Enterprise. If you are already paying for Copilot, code review is included. Turn it on, assign Copilot as a reviewer on your PRs, and let it catch what it can. There is no additional cost and minimal setup effort.

GitHub-only teams with moderate review needs. If your entire workflow lives on GitHub and your review standards are "catch obvious bugs and security issues," Copilot's included review is likely sufficient without adding a separate tool.

Solo developers and small teams. For individual developers or teams of 2-3, Copilot Pro at $10/month provides code completion, chat, and review in one affordable package. Adding a separate review tool may not be worth the additional complexity or cost.

When to Choose Alternatives

When review quality is your top priority. If your team has high review standards, ships security-critical software, or operates in regulated industries, the deeper analysis from dedicated tools like CodeRabbit or CodeAnt AI is worth the additional cost.

When you use GitLab, Bitbucket, or Azure DevOps. Copilot code review does not work on these platforms. Full stop. Use CodeRabbit, PR-Agent, or CodeAnt AI instead.

When you need custom enforcement rules. If your team has specific coding standards that must be enforced consistently, Copilot's probabilistic approach with a 4,000-character instruction limit is insufficient. Tools with deterministic rule engines or unlimited custom instructions provide more reliable enforcement.

When you need CI/CD integration. If code review needs to be a gate in your deployment pipeline, Copilot cannot do this. PR-Agent and CodeAnt AI offer CI/CD integration that blocks merges based on findings.

When you need learning and adaptation. If you want your review tool to get smarter over time based on your team's feedback, CodeRabbit's learnable preferences provide this capability while Copilot does not.

The Bottom Line

GitHub Copilot code review is a competent, convenient feature that provides real value for teams already in the GitHub and Copilot ecosystem. The March 2026 agentic architecture was a genuine improvement that moved it from "barely useful" to "meaningfully helpful." For teams already paying for Copilot, it is an easy addition to the review workflow.

But it is not the best AI code review tool available. Dedicated review tools catch more issues, produce fewer false positives, offer more customization, support more platforms, and provide deeper integration with development workflows. The gap between Copilot's generalist review and CodeRabbit or CodeAnt AI's specialist review is real and significant for teams with serious review requirements.

The pragmatic approach for most teams is to start with Copilot's included review, evaluate whether it catches enough of what matters to your team, and add a dedicated tool if you find the gaps unacceptable. Many organizations end up running both - Copilot for IDE assistance and a dedicated tool for PR review - because the tools solve genuinely different problems at different stages of the development workflow.

Frequently Asked Questions

Does GitHub Copilot do code review?

Yes. GitHub Copilot can review pull requests directly in the GitHub UI. You can request a review from 'Copilot' as a reviewer on any PR, and it will analyze the changes and leave comments on potential issues. This feature is available on Copilot Business and Enterprise plans.

How do I enable Copilot code review?

Enable Copilot code review in your organization settings under Copilot > Policies. Then on any pull request, click 'Reviewers' and select 'Copilot' from the list. Copilot will automatically analyze the PR and post review comments within a few minutes.

Is Copilot code review free?

Copilot code review requires a paid GitHub Copilot plan. Copilot Pro ($10/month) includes review features for individual developers. Copilot Business ($19/user/month) and Enterprise ($39/user/month) include full code review capabilities with organization-wide policies.

How does Copilot code review compare to CodeRabbit?

Copilot offers tighter GitHub integration and is convenient if you already pay for Copilot. CodeRabbit provides more comprehensive reviews, supports custom review instructions, integrates with GitLab and Bitbucket, and offers a free tier. CodeRabbit typically catches more issues per PR but requires a separate tool setup.

What does Copilot code review check for?

Copilot code review checks for bugs, security vulnerabilities, performance issues, code style problems, and logic errors. It analyzes the full PR context including the diff and surrounding code. However, it does not run code or perform dynamic analysis — it's purely LLM-based static review.

Originally published at aicodereview.cc

Multi-Stage Continuous Delivery

2026-04-03 05:59:47

The Problem with Traditional Pipelines

The concept of Multi-Stage CD is simple: you take code to prod over several iterations and through different environments (dev, staging, prod) with well-defined phases: build, prepare, deploy, test, notify, rollback. Sounds clean. And on paper, it is.

The problem is reality. According to the State of DevOps Report 2020, 95% of time goes to pipeline maintenance, 80% to manual tasks, and 90% to remediation that is also manual. Nobody puts those metrics in their README, but we all live them.

The concrete challenges are three, and they are the usual suspects: environment availability (the classic "don't touch dev, I'm testing something"), correctly satisfying external dependencies (JS, Python, AWS, whatever), and locked-down environments when a prod bug freezes everything. Add to that slow delivery to production, more than seven tools involved in the process, and separate pipelines for web, API, and mobile that everyone customized their own way. The result is a Frankenstein that nobody on the team can comfortably maintain.

What is actually needed is not magic: the ability to quarantine environments, dependencies that are always available and secure, configuration that actually works, and deployments validated with tests, performance metrics, and well-defined SLOs/SLIs.

Keptn: One Control Plane to Rule Them All

The solution I propose is Keptn, and this section's title is intentional. Keptn is an open source orchestration platform that automates configuration and provides, in a single control plane, everything that is normally scattered: monitoring, deployment, remediation, and resilience.

What makes it different is its declarative, GitOps-oriented approach. You define your environments and strategies in a shipyard.yaml file and Keptn takes care of event-based orchestration. You don't need to write the coordination logic between tools; that problem is already solved.
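
A shipyard file could look roughly like this. This is a minimal sketch: the project, stage, and sequence names are placeholders, and the layout follows Keptn's shipyard v0.2 format as I understand it, so verify field names against the official docs at keptn.sh:

```yaml
# shipyard.yaml (illustrative sketch)
apiVersion: "spec.keptn.sh/0.2.0"
kind: Shipyard
metadata:
  name: shipyard-demo
spec:
  stages:
    - name: dev
      sequences:
        - name: delivery
          tasks:
            - name: deployment
              properties:
                deploymentstrategy: direct   # deploy straight to dev
            - name: test
            - name: evaluation               # quality gate
    - name: production
      sequences:
        - name: delivery
          triggeredOn:
            - event: dev.delivery.finished   # promote only after dev succeeds
          tasks:
            - name: deployment
              properties:
                deploymentstrategy: blue_green_service
```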

From a platform standpoint, Keptn delivers progressive delivery, SRE automation, auto-remediation and rollback, and configuration that is codified and tool-independent. But the most important part: it keeps connectivity with the tools you already have (JMeter, Argo, Jenkins, Helm, whatever is already running in your stack).

One benefit that is not obvious at first glance: traditional pipelines stop being necessary. Keptn replaces them with dedicated phases and event-driven orchestration. You get out-of-the-box strategies such as Blue/Green and Canary, plus observability built into the process with full auditability and traceability.

How It Works Under the Hood

The mental model is this: Keptn exposes services that tools subscribe to through integrations. Keptn events are translated into API calls to and from those tools.

In practice: Keptn creates an event and distributes it to any service that is listening, for example sh.keptn.event.hello-world.triggered. The Job Executor Service (JES) detects the event, looks up the configuration in the corresponding YAML, and runs the container. Once it finishes, the JES sends back a pair of .started and .finished events. Keptn receives them, knows the task is complete, and moves forward in the sequence. Simple, traceable, predictable.
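
The YAML the JES consults could look something like this. This is a sketch based on the job-executor-service configuration format as I recall it (the image and command are placeholders), so double-check against the keptn-contrib/job-executor-service README:

```yaml
# job/config.yaml (illustrative sketch)
apiVersion: v2
actions:
  - name: "Say hello"
    events:
      # React to the event mentioned in the text
      - name: "sh.keptn.event.hello-world.triggered"
    tasks:
      - name: "hello"
        image: "alpine:3.19"       # any container image works here
        cmd: ["echo"]
        args: ["Hello from JES"]
```

When the matching .triggered event arrives, the JES runs this container and emits the .started and .finished events on its own.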

The integration ecosystem is broad. For deployment: Argo, Jenkins, CircleCI. For observability: Prometheus, Grafana, Splunk. For testing: JMeter, Selenium, Artillery. For notifications: Slack, webhooks, Tekton. For automation: Ansible, webhooks, AWS Lambda. The idea is clear: Keptn handles the orchestration, tasks, and execution; we decide the tools.

Why This Matters Compared to Traditional Pipelines

The comparison is direct. Traditional pipelines suffer from a lack of separation of responsibilities, code riddled with dependencies and ad hoc customizations, and difficulty incorporating specific tools without breaking everything. Keptn solves this with dedicated phases and event-based orchestration, interoperability through well-defined abstractions, and real flexibility to swap tools without rewriting the delivery logic.

Next Steps: Quality Gates and Progressive Delivery

Once the basic flow is running, the advanced use cases are what really change the game. SLI/SLO-based Quality Gates let a deployment advance only if it meets measurable criteria, for example a probe success rate above 95%, or a response duration under 200ms. The total score determines whether the pipeline passes or emits a warning.
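
As a sketch, those two thresholds could be encoded in an SLO file along these lines. The SLI names are placeholders and the field layout follows Keptn's SLO spec as I understand it, so verify against the official documentation:

```yaml
# slo.yaml (illustrative sketch)
spec_version: "1.0"
comparison:
  compare_with: "single_result"
objectives:
  - sli: probe_success_rate
    pass:
      - criteria:
          - ">95"      # probe success above 95%
  - sli: response_time_p95
    pass:
      - criteria:
          - "<200"     # p95 response under 200 ms
total_score:
  pass: "90%"          # overall score needed to promote
  warning: "75%"       # between warning and pass: warn, don't block
```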

Progressive Delivery takes this one step further: you define a flow that goes from dev to hardening to production, with blue/green strategies in the most critical environments and automated remediation in prod. Keptn evaluates quality gates between each stage and only promotes if the numbers justify it.

The point of all this is not to adopt yet another tool for its own sake. It is recognizing that monolithic pipelines have a low ceiling, and that an event-oriented model with clear separation of responsibilities scales far better, both in technical complexity and in team size.

If you want to dig deeper, the starting point is keptn.sh and the community resources at keptn.sh/resources/slides.