2026-02-18 21:01:44
Imagine your company’s website is scaling globally, but a regional outage just took your product images offline. We all know data is vulnerable to regional outages, accidental deletions, and overwrites. To a customer, the site looks broken. To a business, that's lost revenue.
I recently configured Azure Storage for a high-demand public website, covering everything from 21-day "Time Machine" restorations with Soft Delete to document history with Blob Versioning.
In this post, I’m breaking down how I configured the mission-critical storage solution using Azure Blob Storage. I’ll walk you through how I implemented Read-Access Geo-Redundant Storage (RA-GRS) and fail-safe data protection features to ensure that global users never see a "404 Not Found" again.
Here is my roadmap for deploying resilient, high-availability cloud storage.
Create a storage account to support the public website.
Step 1: In the portal, search for and select Storage accounts.
Step 2: Select + Create.
Step 3: For Resource group, select Create new, give your resource group a name, and select OK.
Step 4: Set the Storage account name to publicwebsite plus a unique identifier, since storage account names must be globally unique (lowercase letters and numbers only).
Step 5: Take the defaults for other settings.
Step 6: Select Review + Create, and then Create.
Step 7: Wait for the storage account to deploy, and then select Go to resource.
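If you'd rather script the account creation, a rough Azure CLI equivalent looks like this; the resource group, account name, and region are placeholders, and flags can shift slightly between CLI versions:

az group create --name website-rg --location eastus
az storage account create \
  --name publicwebsite12345 \
  --resource-group website-rg \
  --location eastus \
  --kind StorageV2 \
  --sku Standard_LRS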
This storage must stay available during a regional outage, so it needs geo-redundancy with read access to the secondary region.
Step 1: In the storage account, in the Data management section, select the Redundancy blade.
Step 2: Ensure Read-access Geo-redundant storage is selected.
Step 3: Review the primary and secondary location information.
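The same redundancy setting can be changed from the CLI; Standard_RAGRS is the SKU value for read-access geo-redundant storage (placeholder names as before):

az storage account update \
  --name publicwebsite12345 \
  --resource-group website-rg \
  --sku Standard_RAGRS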
Information on the public website should be accessible without requiring customers to log in.
Step 1: In the storage account, in the Settings section, select the Configuration blade.
Step 2: Ensure the Allow blob anonymous access setting is Enabled.
Step 3: Be sure to Save your changes.
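A rough CLI equivalent of that account-level toggle:

az storage account update \
  --name publicwebsite12345 \
  --resource-group website-rg \
  --allow-blob-public-access true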
The public website has various images and documents. Create a blob storage container for the content.
Step 1: In your storage account, in the Data storage section, select the Containers blade.
Step 2: Select + Container.
Step 3: Ensure the Name of the container is public.
Step 4: Select Create.
Customers should be able to view the images without being authenticated. Configure anonymous read access for the public container blobs.
Step 1: Select your public container.
Step 2: On the Overview blade, select Change access level.
Step 3: Ensure the Public access level is Blob (anonymous read access for blobs only).
Step 4: Select OK.
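If you're scripting, one command should create the container and set blob-level anonymous access in a single step:

az storage container create \
  --account-name publicwebsite12345 \
  --name public \
  --public-access blob \
  --auth-mode login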
For testing, upload a file to the public container. The type of file doesn’t matter. A small image or text file is a good choice.
Step 1: Ensure you are viewing your container.
Step 2: Select Upload.
Step 3: Select Browse for files and choose a file of your choice.
Step 4: Select Upload.
Step 5: Close the upload window, refresh the page, and confirm your file was uploaded.
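The CLI upload looks roughly like this, assuming a local file named product.jpg:

az storage blob upload \
  --account-name publicwebsite12345 \
  --container-name public \
  --name product.jpg \
  --file ./product.jpg \
  --auth-mode login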
Determine the URL for your uploaded file. Open a browser and test the URL.
Step 1: Select your uploaded file.
Step 2: On the Overview tab, copy the URL.
Step 3: Paste the URL into a new browser tab.
Step 4: If you have uploaded an image file it will display in the browser. Other file types should be downloaded.
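You can also print the blob URL from the CLI instead of copying it from the portal; it follows the https://<account>.blob.core.windows.net/<container>/<blob> pattern:

az storage blob url \
  --account-name publicwebsite12345 \
  --container-name public \
  --name product.jpg \
  --output tsv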
It’s important that the website documents can be restored if they’re deleted. Configure blob soft delete for 21 days.
Step 1: Go to the Overview blade of the storage account.
Step 2: On the Properties page, locate the Blob service section.
Step 3: Select the Blob soft delete setting.
Step 4: Ensure the Enable soft delete for blobs is checked.
Step 5: Change the Keep deleted blobs for (in days) setting to 21.
Step 6: Notice you can also Enable soft delete for containers.
Step 7: Don’t forget to Save your changes.
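The same 21-day retention policy, sketched as a CLI call:

az storage blob service-properties delete-policy update \
  --account-name publicwebsite12345 \
  --enable true \
  --days-retained 21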
If something gets deleted, you need to know how to get it back. Practice using soft delete to restore a deleted file.
Step 1: Navigate to your container where you uploaded a file.
Step 2: Select the file you uploaded and then select Delete.
Step 3: Select Delete to confirm deleting the file.
Step 4: On the container Overview page, toggle the slider Show deleted blobs. This toggle is to the right of the search box.
Step 5: Select your deleted file, then use the ellipsis (...) on the far right to Undelete the file.
Step 6: Refresh the container and confirm the file has been restored.
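Undelete also works from the CLI if you know the blob name:

az storage blob undelete \
  --account-name publicwebsite12345 \
  --container-name public \
  --name product.jpg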
It’s important to keep track of the different versions of the website’s product documents. Enable blob versioning.
Step 1: Go to the Overview blade of the storage account.
Step 2: In the Properties section, locate the Blob service section.
Step 3: Select the Versioning setting.
Step 4: Ensure the Enable versioning for blobs checkbox is checked.
Step 5: Notice your options to keep all versions or delete versions after.
Step 6: Don’t forget to Save your changes.
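And the CLI equivalent for enabling versioning on the account:

az storage account blob-service-properties update \
  --account-name publicwebsite12345 \
  --resource-group website-rg \
  --enable-versioning true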
As you have time, experiment with restoring previous blob versions.
Step 1: Upload another version of your container file. This overwrites your existing file.
Step 2: Select the blob and check its Versions tab; your previous version is listed there.
Setting up a storage account is easy, but architecting a resilient, global data strategy is where the real value lies. Here is why these specific configurations are critical for any enterprise-level application:
The Business Impact: In the event of a regional Azure outage, the read access in RA-GRS ensures the website stays up and images continue to load from the secondary region. This prevents lost sales and maintains brand reputation during a crisis.
The Business Impact: With a 21-day Soft Delete policy, I’ve created a "safety net." This allows for near-instant restoration of mission-critical documents without the massive time and cost overhead of traditional database restores.
The Business Impact: Blob Versioning acts as a "Time Machine." It provides a clear audit trail of every change made to a file. If a new version of a customer success story is published with a legal error, we can roll back to the previous version in seconds, ensuring compliance and accuracy.
The Business Impact: By properly configuring Anonymous Public Access at the container level, we allow for seamless integration with Content Delivery Networks (CDNs). This ensures that a customer in London and a customer in Tokyo both experience lightning-fast load times for product media.
Building a storage solution for a global website is about more than just "cloud storage"—it’s about reliability, scalability, and disaster recovery. By implementing RA-GRS, Soft Delete, and Blob Versioning, I’ve created an environment that doesn't just host files, it protects the brand's reputation and ensures that mission-critical data is always a click away, even in the face of regional outages or accidental deletions.
In the cloud, "good enough" is a risk. Architecting for resilience is the only way to scale.
2026-02-18 21:00:00
Hi there. In this article, I'm going to guide you through how to set up a UPS on Windows 11. I'll be doing this with an APC UPS, but the process will be similar for other UPS models, although some may require additional software to be installed. Please check with your UPS manufacturer to see whether any is needed.
If you would prefer to see a video of this article, there is a YouTube video available below:
To set the UPS up with Windows 11, connect the signal cable to a USB port on your PC; the other end goes into the data port on the UPS.
Once it is connected, you'll see the battery indicator appear in the bottom left of the taskbar.
When you click on it, it shows the UPS's battery level in the bottom left.
Click on that and it will open the Power & battery settings.
You can see the power mode for the system is Balanced by default.
Under Battery usage, there is a graph that shows usage over time. It isn't populated on mine, as my PC has not had the UPS connected before.
Close Settings and then open Control Panel. Next, check that the battery is showing as a UPS by opening Device Manager and expanding Batteries. If it has "UPS" in the description, it's being detected correctly. Close Device Manager.
Next, open up Power Options.
First, click on Create a new power plan. I did try using the Balanced one but it wouldn't hibernate for some reason.
Select High performance and then give the plan a name. Leave everything else as is and click Create.
Once it's created, click Change plan settings and then Change advanced power settings.
Scroll to the bottom and expand Battery.
Expand low battery action. Both should be set to do nothing. Leave those as they are.
Low battery notification should be on.
Critical battery action needs to be set to Hibernate or Shut down for On battery, and Do nothing for Plugged in.
Critical battery notifications should both be on.
Now, low battery level and critical battery level can be left as they are, but I would recommend changing them to higher values. The reason is that 5% might not offer enough runtime to hibernate or shut down the system gracefully. I would go for 40-50% for the low battery level.
When that point is reached, you'll get a notification saying the battery is low and you should plug your PC into mains power. The wording could do with some changes, since this is a UPS rather than a laptop battery.
For critical battery level, go for 20-30%. When this is reached, the PC will hibernate or shut down, depending on what was set. In my case, I set it to hibernate.
When that is done, click OK and you're done. There is nothing further you need to do.
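If you'd prefer to set these thresholds from the command line, powercfg can do the same thing. This is a rough sketch: run it from an elevated prompt, confirm the aliases with powercfg /aliases on your machine, and note that the action values typically map to 0 = Do nothing, 1 = Sleep, 2 = Hibernate, 3 = Shut down. SCHEME_CURRENT targets whichever power plan is currently active.

powercfg /setdcvalueindex SCHEME_CURRENT SUB_BATTERY BATLEVELLOW 45
powercfg /setdcvalueindex SCHEME_CURRENT SUB_BATTERY BATLEVELCRIT 25
powercfg /setdcvalueindex SCHEME_CURRENT SUB_BATTERY BATACTIONCRIT 2
powercfg /setacvalueindex SCHEME_CURRENT SUB_BATTERY BATACTIONCRIT 0
powercfg /setactive SCHEME_CURRENT

The /setdcvalueindex calls change the "On battery" values (low level 45%, critical level 25%, critical action Hibernate), the /setacvalueindex call keeps the plugged-in critical action at Do nothing, and /setactive re-applies the plan.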
To give you an example of what will happen, I'll adjust the low battery level to 64% and the critical battery level to 60%. These values are just for the demonstration, so don't change yours.
I'll open the battery settings up again. You'll see the icon at the side of the percentage remaining indicates it's on mains power.
Now, I'll turn the UPS off at the wall. Most UPSes will produce an audible alarm at this point.
The icon should change to just a battery.
There is the low power warning.
It dropped to 57% and then the PC went into hibernation.
Before you turn your PC back on, make sure that mains power is back on first. When I logged back in, the Notepad document I had open before was still there, indicating the PC hibernated correctly.
I hope that was of use to you. Thanks for reading and have a nice day!
2026-02-18 21:00:00
const and readonly in C#
If you’ve been writing C# for a while, you’ve definitely used both const and readonly. They look similar, they both represent “values that shouldn’t change,” and they both help make your code safer.
But under the hood, they behave very differently — and choosing the wrong one can lead to bugs, versioning issues, and unexpected runtime behavior.
This guide breaks down the differences with definitions, examples, IL behavior, performance notes, and real-world scenarios.
What is const?
const defines a compile-time constant.
public const int MaxItems = 100;
This value is fixed forever at compile time.
What is readonly?
readonly defines a runtime constant.
public readonly int MaxItems;

public MyClass()
{
    MaxItems = 100;
}
| Feature | const | readonly |
|---|---|---|
| When value is set | Compile-time | Runtime |
| Can be changed later | ❌ No | ❌ No (after construction) |
| Allowed types | Primitives + string | Any type |
| Assigned in constructor | ❌ No | ✔️ Yes |
| Inlined by compiler | ✔️ Yes | ❌ No |
| Requires recompilation of dependent assemblies | ✔️ Yes | ❌ No |
| Use for | Fixed values | Configurable runtime values |
const
Compiler replaces all references with the literal value.
ldc.i4.s 100
readonly
Compiler loads the field at runtime.
ldfld int32 MyClass::MaxItems
This is why changing a const in a shared library can break consumers — they still hold the old literal value.
For values that are fixed forever, such as mathematical constants, use const.
public const double Pi = 3.14159;
These values never change.
For environment-dependent configuration, use readonly.
public static readonly string BaseUrl =
    Environment.GetEnvironmentVariable("API_URL");
This value depends on environment — cannot be const.
For values assigned when the object is constructed, use readonly.
public readonly int CacheDuration;

public Settings(int duration)
{
    CacheDuration = duration;
}
If you expose constants in a shared library:
public const int Timeout = 30;
And later change it to:
public const int Timeout = 60;
All dependent projects must be recompiled or they will still use 30.
Use readonly instead.
- const = compile-time constant
- readonly = runtime constant
- const is inlined; readonly is stored in memory
- const only supports primitive types + string
- readonly supports any type
- const for values that never change
- readonly for values that may vary per environment or runtime
2026-02-18 21:00:00
In a previous article, I built a bookmarklet to clip product data from e-commerce pages using Shadow DOM and structured data. It worked — until it didn't.
The bookmarklet could detect products and display them in a floating UI, but it had real limitations: no persistent state between pages, no way to batch products from multiple sites, and no communication channel back to my SaaS app. Every time the user navigated, everything was gone.
I needed something that could live across pages, store data, and talk to my backend. That meant a browser extension.
Here's how I built a cross-browser extension (Chrome + Firefox) using WXT, TypeScript, and React — from architecture to publishing.
If you've ever built a browser extension from scratch, you know the pain: manually wiring up manifest.json, handling Chrome vs Firefox API differences, reloading the extension after every change, figuring out which context runs where.
WXT is a framework that solves all of this. Think of it as Vite for browser extensions:
- File-based entrypoints: add a file under entrypoints/ and WXT wires it into the manifest automatically.
- A unified browser polyfill: Chrome uses chrome.*, Firefox uses browser.* with Promises. WXT's polyfill unifies them — you write browser.runtime.sendMessage() and it works everywhere.

Getting started:
npx wxt@latest init my-extension
Browser extensions aren't a single program. They're three separate execution contexts that communicate via message passing. Understanding this is the key to building anything non-trivial.
┌──────────────┐ ┌────────────────┐ ┌──────────────────┐
│ Popup │ runtime.send │ Background │ scripting. │ Content Script │
│ (React) │ ──────────────► │ (Service Worker)│ executeScript() │ (injected into │
│ │ ◄────────────── │ │ ──────────────► │ any web page) │
│ Cart UI │ response │ Message hub │ │ │
│ Import flow │ │ Storage │ │ Sidebar UI │
│ Auth check │ │ API calls │ │ Page highlights │
└──────────────┘ │ Auth/tokens │ │ Product scan │
└────────────────┘ └──────────────────┘
Background (Service Worker) — The brain. It's the only context that persists (sort of — MV3 service workers can be suspended). It handles storage, API calls, authentication, and orchestrates everything. No DOM access.
Content Script — Injected into web pages. It can read and modify the page's DOM, but it's isolated from the page's JavaScript. It talks to the background via browser.runtime.sendMessage().
Popup — A small standalone UI (ours is React + Tailwind). It opens when the user clicks the extension icon. Same messaging as content scripts. It dies when closed — no persistent state here, everything lives in the background's storage.
WXT uses a file-based convention. Here's the layout:
src/
├── entrypoints/
│ ├── background.ts # Service worker
│ ├── content.ts # Injected into scanned pages
│ ├── app-bridge.content.ts # Lightweight bridge for our app
│ └── popup/
│ ├── index.html
│ ├── main.tsx
│ └── App.tsx
├── components/popup/ # React components for the popup
├── lib/
│ ├── api/client.ts # API wrapper
│ ├── detection/ # Product detection engine
│ ├── highlights/manager.ts # Visual feedback on page
│ ├── storage/ # Cart + session persistence
│ └── messaging/types.ts # Typed message definitions
└── hooks/useCart.ts
Every file in entrypoints/ becomes an extension entrypoint. WXT reads the export default to configure it — no manual manifest editing.
// wxt.config.ts
export default defineConfig({
modules: ['@wxt-dev/module-react'],
manifest: {
name: 'My Extension',
permissions: ['activeTab', 'storage', 'scripting'],
host_permissions: [
'https://app.my-saas.com/*',
],
},
});
Three permissions, all justified:
- activeTab — access the current tab when the user clicks "Scan"
- storage — persist cart items and session across pages
- scripting — inject the content script dynamically (not on every page, only on demand)

No <all_urls>, no cookies, no broad host permissions. Chrome Web Store reviewers care about this, and so should you.
Message passing between contexts is stringly-typed by default. A typo in your action name and you get silent failures. We fix this with a union type:
// lib/messaging/types.ts
export type ExtensionMessage =
| { type: 'ADD_TO_CART'; products: CartProduct[] }
| { type: 'REMOVE_FROM_CART'; urls: string[] }
| { type: 'GET_CART' }
| { type: 'CLEAR_CART' }
| { type: 'CLEAR_SITE'; siteOrigin: string }
| { type: 'IMPORT_CART'; siteUrl?: string; siteId?: number }
| { type: 'SCAN_PAGE' }
| { type: 'CHECK_AUTH' }
| { type: 'SAVE_SESSION'; session: AuthSession }
| { type: 'GET_SITES' }
| { type: 'LOGOUT' };
Every message has a discriminated type field. TypeScript narrows the payload automatically:
// entrypoints/background.ts
browser.runtime.onMessage.addListener((message: ExtensionMessage, sender, sendResponse) => {
switch (message.type) {
case 'ADD_TO_CART':
// message.products is typed as CartProduct[]
handleAddToCart(message.products).then(sendResponse);
return true; // keep channel open for async response
case 'IMPORT_CART':
// message.siteUrl is typed as string | undefined
handleImport(message.siteUrl, message.siteId).then(sendResponse);
return true;
case 'GET_CART':
getCartItems().then(sendResponse);
return true;
}
});
The return true is a gotcha that will cost you hours if you don't know about it. Ask me how I know. By default, the message channel closes synchronously. Returning true tells the browser "I'll respond asynchronously" — without it, your sendResponse calls silently fail.
Instead of injecting our content script on every web page (which would require <all_urls> permission and slow down every page load), we inject it on demand when the user clicks "Scan":
// entrypoints/background.ts
case 'SCAN_PAGE':
const [tab] = await browser.tabs.query({ active: true, currentWindow: true });
if (tab?.id) {
await browser.scripting.executeScript({
target: { tabId: tab.id },
files: ['/content-scripts/content.js'],
});
}
sendResponse({ success: true });
return true;
The content script protects against double-injection with a DOM marker:
// entrypoints/content.ts
export default defineContentScript({
matches: ['<all_urls>'], // required by WXT, but never auto-injected — we only use executeScript()
main() {
if (document.querySelector('[data-my-extension-host]')) return;
// ... initialize sidebar, run detection, set up highlights
},
});
The matches: ['<all_urls>'] might look scary, but it's only there because WXT's type system requires it. Since we exclusively inject via scripting.executeScript(), the content script never runs automatically. This gives us the best of both worlds: the extension only runs on pages the user explicitly scans, but works on any site without listing specific domains.
When the content script runs, it does three things: it runs product detection on the page, injects a sidebar UI, and sets up visual highlights on the detected elements.
The sidebar is built with vanilla DOM manipulation inside a Shadow DOM — no framework overhead in the content script. The highlight system uses CSS injected via a <style> tag with attribute selectors:
// lib/highlights/manager.ts
element.setAttribute('data-ext', 'selected'); // green border + checkmark
element.setAttribute('data-ext', 'deselected'); // dashed outline
element.setAttribute('data-ext', 'hover'); // neon glow + "SIGNAL DETECTED" badge
Each state has distinct visual feedback — selected products get a green checkmark overlay, hovered products get a glowing badge. It makes the scanning experience feel alive.
Modern e-commerce sites don't do full page reloads. Products load via infinite scroll or client-side navigation. A static scan would miss everything loaded after the initial page.
We use a MutationObserver with debouncing:
const observer = new MutationObserver(() => {
clearTimeout(rescanTimer);
rescanTimer = setTimeout(() => rescanAndMerge(items), 800);
});
observer.observe(document.body, { childList: true, subtree: true });
The 800ms debounce is important — without it, a single infinite scroll event could trigger dozens of rescans as DOM nodes are added one by one.
The merge logic is path-aware: if the URL pathname changed (SPA navigation), new products are prepended to the list. If it's the same path (infinite scroll), they're appended. This keeps the sidebar order intuitive.
The popup authenticates against our Rails backend using the user's existing session:
// App.tsx — on popup open
const response = await fetch(`${HOST}/api/extension/session`, {
credentials: 'include', // sends Devise session cookies
});
const { token, user } = await response.json();
The credentials: 'include' is what makes this work — the popup runs on the extension's origin, but host_permissions for our domain allows it to send cookies cross-origin. The backend validates the Devise session and returns a short-lived Bearer token.
That token is stored in the background and used for all subsequent API calls. When it expires (401), we automatically refresh:
// lib/api/client.ts
async function importProducts(urls: string[], token: string): Promise<ImportResult> {
const response = await fetch(`${HOST}/api/import`, {
method: 'POST',
headers: {
'Authorization': `Bearer ${token}`,
'Content-Type': 'application/json',
},
body: JSON.stringify({ urls }),
});
if (response.status === 401) {
throw new AuthExpiredError();
}
return response.json();
}
The background catches AuthExpiredError, refreshes the token via the cookie-based endpoint, and retries once. If that fails too, the user sees a "Session expired" screen.
The popup is a tiny React app with a focused UX: view your cart, pick a target project, import.
Products are grouped by site origin — you might scan three different competitor sites in a session, and each group is collapsible with its own "clear" action.
The import button shows real-time progress and results:
// components/popup/ImportButton.tsx
const handleImport = async () => {
setStatus('importing');
setProgress(15);
const result = await browser.runtime.sendMessage({
type: 'IMPORT_CART',
siteId: selectedSite?.id,
});
setProgress(100);
setResult(result); // { created: 5, skipped: 2, errors: 0 }
};
A lightweight content script runs on our SaaS domain to let the web app know the extension is installed:
// entrypoints/app-bridge.content.ts
export default defineContentScript({
matches: ['https://app.my-saas.com/*'],
main() {
window.addEventListener('message', (event) => {
if (event.data?.type === 'CHECK_EXTENSION') {
window.postMessage({
type: 'EXTENSION_INSTALLED',
version: browser.runtime.getManifest().version,
}, '*');
}
});
},
});
On the web app side:
window.postMessage({ type: 'CHECK_EXTENSION' }, '*');
window.addEventListener('message', (e) => {
if (e.data.type === 'EXTENSION_INSTALLED') {
showExtensionFeatures(e.data.version);
}
});
This is the simplest form of extension detection — no external messaging API, no externally_connectable manifest key. The content script acts as a bridge between two isolated worlds.
WXT makes cross-browser builds a one-liner:
wxt build # Chrome (default)
wxt build --browser firefox # Firefox
Firefox AMO: zip the output, upload, done. Firefox reviewed and approved ours in under an hour.
Chrome Web Store: same zip, but after you've justified every single permission, uploaded screenshots, provided a detailed description of why your extension exists, submitted your blood test results, and filed your tax return — they'll graciously review your extension within 24 hours.
The only Firefox-specific configuration is the browser_specific_settings.gecko block in the manifest (the extension ID and minimum Firefox version). Everything else is shared.
Start with the messaging architecture. Define your message types first. Everything flows from there — what the background handles, what the popup shows, what the content script sends. Type them from day one.
Dynamic injection > static injection. Don't inject content scripts on every page unless you genuinely need to. scripting.executeScript() with activeTab is cleaner, faster, and requires fewer permissions.
Shadow DOM is still your friend. The sidebar technique from the bookmarklet carried over perfectly. Shadow DOM isolation means your extension's UI won't break on sites with aggressive global CSS.
MutationObserver needs debouncing. Without it, infinite scroll pages will obliterate your performance. 800ms worked well for us, but tune it to your use case.
WXT is the right call. I can't imagine going back to raw manifest wiring. The hot reload alone saves hours per week during development.
Now go build something, break things, and enjoy it.
This is the second article in a series about building web tools for e-commerce product tracking. The first one covers the bookmarklet that started it all.
2026-02-18 20:54:20
When I first started my internship, unit testing felt like a chore. In my head, I just wanted to build features and see them live on the new brand site I was developing. I’d write basic Assert.AreEqual statements just to tick a box, but they felt robotic and, honestly, a bit clunky to work with.
Then I discovered FluentAssertions.
The "Ah-ha" moment for me was realizing that my tests didn't have to look like math equations. They could look like English sentences. Instead of struggling through a wall of messy code to figure out why a test failed, I could read a line like response.StatusCode.Should().Be(HttpStatusCode.OK) and immediately know the intent.
As a Computer Science student, you hear a lot about "Clean Code," but you don't really feel its importance until you're six months into an internship with a deadline looming. I realized that keeping my tests organized wasn't just about being "easier on the eye"; it was about saving time.
When your test suite is organized, you don't fear it. You actually start to rely on it. During my work on the subsidiary brand's site, I caught several bugs in my C# logic that I would have completely missed if I were just manually testing the UI. There’s a specific kind of relief when you see a red light in your test runner and realize, "Wait, I didn't handle the empty string case," before that bug ever hits a real user's phone.
I believe that code should be a story that explains what the software is supposed to do. Using FluentAssertions allowed me to write tests that even a non-technical manager could almost understand.
For example, when I was testing the login logic, my assertions looked like this:
response.Data.Should().NotBeNull();
response.Data.Token.Should().NotBeNullOrEmpty();
This isn't just "organized" code; it’s self-documenting code. If I leave the company and another intern takes over, they don't have to guess what I was trying to check. The test tells them exactly what the expectation was.
If you’re a developer working with .NET, don't settle for the bare minimum in your tests. Taking that extra step to use tools like FluentAssertions makes your workflow smoother and your codebase much more professional. It’s the difference between just "coding" and actually practicing software engineering.
2026-02-18 20:52:17
Your AI agent can write code, deploy it, and even test it. But who decides if the output is actually good?
I ran into this problem while building Spell Cascade — a Vampire Survivors-like action game built entirely with AI. I'm not an engineer. I use Claude Code (Anthropic's AI coding assistant) and Godot 4.3 to ship real software, and the whole point is that the AI handles development autonomously while I sleep.
The problem? My AI agent would make a change, run the tests, see green checkmarks, commit, and move on. The tests passed. The code compiled. The game launched.
And the game was unplayable.
Zero damage taken in 60 seconds. Level-ups every 3.9 seconds (the "fun" range for Vampire Survivors-style games is 10-30 seconds). A difficulty rating the automated evaluator scored as "TOO_EASY."
All tests passing. All quality gone.
That's when I realized: tests verify correctness. Quality Gates verify value.
Here's a concrete example of the difference:
| Check | What It Asks | Type |
|---|---|---|
| Unit test | "Does the fire spell deal the right damage?" | Correctness |
| Integration test | "Does the spell hit enemies and trigger XP drops?" | Correctness |
| Quality Gate | "Is the game actually fun to play for 60 seconds?" | Value |
The first two are binary. Pass or fail. The third one is a judgment call — and that's exactly why most CI/CD pipelines don't have one.
When a human developer ships code, there's an implicit quality gate running in their head. They play the game. They feel the pacing. They notice when something is off. When an AI agent ships code at 3 AM while you're asleep, that implicit gate doesn't exist.
You need to make it explicit.
Before I explain the Quality Gate, here's the pipeline it lives in.
Spell Cascade is a top-down action game where players survive waves of enemies while collecting spells and upgrades. Think Vampire Survivors, but built by someone who can't write code.
The autonomous testing pipeline works like this: the AI makes a change, an AutoTest bot plays a fresh build for 60 seconds, the run's metrics are written to a results.json file, and quality-gate.sh evaluates those metrics into a GO, CONDITIONAL, or NO-GO verdict.
The bot isn't smart. It mashes buttons and picks random upgrades. That's the point. If a random bot can't have a reasonable experience in 60 seconds, a real player won't either.
The whole thing runs with one command:
quality-gate.sh
And it exits with code 0 (ship it) or code 1 (don't ship it).
I didn't start with 3 tiers. I started with 20 candidate checks, narrowed to 6, then grouped them into 3 tiers. The grouping matters because not all failures are equal.
Question: "Did the game even work?"
This tier is non-negotiable. If any check fails, the verdict is NO-GO immediately. No point evaluating balance if the game didn't boot.
| Check | Threshold | Why |
|---|---|---|
| Game pass | pass == true | AutoTest completed without fatal errors |
| Spells fired | total_fires >= 1 | Core combat loop is functioning |
| Level-ups | level_ups >= 1 | Progression system is working |
If total_fires is 0, it means the player couldn't use abilities. That's not a balance issue — that's a broken game. Tier 1 catches this and stops the pipeline cold.
Question: "Is the game worth playing?"
This is where it gets interesting. Tier 2 has four sub-checks, and the build needs to pass 3 out of 4 to get a GO. Passing 2 out of 4 gives a CONDITIONAL — the AI can commit but should flag the issue.
One exception: if the Difficulty Ceiling check fails (player died), it's an automatic NO-GO regardless of the other three. A player dying in the first 60 seconds of a Vampire Survivors-like is a hard dealbreaker.
"Is the game too easy?"
min_damage_taken: 1
If the player takes zero damage in 60 seconds, the enemies might as well not exist. This was exactly the problem with my early builds — the quality evaluator flagged "TOO_EASY" but nothing stopped the AI from committing.
"Is the game too hard?"
min_lowest_hp_pct: 0.10
must_survive_60s: true
The player's HP should never drop below 10% in the first minute. If it does, new players will quit. If the player actually dies (HP = 0%), the build is NO-GO no matter what else looks good.
"Does progression feel right?"
min_avg_interval: 8.0s
max_avg_interval: 35.0s
min_gap_between_levelups: 2.0s
This one caught my biggest "tests pass, game sucks" moment. Average level-up interval was 3.9 seconds. That means the player was getting an upgrade menu every 4 seconds — constant interruption, no flow state possible. The pacing check enforces a band: not too frequent (menu fatigue), not too rare (boredom).
The burst check (min_gap_between_levelups: 2.0s) catches a subtler issue: even if the average is fine, two level-ups within 2 seconds of each other feels broken.
"Are there enough enemies on screen?"
min_peak_enemies: 5
min_avg_enemies: 3
A Vampire Survivors-like with 2 enemies on screen is a walking simulator. The density check ensures the screen feels alive. These thresholds are intentionally low — early game should ramp up gradually, not overwhelm from second one.
Why 3 out of 4 instead of 4 out of 4?
Because game balance is messy. A run where the bot happens to dodge everything (damage = 0) but has great pacing, density, and ceiling is probably fine. Demanding perfection would create false negatives and slow down the autonomous loop.
But 2 out of 4 is a yellow flag. Something is meaningfully off.
Question: "Is this build worse than the last known good one?"
Every time the gate says GO, it saves the current results.json as the new baseline. The next run compares against it.
warn_threshold_pct: 25
nogo_threshold_pct: 50
If peak enemy count drops by more than 25% compared to baseline, the gate warns. More than 50%? NO-GO.
This catches the sneaky regressions. Your AI agent "fixes" a bug in the spawn system. Tests pass. But peak enemies dropped from 33 to 7. Without Tier 3, that ships.
Here's what the gate produced across 26 unique runs (deduplicated from the raw log — some runs were replayed against cached results for testing):
| Verdict | Count | Percentage |
|---|---|---|
| GO | 18 | 69% |
| CONDITIONAL | 4 | 15% |
| NO-GO | 4 | 15% |
The 4 NO-GO verdicts weren't false alarms:
- Two failed Tier 1 outright: total_fires=0, level_ups=0, peak_enemies=0. These were broken builds that would have shipped as "tests pass" in a naive pipeline.
- One failed the Difficulty Ceiling: lowest_hp_pct=0 — the player died. damage_taken=39 in 60 seconds. The AI had overcorrected from "too easy" to "impossibly hard."
- One was a legitimate regression against the saved baseline.

For contrast, here's what a clean GO run looked like:

tier2: 4/4
damage_taken: 16
lowest_hp_pct: 0.66 (player took real damage but survived comfortably)
avg_levelup_interval: 16.8s (right in the sweet spot)
peak_enemies: 33
verdict: GO, reasons: (none)
This was a build where the AI had iterated through several balance passes. The gate validated what "good" looks like numerically.
And here's a GO that passed only 3 of the 4 balance checks:

tier2: 3/4
damage_taken: 0
avg_levelup_interval: 18.3s
peak_enemies: 21
reasons: difficulty_floor_warn
Damage was 0 — too easy — but pacing and density were solid. The gate let it through as 3/4, which is the right call. A run where the bot happens to dodge everything isn't necessarily a broken build. But the difficulty_floor_warn gets logged, and if it shows up in 3 consecutive runs, that's a pattern the AI should address.
All 4 CONDITIONAL verdicts had the same pattern: difficulty_floor_warn + pacing_warn. The game was too easy and level-ups were too fast (2/4 tier2 checks). These builds work but need improvement — exactly the signal CONDITIONAL is designed to send.
This 3-tier architecture isn't game-specific. The core insight works anywhere an AI agent produces output that needs to be "good enough to ship."
For an AI generating written content:

| Tier | Checks |
|---|---|
| Stability | Spell check passes, no broken links, all images load |
| Balance | Reading level in target range, section length variance < 2x, CTA present |
| Regression | Word count not >30% shorter than previous, readability score stable |
For an AI writing code:

| Tier | Checks |
|---|---|
| Stability | Compiles, all tests pass, no new lint errors |
| Balance | Cyclomatic complexity < threshold, test coverage > floor, no files > 500 lines |
| Regression | Performance benchmarks within 25% of baseline, bundle size stable |
For an AI running data pipelines:

| Tier | Checks |
|---|---|
| Stability | Schema validates, no null primary keys, row count > 0 |
| Balance | Column distributions within expected ranges, no single-value columns in output |
| Regression | Row count within 25% of previous run, new nulls < 5% |
The pattern is always the same: a stability tier with hard requirements where any failure blocks the ship, a balance tier scored as a majority vote against quality bands, and a regression tier that compares against the last known good baseline.
The entire quality gate is a ~220-line bash script with one dependency: jq. No frameworks. No SaaS. No SDK.
All the magic numbers live in a single JSON file. Tune them without touching code:
{
"tier1_stability": {
"max_exit_code": 0,
"max_script_errors": 0,
"min_total_fires": 1,
"min_level_ups": 1
},
"tier2_balance": {
"difficulty_floor": { "min_damage_taken": 1 },
"difficulty_ceiling": {
"min_lowest_hp_pct": 0.10,
"must_survive_60s": true
},
"pacing": {
"min_avg_interval": 8.0,
"max_avg_interval": 35.0,
"min_gap_between_levelups": 2.0
},
"density": {
"min_peak_enemies": 5,
"min_avg_enemies": 3
},
"pass_threshold": 3,
"nogo_on_ceiling_fail": true
},
"tier3_regression": {
"warn_threshold_pct": 25,
"nogo_threshold_pct": 50
}
}
The gate script follows a dead-simple flow:
#!/usr/bin/env bash
# Exit 0 = GO or CONDITIONAL, Exit 1 = NO-GO
VERDICT="GO"
# TIER 1: STABILITY (any fail = NO-GO)
if [[ "$PASS_VAL" != "true" ]]; then VERDICT="NO-GO"; fi
if [[ "$TOTAL_FIRES" -lt "$MIN_FIRES" ]]; then VERDICT="NO-GO"; fi
if [[ "$LEVEL_UPS" -lt "$MIN_LU" ]]; then VERDICT="NO-GO"; fi
# TIER 2: BALANCE BAND (3/4 sub-checks to pass)
# ... run 4 sub-checks, count passes ...
if [[ "$TIER2_PASSES" -ge 3 ]]; then
echo "TIER2: PASS"
elif [[ "$CEILING_PASS" == false ]]; then
VERDICT="NO-GO" # dying is fatal
else
VERDICT="CONDITIONAL"
fi
# TIER 3: REGRESSION (compare vs saved baseline)
if [[ -f "$LATEST_BASELINE" ]]; then
DELTA_PCT=$(compare_metric "$PEAK_ENEMIES" "$BL_PEAK")
if [[ "$DELTA_PCT" -gt 50 ]]; then VERDICT="NO-GO"; fi
if [[ "$DELTA_PCT" -gt 25 ]]; then VERDICT="CONDITIONAL"; fi
fi
# Save baseline on GO
if [[ "$VERDICT" == "GO" ]]; then
cp "$RESULTS_PATH" "$BASELINE_DIR/latest.json"
fi
# Log everything to JSONL for trend analysis
echo "$LOG_ENTRY" >> gate-log.jsonl
# Exit code drives the pipeline
[[ "$VERDICT" == "NO-GO" ]] && exit 1 || exit 0
Everything gets appended to a gate-log.jsonl file — one JSON object per run. This gives you trend analysis for free. When peak_enemies shows a slow downward trend across 10 runs, you catch it before it becomes a regression.
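Because each line is a standalone JSON object, a quick jq one-liner is enough to eyeball a trend. A minimal sketch, assuming the log entries carry verdict and peak_enemies fields at the top level (adjust the paths to whatever your gate actually writes):

tail -20 gate-log.jsonl | jq -r '[.verdict, .peak_enemies] | @tsv'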
# Full pipeline: run game + evaluate
./quality-gate.sh
# Skip the game run, evaluate existing results
./quality-gate.sh --skip-run --results /path/to/results.json
# Use a specific baseline
./quality-gate.sh --baseline /path/to/baselines/
The full source is on GitHub: github.com/yurukusa/spell-cascade
I'd be dishonest if I didn't mention the gaps.
The gate can't evaluate "feel." A game can pass all 4 tier2 checks and still feel lifeless — bad animations, no screen shake, boring sound effects. I've started building a separate "Feel Scorecard" that measures action density (events/second), dead time (longest gap with no events), and reward frequency, but it's early.
The gate is only as good as the bot. The AutoTest bot moves randomly and picks upgrades randomly. It can't test "is the dodge mechanic satisfying?" or "does the boss fight have good telegraphing?" Those require human playtesting.
Baseline drift is a real problem. If the AI makes a series of small-but-negative changes (each under the 25% warn threshold), the baseline slowly degrades. The JSONL log helps here — you can chart trends — but the gate doesn't do it automatically yet.
One of my "best" runs had a data anomaly. Peak enemies hit 153 in a single run due to a spawn system bug. That became the baseline, which then made every subsequent normal run look like a massive regression. I had to manually reset the baseline. The system needs an outlier filter.
After implementing the Quality Gate, I asked myself: did it actually help?
Yes, with caveats.
It caught 4 builds that would have shipped broken. Two of those were stability failures the AI didn't notice (the game booted but core systems weren't initializing). One was the "overcorrected to impossible difficulty" build. One was a legit regression.
It also correctly let through builds that a stricter gate would have rejected. The 0-damage runs with good pacing were fine — the bot just happened to dodge everything. A 4/4 requirement would have created noise.
But the gate said GO on builds that a human player would flag in 30 seconds. Stiff animations. Boring enemy patterns. No visual feedback on hits. The gap between "numerically balanced" and "fun" is still a human judgment.
That's the next frontier: encoding "feel" into automated metrics. But even without that, having a GO/NO-GO gate between the AI and the commit history has already prevented the worst outcomes.
Tests are necessary but not sufficient. Passing tests means your code is correct. It doesn't mean your output is good.
The 3-tier pattern works everywhere. Stability (did it work?), Balance (is it good enough?), Regression (is it worse?). Apply it to content, code, data, or anything an AI agent produces.
Use majority voting for quality bands. Demanding 4/4 perfect creates false negatives. 3/4 with a hard veto on critical failures is the right balance for autonomous systems.
Log everything to JSONL. Individual gate verdicts are useful. The trend across 26 runs is where the real insights are.
Externalize thresholds. Put them in a JSON file, not in code. You'll tune them constantly, and your AI agent can modify them without touching the gate logic.
Be honest about the gaps. A quality gate doesn't replace human judgment. It catches the bottom 15% — the builds that should never ship — and that alone is worth the ~220 lines of bash.
This concept was born from building an autonomous game testing pipeline. I wrote a deeper dive into the Feel Scorecard — the metrics behind "does this game feel good?" — on Zenn (Japanese).
Curious what happens when the AI says GO but a human finds 3 bugs in 5 minutes? I wrote about that honest reckoning on Hatena Blog (Japanese).
Spell Cascade is playable now: yurukusa.itch.io/spell-cascade
Built by a non-engineer using Claude Code (Anthropic's AI coding assistant) + Godot 4.3. The quality gate, the game, the AutoTest bot — all of it written by AI, reviewed by a human who can't read most of the code.
"I want AI to work while I sleep" → CC-Codex Ops Kit. The guards from this article enabled 88-task overnight runs.