2026-04-06 16:10:49
MCP agents act on your behalf but can't prove what they did. Logs are self-reported claims. Receipts are independently verifiable evidence. Here's how to close the transparency gap with cryptographic proof -- in under 10 lines of code.
You ask your AI agent to cancel a subscription, send an email to a client, or update a database record. The agent says "Done." You move on.
But what actually happened? Which API endpoint was called? What payload was sent? What did the service respond? You don't know -- and neither does anyone else. The agent acted on your behalf, and the only record of that action is the agent's own word.
This is the transparency problem in MCP. Every tool call is a black box: an input goes in, a result comes out, and the specifics of what happened between the two are discarded the moment the call completes.
That might be acceptable for a search query. It is not acceptable when the agent is sending emails, processing payments, modifying records, or making API calls that have real-world consequences.
Transparency in the context of MCP tool calls is not about seeing source code or inspecting model weights. It is about a concrete, answerable question:
Can anyone -- the user, the operator, a regulator, the other party -- independently verify what the agent did?
Today, the answer is no. Here is why.
A standard MCP server handles a tool call like this:
import httpx

@server.call_tool()
async def handle_tool(name: str, arguments: dict):
    if name == "cancel_subscription":
        resp = await httpx.post(
            "https://api.stripe.com/v1/subscriptions/sub_1234",
            data={"cancel_at_period_end": "true"},
            headers={"Authorization": f"Bearer {STRIPE_KEY}"},
        )
        return {"status": "cancelled", "effective": "end_of_period"}
The user sees {"status": "cancelled"}. That is the tool's self-report. The HTTP response from Stripe -- the actual evidence -- was consumed and discarded inside the server process.
Three problems with this:
1. The actual evidence -- the raw HTTP request and Stripe's raw response -- was consumed and discarded inside the server process.
2. The returned status is the server's self-report, with nothing to corroborate it.
3. No third party can reconstruct or verify the action after the fact.
Every downstream consumer of this tool call's result -- the user, the orchestrator, the compliance system -- is operating on trust. Not verified trust. Assumed trust.
The immediate instinct is to add logging:
import logging
from datetime import datetime

import httpx

logger = logging.getLogger("mcp-tools")

@server.call_tool()
async def handle_tool(name: str, arguments: dict):
    if name == "cancel_subscription":
        resp = await httpx.post(stripe_url, data=payload, headers=headers)
        logger.info(f"cancel_subscription called at {datetime.utcnow()}, "
                    f"stripe responded {resp.status_code}")
        return {"status": "cancelled"}
This is better than nothing. But the log has a fundamental problem: it was written by the same entity that performed the action. This is the equivalent of a company auditing itself.
In any system where accountability matters -- finance, healthcare, legal, multi-party operations -- self-reported records are not evidence. They are claims. The distinction is not academic. It is the difference between "we say we did it" and "here is proof we did it, verifiable by anyone."
To make a tool call transparent, you need a witness that is independent of both the agent and the upstream service. The pattern looks like this:
Agent → Verification Proxy → Upstream API
                 ↓
        Cryptographic Receipt
    (signed, timestamped, logged)
The proxy forwards the request to the upstream API unchanged. But it captures the exact request and response bytes, then produces a receipt with three independent attestations: a signature from the proxy's own key, a timestamp from an external timestamp authority, and an entry in a public transparency log.
No single party -- not the agent, not the proxy, not the upstream API -- can forge this combination.
Here is the same subscription cancellation, routed through a certifying proxy:
TRUST_PROXY = "https://trust.arkforge.tech/v1/proxy"
ARKFORGE_KEY = "your_api_key"

@server.call_tool()
async def handle_tool(name: str, arguments: dict):
    if name == "cancel_subscription":
        resp = await httpx.post(
            TRUST_PROXY,
            headers={"X-Api-Key": ARKFORGE_KEY},
            json={
                "target": "https://api.stripe.com/v1/subscriptions/sub_1234",
                "method": "POST",
                "payload": {"cancel_at_period_end": "true"},
                "extra_headers": {"Authorization": f"Bearer {STRIPE_KEY}"},
            },
        )
        data = resp.json()
        return {
            "status": "cancelled",
            "effective": "end_of_period",
            "_proof_id": data["proof"]["proof_id"],
        }
The upstream API still receives the identical request. Stripe still processes the cancellation exactly the same way. The only difference: a neutral third party now holds a signed, timestamped, publicly logged record of exactly what was sent and what came back.
The _proof_id returned to the user is a handle they can use to verify the action independently -- without trusting the agent, the server, or the proxy.
The proxy returns a proof object alongside the original API response:
{
  "proof_id": "prf_20260406_140312_b7d2e4",
  "spec_version": "1.2",
  "timestamp": "2026-04-06T14:03:12Z",
  "hashes": {
    "request": "sha256:a4f1...3c8b",
    "response": "sha256:d920...7e1a",
    "chain": "sha256:6b3e...91f0"
  },
  "parties": {
    "buyer_fingerprint": "sha256:your_api_key_hash",
    "seller": "api.stripe.com"
  },
  "arkforge_signature": "ed25519:KjG8...rQ==",
  "arkforge_pubkey": "ed25519:ZLlG...fEY",
  "timestamp_authority": {
    "status": "verified",
    "provider": "freetsa.org"
  },
  "transparency_log": {
    "provider": "sigstore-rekor",
    "status": "success",
    "entry_uuid": "24296fb5...",
    "verify_url": "https://search.sigstore.dev/?logIndex=1217489868"
  },
  "verification_url": "https://trust.arkforge.tech/v1/proof/prf_20260406_140312_b7d2e4"
}
The chain hash binds the request hash, response hash, timestamp, and party identifiers into a single value using canonical JSON serialization. Changing any field invalidates the chain. The chain hash is what gets signed, timestamped, and logged.
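Concretely, the binding step looks like this. The field names mirror the proof object above; this is a sketch of the construction, not the normative spec.

```python
import hashlib
import json

def chain_hash(request_hash, response_hash, proof_id,
               timestamp, buyer_fingerprint, seller):
    """Bind all receipt fields into one value via canonical JSON."""
    chain_input = {
        "request_hash": request_hash,
        "response_hash": response_hash,
        "transaction_id": proof_id,
        "timestamp": timestamp,
        "buyer_fingerprint": buyer_fingerprint,
        "seller": seller,
    }
    # Canonical form: sorted keys, no whitespace -- identical bytes every time.
    canonical = json.dumps(chain_input, sort_keys=True, separators=(",", ":"))
    return "sha256:" + hashlib.sha256(canonical.encode()).hexdigest()

h1 = chain_hash("sha256:aaaa", "sha256:bbbb", "prf_x",
                "2026-04-06T14:03:12Z", "sha256:fp", "api.stripe.com")
h2 = chain_hash("sha256:aaaa", "sha256:cccc", "prf_x",
                "2026-04-06T14:03:12Z", "sha256:fp", "api.stripe.com")
# A one-character change to any field yields a completely different chain hash.
```

Because the serialization is canonical, any verifier that recomputes this function over the declared fields gets the same bytes, and therefore the same hash, as the proxy did.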
Verification requires math, not trust. Here is how any party -- the user, an auditor, the other side of the transaction -- can verify a receipt independently:
import hashlib, json, httpx
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PublicKey
from base64 import urlsafe_b64decode

# 1. Fetch the proof by ID
proof = httpx.get(
    "https://trust.arkforge.tech/v1/proof/prf_20260406_140312_b7d2e4"
).json()

# 2. Recompute the chain hash
chain_input = {
    "request_hash": proof["hashes"]["request"],
    "response_hash": proof["hashes"]["response"],
    "transaction_id": proof["proof_id"],
    "timestamp": proof["timestamp"],
    "buyer_fingerprint": proof["parties"]["buyer_fingerprint"],
    "seller": proof["parties"]["seller"],
}
canonical = json.dumps(chain_input, sort_keys=True, separators=(",", ":"))
expected = "sha256:" + hashlib.sha256(canonical.encode()).hexdigest()
assert expected == proof["hashes"]["chain"], "Chain hash mismatch"

# 3. Verify the Ed25519 signature
pubkey_bytes = urlsafe_b64decode(proof["arkforge_pubkey"].split(":")[1] + "=")
pubkey = Ed25519PublicKey.from_public_bytes(pubkey_bytes)
sig_bytes = urlsafe_b64decode(proof["arkforge_signature"].split(":")[1] + "=")
pubkey.verify(sig_bytes, proof["hashes"]["chain"].split(":")[1].encode())

# 4. Confirm the Rekor entry exists (public transparency log)
rekor_uuid = proof["transparency_log"]["entry_uuid"]
rekor_resp = httpx.get(
    f"https://rekor.sigstore.dev/api/v1/log/entries/{rekor_uuid}"
).json()
log_index = list(rekor_resp.values())[0]["logIndex"]
print(f"Verified. Rekor log index: {log_index}")
If step 2 passes, the chain hash matches its declared inputs -- nothing was tampered with. If step 3 passes, the proxy signed that exact chain hash with a key the agent never held. If step 4 passes, the hash was committed to a public log before anyone knew it would be checked.
This is what transparency means in practice: not a promise, but a proof that any party can verify without asking permission.
An agent sends an invoice reminder email via SendGrid. The customer claims they never received it. Without a receipt, you have the agent's self-report against the customer's claim. With a receipt, you have cryptographic proof of the exact payload sent to SendGrid and SendGrid's exact response -- timestamped and signed by an independent authority.
Agent A fetches pricing data from an API. Agent B uses that data to generate a quote. The quote is wrong. Was the pricing data stale? Did Agent A fetch the wrong endpoint? Did Agent B misinterpret the response? Without receipts at each handoff, debugging is guesswork. With receipts, each agent's inputs and outputs are independently verifiable -- the chain of evidence is complete.
An auditor asks: "Prove that your AI agent's actions on March 15th complied with your stated policy." Without receipts, you hand over server logs that you wrote and control. With receipts, you hand over a set of proof IDs that the auditor can verify against a public transparency log -- without needing access to your systems.
The free tier covers 500 receipts per month. No credit card required. Each receipt adds roughly 200ms of latency (proxy round-trip plus timestamp authority verification). For most MCP tool calls -- API integrations, emails, webhooks, database operations -- that overhead is negligible compared to the upstream call itself.
For production workloads: plans start at EUR 29/month for 5,000 receipts.
Not every tool call needs a receipt. A search_web call probably doesn't. But any tool call where the result could be disputed, audited, or questioned by another party is a candidate.
The decision heuristic: if the answer to "prove it" matters, add a receipt.
Payments. Emails. Data mutations. Cross-organization API calls. Regulatory submissions. Anything where "the agent said it did it" is not sufficient evidence.
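One way to keep that heuristic out of ad-hoc if-statements is a small allowlist in the tool server. The tool names and categories below are hypothetical, purely for illustration -- they are not part of MCP or any ArkForge API.

```python
# Hypothetical routing helper: which tools get routed through the
# certifying proxy. Names and categories are illustrative only.
RECEIPT_REQUIRED = {
    "cancel_subscription",  # payments
    "send_email",           # cross-party communication
    "update_record",        # data mutation
    "submit_filing",        # regulatory submission
}

def needs_receipt(tool_name: str) -> bool:
    """True when the answer to 'prove it' could matter for this call."""
    return tool_name in RECEIPT_REQUIRED
```

The handler then checks `needs_receipt(name)` once and picks the direct or proxied code path, so the receipt policy lives in one place.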
MCP gives agents a clean, standardized way to invoke tools. That is a significant step forward. But the protocol says nothing about proving what happened during a tool call. It captures inputs and outputs at the protocol level but discards the evidence of what occurred between the tool server and the upstream API.
This is not a bug in MCP. It is a gap that the protocol was not designed to fill. Transparency is infrastructure -- it needs to be added deliberately, the same way TLS was added to HTTP or signatures were added to package managers.
Cryptographic receipts are the mechanism. A certifying proxy is the deployment pattern. And the cost of adding them -- three lines of code, sub-second latency -- is negligible compared to the cost of operating agents that cannot prove what they did.
The ArkForge Trust Layer is an open-architecture certifying proxy for MCP and API calls. The proof specification is public. The verification algorithm requires no proprietary software. Start free -- 500 proofs/month, no card required.
2026-04-06 16:04:51
My brother Brandon and I run RapidClaw. Most days it's just the two of us, a handful of customers, and a few agents chugging along in production. A few months ago we started putting small open-weight models on the same box as the agent runtime — mostly Gemma 4, a bit of Phi-4 for comparison, some Qwen. This is a short write-up of what's actually worked and what hasn't.
Nothing revolutionary here. I'm writing it because I searched for "agent + local Gemma" a bunch of times last quarter and mostly found benchmark posts, not lived-experience notes.
The newest small models are small enough that they fit on the same machine as the agent loop. That's the whole observation. Gemma 4 4B runs fine on a 24 GB GPU next to a Node process running our agent code. Phi-4 14B is tight but works. A year ago you needed a separate inference box, which meant a network hop, which meant we just paid a hosted API and moved on.
Now the tradeoff is different. You can keep the hosted model for the hard stuff and quietly route the cheap, high-volume calls to the local model. Hybrid, not replacement.
We have four agents running in production right now. One of them — the one that classifies incoming support messages and decides which of the other agents to hand off to — used to make a hosted-model call per message. That single agent was roughly 80% of our inference spend because it ran on every message, even the obvious ones.
We moved that classifier to Gemma 4 4B on the same box. The agent framework is unchanged, it just points at a local OpenAI-compatible endpoint (we're using Ollama for now, llama.cpp's server also works). The other three agents still call the hosted models when they need to reason about something real.
That's it. One local model, four agents, one box. No Kubernetes, no model router, no fancy fallback chain.
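A minimal sketch of that split, assuming Ollama's OpenAI-compatible endpoint on its default port; the model tag and hosted URL are placeholders, not real identifiers:

```python
# The classifier points at a local OpenAI-compatible endpoint;
# the other agents keep their hosted base URL.
LOCAL_BASE = "http://localhost:11434/v1"     # Ollama's OpenAI-compatible API
HOSTED_BASE = "https://hosted-llm.example.com/v1"  # placeholder

def base_url_for(agent: str) -> str:
    return LOCAL_BASE if agent == "classifier" else HOSTED_BASE

def classify_payload(message: str) -> dict:
    """Chat-completion request body for the local classifier."""
    return {
        "model": "gemma4:4b",  # placeholder tag for whatever you pulled
        "messages": [
            {"role": "system",
             "content": "Classify as one of: billing, bug, other. "
                        "Reply with the label only."},
            {"role": "user", "content": message},
        ],
        "temperature": 0,  # deterministic labels for routing
    }
```

The agent framework doesn't change at all -- it just POSTs this payload to `base_url_for(agent)` instead of a hardcoded hosted URL.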
Single machine, RTX 4090, one of our production workers. Measured over a week in March on real traffic, not a synthetic benchmark.
| Path | Median latency | p95 | Cost per 1k calls |
|---|---|---|---|
| Hosted Sonnet-class | 1.8s | 4.2s | ~$4.50 |
| Hosted mini/flash-class | 0.9s | 2.1s | ~$0.60 |
| Gemma 4 4B, local, same box | 0.25s | 0.6s | ~$0.04* |
*Local cost is amortized GPU + power on a box we were already paying for. If you had to rent a GPU just for this, the numbers flip hard — more on that below.
For the classifier workload specifically, Gemma 4 is good enough. It's not as sharp as the big hosted models, but "is this message a billing question or a bug report" doesn't need the big hosted models. We compared a week of its outputs against the hosted model's outputs on the same messages — they agreed on about 94% of them. The 6% where they disagreed were mostly ambiguous messages where the hosted model wasn't obviously right either.
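The comparison itself is a few lines; a sketch of how an agreement number like that can be computed over a week of paired outputs:

```python
def agreement_rate(local_labels, hosted_labels):
    """Fraction of messages where both models produced the same label."""
    assert len(local_labels) == len(hosted_labels)
    matches = sum(a == b for a, b in zip(local_labels, hosted_labels))
    return matches / len(local_labels)

# Toy example with five paired classifications:
local  = ["billing", "bug", "billing", "other", "bug"]
hosted = ["billing", "bug", "other",   "other", "bug"]
rate = agreement_rate(local, hosted)  # 4 of 5 agree -> 0.8
```

The useful part is then reading through the disagreements by hand, which is where we found most of them were ambiguous messages rather than local-model mistakes.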
Cold starts are real. First request after the model unloads was 8–15 seconds. We pin the model in memory with a keepalive. Obvious in hindsight.
VRAM math is tighter than you think. Gemma 4 4B at Q4, plus an 8k context window, plus our Node process, plus the occasional burst of parallel requests: we hit OOM twice in the first week. We now cap concurrent local calls at 3 and queue the rest. Nothing fancy.
Prompt formats drift. A prompt that worked cleanly on the hosted model produced mush on Gemma. Small models are less forgiving of vague instructions. We ended up maintaining two prompt versions — one terse and explicit for Gemma, one more conversational for the hosted model. Not ideal but it's only two prompts.
Eval is annoying but necessary. You can't just swap models and hope. We built a small eval set (about 200 labeled messages) and run it whenever we change the local model or the prompt. Takes five minutes. Worth it.
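The eval loop itself is tiny. A sketch with a keyword matcher standing in for the model call (the example format is our own convention, not a standard):

```python
def run_eval(classify, examples):
    """Accuracy of `classify` over labeled examples,
    where each example is {"message": ..., "label": ...}."""
    correct = sum(classify(ex["message"]) == ex["label"] for ex in examples)
    return correct / len(examples)

# Toy stand-in for the real model call:
def keyword_classifier(message):
    if "invoice" in message or "charge" in message:
        return "billing"
    if "crash" in message or "error" in message:
        return "bug"
    return "other"

examples = [
    {"message": "double charge on my invoice", "label": "billing"},
    {"message": "the app crashes on login", "label": "bug"},
    {"message": "love the product!", "label": "other"},
]
score = run_eval(keyword_classifier, examples)  # 1.0 on this toy set
```

Swap `keyword_classifier` for the actual local-model call and run it against the 200 labeled messages whenever the model or prompt changes.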
Honestly, most people reading this probably shouldn't do this yet. A few cases where it doesn't make sense: if you'd have to rent a GPU just for this, the cost math flips against you; if your call volume is low, the absolute savings won't cover the operational fiddling; and if you can't spare the time to build even a small eval set, you won't notice when the local model quietly degrades quality.
Next up: Phi-4 14B for one of the agents that does light reasoning over structured data. We haven't moved it yet because the quality bar is higher and I haven't built the eval set for it. Probably in April.
Also curious about Qwen 2.5 for a multilingual case we have, but that's further out.
That's the whole post. Nothing dramatic — a classifier moved, a bill went down, we learned some boring operational lessons. Small open-weight models finally being small enough to share a box with the agent runtime is, for us, the thing that made any of this viable.
Tijo Bear runs RapidClaw (rapidclaw.dev) with his brother Brandon — managed hosting for AI agents. If you're running agents and curious about hybrid local/hosted setups, the site has more.
2026-04-06 16:00:04
We’ve reached a strange point in history where we pay for hardware but don't actually own its behavior. You buy a "smart" device, but its heartbeat lives on a corporate server thousands of miles away. If that company goes bust or changes its Terms of Service, your device becomes a brick.
This is the Paradox of Smart Device Ownership. To solve it, we have to shift our focus from "convenience at any cost" to Free and Open Source Software (FOSS) and Open Hardware.
1. The Hierarchy of Needs: Control First
In the FOSS philosophy, Control is the prerequisite for both Privacy and Freedom.
Automation is secondary: Having your lights turn on at sunset is cool, but if you can’t turn them on when your internet is down, you don't have a smart home—you have a fragile one.
The Goal: Moving the "brain" of your home from the vendor's cloud to a Local Home Server.
2. The Silicon Revolution: RISC-V
Digital freedom is now moving down to the chip level. RISC-V is an Open Standard Instruction Set Architecture (ISA)—essentially the "Linux for chips."
Why it matters: It’s royalty-free and geopolitically neutral (HQ in Switzerland), preventing any single nation or corporation from pulling the plug.
Real-world impact: Projects like the Thejas32 (a government-backed board in India) prove that we can build high-performance compute power on open foundations.
3. Liberating the Hardware: The FOSS Toolkit
If you want to own your home, you have to replace the "spyware" that comes pre-installed on your gadgets. The FOSS community has built an incredible ecosystem to handle this:
Custom Firmware: Instead of using closed-source apps, flash your devices with Tasmota, ESPHome, or write your own logic using MicroPython. This forces the device to communicate only with you.
The Central Nervous System: Use Home Assistant or OpenHAB. These are local-first, FOSS platforms that aggregate all your devices into one interface without ever sending your data to a third-party cloud.
Local UI: For those building their own hardware interfaces, LVGL (Light and Versatile Graphics Library) allows you to create beautiful, professional-grade UIs on inexpensive microcontrollers.
4. Networking Without the Cloud
You don't need a corporate relay to access your home from the road. Tailscale (or its fully open-source implementation, Headscale) creates a private "Mesh VPN." This allows your devices to talk to each other securely over the internet as if they were on the same local wire, maintaining your privacy without sacrificing mobility.
5. Join the Movement
The transition to a FOSS-centric life isn't just about code; it's about community. Organizations like FOSS United, TinkerHub, and Liberated Hardware are actively building the tools we need to stay sovereign.
Whether it's Ente for your photos or Standard Notes for your thoughts, every FOSS tool you adopt is a step away from digital feudalism and a step toward true ownership.
The Rule of Thumb: If the software isn't Open Source, you aren't the owner—you're the product. It's time to take the "Smart" back into our own hands.
2026-04-06 16:00:00
Semgrep CLI is a fast, open-source command-line tool for static analysis that finds bugs, security vulnerabilities, and anti-patterns in your code. Unlike heavyweight SAST tools that require complex server installations and proprietary configurations, Semgrep runs directly in your terminal, finishes most scans in seconds, and uses pattern syntax that mirrors the source code you are already writing. It supports over 30 programming languages and ships with thousands of pre-written rules maintained by the security community.
Whether you are a solo developer looking to catch SQL injection before it ships or a team lead evaluating static analysis tools for your CI pipeline, the Semgrep CLI is the starting point. Every feature of the broader Semgrep platform - cloud dashboards, PR comments, AI-powered triage - builds on top of this command-line foundation. Learning the CLI first gives you the knowledge to configure, debug, and optimize Semgrep in any environment.
This semgrep cli tutorial walks through everything from installation to running your first scan, choosing rulesets, working with output formats, ignoring findings, targeting specific files, integrating with CI, and tuning performance. By the end, you will have a working Semgrep setup that you can use locally and extend into your automated pipelines.
Semgrep provides three official installation methods. Choose the one that matches your environment and workflow.
The pip installation is the most common method and works on macOS, Linux, and Windows via WSL. You need Python 3.8 or later.
# Install Semgrep globally
pip install semgrep
# Verify the installation
semgrep --version
If you prefer to keep Semgrep isolated from your system Python packages, use pipx instead:
# Install with pipx for an isolated environment
pipx install semgrep
# Verify
semgrep --version
The pipx approach is particularly useful on machines where you manage multiple Python projects and want to avoid dependency conflicts. Semgrep bundles its own binary components alongside the Python package, so isolation prevents any interference with other tools.
On macOS, Homebrew provides a clean one-command installation:
# Install via Homebrew
brew install semgrep
# Verify
semgrep --version
Homebrew handles all dependencies automatically and makes upgrades straightforward with brew upgrade semgrep. This method is preferred by macOS users who already rely on Homebrew for their development tooling.
Docker is the best option for CI environments, shared build servers, or situations where you do not want to install anything on the host system:
# Pull the official Semgrep image
docker pull semgrep/semgrep
# Run a scan with Docker
docker run --rm -v "${PWD}:/src" semgrep/semgrep semgrep --config auto /src
The -v "${PWD}:/src" flag mounts your current working directory into the container at /src. This means Semgrep inside the container can read your source files without any of its dependencies touching your host system. The --rm flag removes the container after the scan finishes.
For repeated use, you can create a shell alias:
alias semgrep='docker run --rm -v "${PWD}:/src" semgrep/semgrep semgrep'
With this alias in place, you can run semgrep --config auto /src as if Semgrep were installed locally.
Regardless of which method you chose, confirm that Semgrep is working:
semgrep --version
# Expected output: semgrep 1.x.x
If you see a version number, the installation was successful. If you get a "command not found" error after a pip installation, your Python scripts directory is likely not in your PATH. On Linux, add ~/.local/bin to your PATH. On macOS, the path is typically ~/Library/Python/3.x/bin. You can fix this permanently by adding the following line to your shell profile:
export PATH="$HOME/.local/bin:$PATH"
For a more detailed walkthrough of the full Semgrep setup process including cloud configuration, see the guide on how to setup Semgrep.
With Semgrep installed, you can scan any project immediately - no configuration files, no accounts, and no rule definitions required.
Navigate to any project directory and run:
semgrep --config auto
The --config auto flag tells Semgrep to inspect your project, detect which languages and frameworks are present, and automatically download the relevant rules from the Semgrep Registry. It then runs those rules against your code and prints any findings to the terminal. This is the fastest way to see what Semgrep can do.
A typical first scan on a medium-sized project takes between 5 and 30 seconds. Semgrep does not need to compile your code or resolve dependencies - it performs pattern matching directly on source files, which is why it runs so quickly compared to traditional SAST tools.
When Semgrep finds an issue, the output looks like this:
src/api/users.py
  python.lang.security.audit.dangerous-system-call
    Detected subprocess call with shell=True. This can lead to
    command injection vulnerabilities.

    22│ subprocess.call(cmd, shell=True)
Each finding includes four pieces of information: the file path where the issue was found, the rule ID that triggered the match, a human-readable message explaining the problem, and the exact line of code that matched. The rule ID is important because you will use it later to configure suppressions and to look up rule documentation in the Semgrep Registry.
For more control over what Semgrep checks, specify a rule set by name instead of using auto:
# High-confidence security and correctness rules
semgrep --config p/default
# Broader security coverage with more findings
semgrep --config p/security-audit
# OWASP Top 10 vulnerability categories
semgrep --config p/owasp-top-ten
# Language-specific rules
semgrep --config p/python
semgrep --config p/javascript
semgrep --config p/golang
The p/default rule set is the best starting point for most teams. It contains rules curated by the Semgrep team for high confidence and low false positive rates. Once you are comfortable reviewing findings from p/default, you can layer on additional sets like p/security-audit for broader coverage.
For a deeper explanation of how to build and manage your own rule sets, see the guide on Semgrep custom rules.
Rulesets determine what Semgrep looks for in your code. Choosing the right combination affects both the quality of findings and the amount of noise your team needs to triage.
p/default contains approximately 600 high-confidence rules covering security vulnerabilities and correctness issues. These rules are maintained by the Semgrep team and have been tuned for low false positive rates across a wide range of codebases. This is the right starting point for every team.
p/security-audit is a broader collection that trades precision for coverage. It catches more potential issues but produces more findings that require manual review. Use this when you want comprehensive security scanning and have the bandwidth to triage additional results.
p/owasp-top-ten maps rules directly to the OWASP Top 10 categories - injection, broken access control, cryptographic failures, and so on. This set is valuable for compliance-driven teams that need to demonstrate OWASP coverage in audits or security reviews.
Semgrep provides curated rulesets for specific languages:
| Ruleset | Coverage |
|---|---|
| p/python | Python security, Django, Flask patterns |
| p/javascript | JavaScript and Node.js security |
| p/typescript | TypeScript-specific patterns |
| p/golang | Go security and error handling |
| p/java | Java security and Spring patterns |
| p/ruby | Ruby and Rails security |
| p/csharp | C# security patterns |
| p/php | PHP security patterns |
For infrastructure-as-code files, Semgrep offers dedicated rulesets:
| Ruleset | Coverage |
|---|---|
| p/terraform | Terraform misconfigurations |
| p/dockerfile | Dockerfile security best practices |
| p/docker-compose | Docker Compose issues |
| p/kubernetes | Kubernetes YAML security |
You can stack multiple rulesets in a single scan by passing multiple --config flags:
semgrep --config p/default --config p/security-audit --config p/python
Start with p/default alone, review those findings, and then add additional sets one at a time. Adding too many rulesets at once generates an overwhelming volume of findings and makes it difficult to prioritize what to fix first.
Semgrep's default output is human-readable terminal text, but most real-world workflows require structured output for integration with other tools, dashboards, or compliance systems.
JSON is the most versatile format for programmatic processing:
semgrep --config p/default --json > results.json
The JSON output contains an array of findings, each with the rule ID, file path, line and column numbers, matched code snippet, severity, and metadata including CWE identifiers and OWASP categories. You can pipe this output into jq, Python scripts, or any tool that consumes JSON.
# Count findings by severity
semgrep --config p/default --json | jq '[.results[] | .extra.severity] | group_by(.) | map({(.[0]): length}) | add'
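The same tally works in a few lines of Python when jq isn't available. The `results[].extra.severity` path below matches the `--json` layout described above; the embedded sample is a minimal stand-in for a real results file.

```python
import json
from collections import Counter

def severity_counts(results_json: str) -> Counter:
    """Tally findings by severity from `semgrep --json` output."""
    data = json.loads(results_json)
    return Counter(r["extra"]["severity"] for r in data["results"])

# Minimal stand-in for a real results.json:
sample = json.dumps({
    "results": [
        {"check_id": "rule.a", "extra": {"severity": "ERROR"}},
        {"check_id": "rule.b", "extra": {"severity": "WARNING"}},
        {"check_id": "rule.c", "extra": {"severity": "ERROR"}},
    ]
})
counts = severity_counts(sample)  # Counter({'ERROR': 2, 'WARNING': 1})
```

Point it at a saved file with `severity_counts(open("results.json").read())` to get the same breakdown the jq one-liner produces.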
SARIF (Static Analysis Results Interchange Format) is the industry standard for static analysis results. GitHub Code Scanning, Azure DevOps, and many security platforms consume SARIF natively:
semgrep --config p/default --sarif > results.sarif
SARIF output is particularly useful for Semgrep GitHub Action workflows where you want findings to appear in the GitHub Security tab:
# Generate SARIF and upload to GitHub Code Scanning
semgrep --config p/default --sarif --output results.sarif
For CI systems that expect JUnit-style test results:
semgrep --config p/default --junit-xml > results.xml
This format allows Semgrep findings to appear as "test failures" in CI dashboards that support JUnit reporting, such as Jenkins, CircleCI, and GitLab CI.
Use the --output flag to write results to a file while keeping the terminal output clean:
# Write JSON to a file
semgrep --config p/default --json --output results.json
# Write SARIF to a file
semgrep --config p/default --sarif --output results.sarif
This is useful when you need both human-readable feedback during development and machine-readable output for downstream processing.
Not every finding requires a code change. Test files, generated code, and known-safe patterns all produce findings that you need a way to suppress without losing track of real issues.
The most granular suppression method is the nosemgrep comment, placed on the line immediately before the flagged code:
# nosemgrep: python.lang.security.audit.dangerous-system-call
subprocess.call(safe_internal_command, shell=True)
In JavaScript or TypeScript:
// nosemgrep: javascript.express.security.audit.xss.mustache-escape
res.send(trustedHtmlContent);
In Go:
// nosemgrep: go.lang.security.audit.dangerous-exec-command
exec.Command("safe-binary", args...)
Always include the specific rule ID in the nosemgrep comment. A bare # nosemgrep without a rule ID suppresses all Semgrep rules on that line, which could hide future findings from new rules that you actually want to see.
For broader exclusions, create a .semgrepignore file in your repository root. The syntax follows .gitignore conventions:
# Test files
tests/
test/
*_test.go
*_test.py
*.test.js
*.test.ts
*.spec.js
*.spec.ts
# Generated code
generated/
__generated__/
*.generated.ts
# Vendored dependencies
vendor/
node_modules/
third_party/
# Build artifacts
dist/
build/
.next/
# Large files that slow down scanning
*.min.js
*.bundle.js
package-lock.json
The .semgrepignore file is the right place for paths that should never be scanned - code you do not own, code you cannot change, and code where findings are not actionable.
If a particular rule from a registry set consistently produces false positives in your codebase, you can exclude it from the scan entirely:
semgrep --config p/default --exclude-rule "generic.secrets.gitleaks.generic-api-key"
This is a better approach than removing an entire ruleset because of one noisy rule. You keep the coverage from all other rules in the set while silencing the specific one that does not work for your project.
You do not always need to scan your entire repository. Targeting specific paths speeds up iteration during development and helps you focus on the code that matters most.
semgrep --config p/default src/auth/login.py
semgrep --config p/default src/api/
semgrep --config p/default src/auth/ src/api/ src/middleware/
The --include and --exclude flags accept glob patterns for fine-grained control:
# Scan only Python files
semgrep --config p/default --include "*.py"
# Scan only JavaScript and TypeScript files
semgrep --config p/default --include "*.js" --include "*.ts"
# Exclude test directories
semgrep --config p/default --exclude "tests/" --exclude "test/"
# Combine include and exclude
semgrep --config p/default --include "*.py" --exclude "tests/"
During development, you often want to scan only the files you have changed. Combine Semgrep with git to achieve this:
# Scan only files changed since the last commit
semgrep --config p/default $(git diff --name-only HEAD~1)
# Scan only staged files
semgrep --config p/default $(git diff --cached --name-only)
# Scan only files changed on the current branch
semgrep --config p/default $(git diff --name-only main...HEAD)
This approach gives you rapid feedback during development without waiting for a full-repository scan.
Running Semgrep in your CI pipeline ensures every pull request and every merge is automatically scanned. Here is how to set up the most common integration.
Create .github/workflows/semgrep.yml:
name: Semgrep

on:
  pull_request: {}
  push:
    branches:
      - main

jobs:
  semgrep:
    name: Semgrep Scan
    runs-on: ubuntu-latest
    container:
      image: semgrep/semgrep
    steps:
      - name: Checkout code
        uses: actions/checkout@v4
      - name: Run Semgrep
        run: semgrep scan --config p/default --error
The --error flag causes Semgrep to exit with a non-zero code when findings are detected, which fails the GitHub Actions check and can block PRs from merging if branch protection is configured.
For teams using Semgrep Cloud, replace the scan command with semgrep ci and add your token:
- name: Run Semgrep
  run: semgrep ci
  env:
    SEMGREP_APP_TOKEN: ${{ secrets.SEMGREP_APP_TOKEN }}
The semgrep ci command automatically performs diff-aware scanning on pull requests, uploads results to the Semgrep dashboard, and posts inline PR comments when the Semgrep GitHub App is installed. For a complete walkthrough of this setup, see the guide on Semgrep GitHub Action.
Add a Semgrep job to your .gitlab-ci.yml:
semgrep:
  image: semgrep/semgrep
  script:
    - semgrep scan --config p/default --error
  rules:
    - if: $CI_MERGE_REQUEST_IID
    - if: $CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH
Semgrep works with any CI platform that can run a shell command. The pattern is the same everywhere: install Semgrep (or use the Docker image), run semgrep scan --config p/default --error, and let the exit code determine whether the build passes or fails. Documented integrations exist for Jenkins, CircleCI, Buildkite, and Azure Pipelines.
To see Semgrep findings in the GitHub Security tab alongside CodeQL results:
- name: Run Semgrep
  run: semgrep scan --config p/default --sarif --output results.sarif
- name: Upload SARIF
  uses: github/codeql-action/upload-sarif@v3
  with:
    sarif_file: results.sarif
  if: always()
The if: always() condition ensures results are uploaded even when Semgrep finds issues and returns a non-zero exit code.
Semgrep is fast out of the box, but large repositories and complex rulesets can push scan times higher than you want for a CI check. Here are the most effective tuning options.
Semgrep runs scans in parallel by default. On machines with limited memory, reducing parallelism prevents out-of-memory errors:
# Use 2 parallel jobs instead of the default
semgrep --config p/default --jobs 2
# Run sequentially (useful for debugging)
semgrep --config p/default --jobs 1
Minified JavaScript, bundled files, and lock files can be enormous and slow down scanning without producing useful findings:
# Skip files larger than 500KB
semgrep --config p/default --max-target-bytes 500000
Combine this with .semgrepignore entries for *.min.js, *.bundle.js, and package-lock.json to eliminate the most common offenders permanently.
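A .semgrepignore file covering those common offenders might look like the sketch below (Semgrep's ignore file uses gitignore-style patterns, one per line):

```gitignore
# Vendored and generated code
vendor/
node_modules/
dist/

# Minified and bundled artifacts
*.min.js
*.bundle.js

# Lock files
package-lock.json
```

Place the file in the repository root; Semgrep picks it up automatically on every scan, so you do not have to repeat the exclusions on the command line.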
Some rules on certain files can take an unusually long time. Set a per-rule timeout to prevent any single rule from stalling the entire scan:
# 30-second timeout per rule per file
semgrep --config p/default --timeout 30
If a rule exceeds the timeout on a given file, Semgrep skips that rule-file combination and continues scanning. The skip is reported in the scan summary so you know it happened.
For CI runners with limited memory, set an upper bound:
# Limit Semgrep to 4GB of memory
semgrep --config p/default --max-memory 4000
The single most impactful performance optimization for CI is diff-aware scanning. Instead of scanning every file in the repository, Semgrep analyzes only the files changed in the current pull request:
# semgrep ci automatically does diff-aware scanning on PRs
semgrep ci
With diff-aware scanning, the median CI scan time drops to approximately 10 seconds regardless of repository size. This is because most pull requests touch only a handful of files, and Semgrep only needs to run rules against those specific changes.
For a large repository with CI time constraints, a well-tuned command looks like this:
semgrep scan \
  --config p/default \
  --exclude "vendor/" \
  --exclude "node_modules/" \
  --exclude "*.min.js" \
  --max-target-bytes 500000 \
  --timeout 30 \
  --max-memory 4000 \
  --error
This configuration runs the high-confidence default rules, skips vendored code and large files, prevents individual rules from hanging, caps memory usage, and fails the build when real findings are detected.
While Semgrep excels at pattern-based static analysis, CodeAnt AI takes a different approach by combining AI-powered code review with SAST, secret detection, and infrastructure-as-code scanning in a single platform. Starting at $24/user/month for the Basic plan and $40/user/month for Premium, CodeAnt AI provides line-by-line PR feedback, one-click auto-fix suggestions, and support for over 30 languages.
CodeAnt AI is worth evaluating if you want an all-in-one platform that handles both the AI review layer and the deterministic security scanning layer. Many teams run Semgrep CLI for its deep rule customization alongside CodeAnt AI for broader automated review coverage. For a broader comparison of tools in this space, see the roundup of Semgrep alternatives.
Here is a quick reference for the commands and flags covered in this tutorial:
| Command | Purpose |
|---|---|
| semgrep --config auto | Auto-detect languages and scan with relevant rules |
| semgrep --config p/default | Scan with the curated high-confidence ruleset |
| semgrep --config p/default --json | Output findings in JSON format |
| semgrep --config p/default --sarif | Output findings in SARIF format |
| semgrep --config p/default --error | Exit with non-zero code on findings (for CI) |
| semgrep --config p/default --include "*.py" | Scan only Python files |
| semgrep --config p/default --exclude "tests/" | Skip the tests directory |
| semgrep --config p/default --jobs 2 | Limit parallel scanning jobs |
| semgrep --config p/default --timeout 30 | Set per-rule timeout in seconds |
| semgrep --config p/default --max-target-bytes 500000 | Skip files larger than 500KB |
| semgrep --config p/default --max-memory 4000 | Cap memory usage at 4GB |
| semgrep --exclude-rule "rule.id" | Exclude a specific rule from the scan |
| semgrep ci | CI-optimized scan with diff-awareness |
Once you have Semgrep CLI running locally and producing findings, the natural progression is to deepen your configuration and integrate it into your team workflow.
Semgrep CLI is a foundation that scales from a single developer running quick scans on a laptop to enterprise teams scanning millions of lines across hundreds of repositories. The key is to start simple with semgrep --config auto, build confidence in the findings, and expand your configuration as your team's needs grow.
The fastest way to install Semgrep CLI is with pip by running 'pip install semgrep'. On macOS you can also use Homebrew with 'brew install semgrep'. For containerized environments, pull the official Docker image with 'docker pull semgrep/semgrep' and mount your source directory into the container. After installation, verify everything works by running 'semgrep --version' in your terminal. Python 3.8 or later is required for the pip method.
The 'semgrep scan' command runs a local scan against your codebase using rule sets you specify on the command line. The 'semgrep ci' command is designed for CI/CD pipelines and adds diff-aware scanning, automatic rule configuration from Semgrep Cloud policies, result uploading to the Semgrep dashboard, and PR comment integration. Use 'semgrep scan' for local development and 'semgrep ci' in your automated pipelines.
Start with 'semgrep --config auto' which automatically detects the languages and frameworks in your project and selects relevant rules. If you want more control, use 'semgrep --config p/default' which includes high-confidence security and correctness rules with low false positive rates. Avoid starting with broad sets like p/security-audit until you are comfortable triaging findings from the default set.
Use the --json flag to get JSON output: 'semgrep --config p/default --json > results.json'. For SARIF format, which is used by GitHub Code Scanning and other security platforms, use '--sarif': 'semgrep --config p/default --sarif > results.sarif'. Semgrep also supports JUnit XML with '--junit-xml' and Emacs-compatible output with '--emacs'. You can combine format flags with '--output filename' to write results to a file while still seeing terminal output.
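JSON output is easy to post-process with a few lines of scripting. The snippet below is a sketch that tallies findings by severity; the sample payload is a trimmed illustration of the shape of Semgrep's JSON output (real results carry many more fields per finding, and the rule IDs here are examples):

```python
import json
from collections import Counter

# Trimmed sample in the shape of `semgrep --json` output.
# In practice you would read this from results.json.
sample = json.loads("""
{"results": [
  {"check_id": "python.lang.security.audit.eval-detected",
   "path": "src/api/run.py", "extra": {"severity": "WARNING"}},
  {"check_id": "python.django.security.injection.sql",
   "path": "src/db/query.py", "extra": {"severity": "ERROR"}}
]}
""")

# Count findings per severity level.
severities = Counter(r["extra"]["severity"] for r in sample["results"])
print(dict(severities))  # {'WARNING': 1, 'ERROR': 1}
```

The same approach works for gating builds on severity, deduplicating findings across scans, or feeding results into a ticketing system.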
Add a nosemgrep comment on the line immediately before the flagged code. Use the format '# nosemgrep: rule-id' in Python, '// nosemgrep: rule-id' in JavaScript or Go, and the appropriate comment syntax for your language. Always include the specific rule ID rather than a blanket suppression so the intent is documented. For broader exclusions, add paths to a .semgrepignore file in your repository root.
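For example, in Python a targeted suppression might look like this (the rule ID below is a placeholder, not a real registry rule):

```python
import hashlib

def cache_key(data: bytes) -> str:
    # MD5 is acceptable here as a non-security cache key, so the specific
    # rule (placeholder ID below) is suppressed rather than the whole ruleset.
    # nosemgrep: python.lang.security.insecure-hash-md5
    return hashlib.md5(data).hexdigest()
```

Because the suppression names one rule, every other rule still applies to this line, and the comment documents why the finding was waived.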
Yes. Pass file or directory paths as arguments after the config flag: 'semgrep --config p/default src/api/ src/auth/'. You can also use '--include' and '--exclude' flags with glob patterns. For example, '--include "*.py"' scans only Python files and '--exclude "tests/"' skips the tests directory. These flags can be combined to precisely target the code you want to analyze.
Semgrep is one of the fastest SAST tools available. Most scans complete in under 30 seconds for a typical codebase, and the median CI scan time is approximately 10 seconds because Semgrep supports diff-aware scanning that analyzes only changed files. This is significantly faster than tools like SonarQube, Checkmarx, or Veracode, which can take minutes to hours for comparable analysis. Semgrep achieves this speed by running pattern matching directly on the source code without requiring a full compilation or build step.
Semgrep CLI does not run natively on Windows. The recommended approach for Windows users is to use Windows Subsystem for Linux (WSL) and install Semgrep via pip within the WSL environment. Alternatively, you can run Semgrep through Docker on Windows by using 'docker run semgrep/semgrep' with your source directory mounted as a volume. Both approaches provide the full Semgrep feature set on Windows machines.
Semgrep supports over 30 programming languages including Python, JavaScript, TypeScript, Java, Go, Ruby, C, C++, C#, Rust, Kotlin, Swift, PHP, Scala, Terraform, Dockerfile, and Kubernetes YAML. The open-source engine provides full support for all these languages. The Semgrep Pro engine, available through Semgrep Cloud, adds cross-file and cross-function dataflow analysis for a subset of these languages.
Several techniques help: use '--exclude' to skip directories like vendor/, node_modules/, and build artifacts. Set '--max-target-bytes 500000' to skip very large files. Use '--jobs N' to control parallelism and reduce memory pressure. Add a .semgrepignore file to permanently exclude paths that do not need scanning. In CI, use 'semgrep ci' which automatically performs diff-aware scanning to analyze only changed files instead of the full repository.
Yes, Semgrep CLI is fully free and open source under the LGPL-2.1 license. It includes over 2,800 community-maintained rules covering security, correctness, and best practices. The Semgrep Cloud platform adds cross-file analysis, 20,000+ Pro rules, AI-powered triage, and a web dashboard - and it is free for teams of up to 10 contributors. Beyond 10 contributors, the Team plan costs $35 per contributor per month.
The simplest approach is to add Semgrep to your CI workflow using the official Docker image. In GitHub Actions, create a workflow that runs 'semgrep ci' with a SEMGREP_APP_TOKEN secret for cloud integration, or 'semgrep scan --config p/default --error' for standalone scanning. The '--error' flag causes Semgrep to exit with a non-zero code when findings are detected, which fails the CI check. Semgrep also works with GitLab CI, Jenkins, CircleCI, Buildkite, and Azure Pipelines.
Originally published at aicodereview.cc
2026-04-06 15:56:56
For years I treated side projects like a second job I was failing at. 😅
I was still building. The fuel was not a trophy list. It was ideas: problems I could not drop, small "what ifs," things I wanted to exist whether or not anyone asked. Some turned into code and demos. Some stayed half-born in notes. When something did ship, I parked it on Side Projects so I could point back without turning the build into a performance. The hard part was not a lack of sparks. It was guilt. Every hour on a personal repo felt like an hour I stole from rest, from my day job, from whatever version of adulthood the noise online says you should perform.
If you have ever closed your laptop at 1 AM 🌙 and thought, this does not count, you know the feeling.
Here is the lesson that took me too long to learn: side projects add up over time even when they never become startups. Not because every repo needs a prize. Because they train skills your sprint board rarely optimizes for. ✨
At work, someone else shapes the problem, writes the ticket, and often picks the hard choices for you. That is not a knock on you. It is how real companies ship.
Side projects push you into the step before that: choosing what is worth building when no one asked, when the scope is yours, when the only deadline is your own pride. That is where you practice decisions when nothing is spelled out, the same habit that later helps when two technical options are both "fine" and someone has to pick.
You also get shipping in public if you publish the work. 📣 A private spike teaches syntax. A public repo teaches taste, communication, and the slow work of explaining your own mess.
Figure from Austin Kleon's Show Your Work!.
That book is where I first saw the learn and teach loop drawn as one circle. You learn, you share what you know, and sharing feeds the next round of learning. Austin Kleon's "show your work" idea is not about showing off. It is about letting your thinking meet real people. I wrote about why that cycle is a career catalyst for builders, and how I use it in mentorship, in this LinkedIn post.
And guess what? The web is not fair. Polished demos can flop. Rough hacks can take off. You cannot chase likes without turning the whole thing into a second social media job. You can still chase practice reps: ship, write it down, cut scope, finish something. 🎯
When I say compound, I do not mean every project becomes a clean story for interviews. I mean a set of skills that still help you after you stop opening the repo every week.
Spotting patterns is the big one. After you debug your own auth flow, deploy surprises, and your own "why is this slow" hunts, incidents at work start to look familiar. The details change. The shape of the problem often does not.
Knowing your tools is the quiet kind of compound interest. 🛠️ You learn a framework because a weekend idea needed it. Two years later that is not just a resume line. It is speed when the team needs a prototype, or when you read someone else's system and you actually get the constraints.
Proof you can ship matters more than people say, especially early. ✅ Not proof you are a genius. Proof you can take an idea from zero to something another person can run. That is a different signal than "I finished courses."
There is a follow-on effect too. 🤖 You can ship side projects lightning fast now: AI handles boilerplate, glue code, and first drafts so an idea can become a working thing in hours, not weeks. As that wall gets lower, the hard part moves earlier. The rare skill is not typing UI faster. It is naming the problem, picking limits, and knowing what "good enough" means for a user. Side projects are a low-risk place to grow that product sense without treating a tutorial like owning a real feature.
Product engineering is part of that same stack, and it grows fast when you build for yourself. You are the user, the scope owner, and the engineer in one loop, so tradeoffs land in your head instead of across three roles in a meeting. I wrote about why that mindset is the skill that survives when AI handles more of the code in product engineering in the AI era.
Growth needs finish lines, not only new ideas. If every spark becomes a new foundation and nothing ships, you get the joke instead of the skills.
Strip: West Side-project story (CommitStrip, 2014).
Side projects still cost something. Sleep. 😴 Relationships. The trap of using "hustle" to skip rest. I will not tell you every engineer needs a perfect GitHub graph. That is not wellness. It is stress with a brand. 😬
The useful idea is smaller: if you already build on the side, stop treating it like a character flaw. Call it practice that stacks over time, then guard your time like a grown person. A side project that ships in six weeks with weekends intact beats a "rewrite everything" dream that eats six months of guilt.
If I could start over, I would pick smaller scope and celebrate done more loudly. I would treat docs as part of the product, not an afterthought, because future me is also a user. 📝
I would split real learning from "learning for the feed." Some spikes stay private. Not everything needs a post. The value still lands in your head. 🧠
Most of all, I would stop forcing side projects to prove themselves in the same frame as my job. They are not copies of each other. They are different gyms. One pays your salary. The other builds more choices, speed, and confidence when you see a work problem no one filed yet.
If you are building something odd this weekend and part of you feels guilty for not "optimizing" your career, that build might already be the optimization. Ship the smallest version. Ship it. Then step away.
The compound part was never the repo. It was the person all those reps built. ✨
Originally published at souravdey.space.
2026-04-06 15:56:12
Previously I already touched on the topic of design patterns. Today I want to continue this topic. Once more during my everyday work, I encountered a situation that perfectly illustrates the usage of one, and I think it's worth sharing.
Classic design patterns can feel awkward when transferred directly into modern React. Many of them were designed for stateful class hierarchies, and mapping them one-to-one to hooks and functional components often produces more ceremony than value. But some patterns remain genuinely useful — especially when adapted to fit the functional style rather than forced into their original shape. The Strategy pattern is one of them.
In our app, push notifications are critical for keeping users informed about important events in real time. But what happens when push notifications are disabled? We decided to implement a fallback: when push is off, establish a WebSocket connection through AWS Amplify Events API to receive events through an alternative channel.
The first implementation was a single custom hook that handled everything:
export const useRealtimeConnection = () => {
  const token = useAuthToken();
  const user = useCurrentUser();
  const { isInternet, isPushEnabled, isInForeground } = useConnectionState();
  const subRef = useRef(null);

  useEffect(() => {
    if (isPushEnabled || !user.id || !isInForeground || !isInternet || !token)
      return;

    configureAmplify();
    let channel;

    const connectAndSubscribe = async () => {
      try {
        channel = await events.connect(`/user/${user.id}/notifications`, {
          authToken: token,
        });
        subRef.current = channel.subscribe({
          next: (data) => onDashboardEvent(data.event),
          error: (err) => console.error("[Realtime] Error:", err),
        });
      } catch (error) {
        console.error("[Realtime] Connection failed");
      }
    };

    connectAndSubscribe();

    return () => {
      subRef.current?.unsubscribe();
      subRef.current = null;
      channel?.close();
    };
  }, [isPushEnabled, token, user, isInForeground, isInternet]);
};
This worked perfectly. We were happy and moved on.
A few weeks later, a new requirement arrived: the registration flow also needed real-time event handling. Users go through identity verification, document uploads, and compliance checks — all of which can trigger events that need to be communicated back immediately.
But the registration scope had meaningfully different requirements: a different endpoint, a different way to identify the user (a session ID instead of a user ID), and different conditions for when a connection should exist at all.
My first instinct was to create useRegistrationRealtimeConnection, copy the logic, and adjust it. But as soon as I started, alarm bells went off. The details differ, but the structure is identical: connect when conditions allow, subscribe, hand events to a handler, clean up on unmount.
That's a textbook duplication risk. That was the moment I reached for the Strategy pattern.
Strategy Pattern: Define a family of algorithms, encapsulate each one, and make them interchangeable. Strategy lets the algorithm vary independently from clients that use it.
Think of navigation apps — fastest route, shortest route, avoid highways. The interface stays the same, but the routing algorithm changes based on your chosen strategy.
In our case, this looked like a good fit for a Strategy-style refactor. The lifecycle algorithm — connect, subscribe, clean up — stays fixed in the hook. What varies is the connection policy: when to connect, which endpoint to use, how to identify the user. Extracting that variation into strategy objects would let the hook remain stable while each scope provides its own rules.
I'll come back to whether this is really Strategy in a moment. First, the implementation.
libs/
  realtime/
    strategies/
      types.ts
      dashboard.ts
      registration.ts
      selectStrategy.ts
      index.ts
    useConnectionState.ts
    useRealtimeConnection.ts
// libs/realtime/strategies/types.ts
export type ConnectionParams = {
  token: string | null;
  user: AppUser | AuthenticatedUser;
  isPushEnabled: boolean;
  isInForeground: boolean;
  isInternet: boolean;
};

export type RealtimeStrategy = {
  scope: "dashboard" | "registration";
  shouldConnect: (params: ConnectionParams) => boolean;
  getEndpoint: (user: AppUser | AuthenticatedUser) => string;
  getIdentifier: (user: AppUser | AuthenticatedUser) => string | number;
};
Each strategy is a plain object — no hooks, no side effects, just functions that receive data and return values. shouldConnect encodes the policy: given a snapshot of the current environment, should we connect? scope is the discriminator that lets the hook route events to the right handler.
// libs/realtime/strategies/dashboard.ts
export const dashboardStrategy: RealtimeStrategy = {
  scope: "dashboard",
  shouldConnect: ({ token, user, isPushEnabled, isInForeground, isInternet }) =>
    isAuthenticated(user) &&
    token != null &&
    !isPushEnabled &&
    isInForeground &&
    isInternet,
  getEndpoint: (user) => {
    if (!isAuthenticated(user)) throw new Error("User not authenticated");
    return `/user/${user.id}/notifications`;
  },
  getIdentifier: (user) => {
    if (!isAuthenticated(user)) throw new Error("User not authenticated");
    return user.id;
  },
};
// libs/realtime/strategies/registration.ts
export const registrationStrategy: RealtimeStrategy = {
  scope: "registration",
  shouldConnect: ({ token, user, isPushEnabled, isInForeground, isInternet }) =>
    !isAuthenticated(user) &&
    token != null &&
    user.sessionId != null &&
    !isPushEnabled &&
    isInForeground &&
    isInternet,
  getEndpoint: (user) => {
    if (!user.sessionId) throw new Error("Session not available");
    return `/registration/${user.sessionId}/notifications`;
  },
  getIdentifier: (user) => user.sessionId ?? "unknown",
};
The isAuthenticated type guard makes the two strategies mutually exclusive: the dashboard strategy only activates for a fully signed-in user, the registration strategy for an unauthenticated session. Neither touches React — they're pure objects you could test with a single function call.
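The guard itself isn't shown in the article. A minimal sketch, assuming AuthenticatedUser carries a numeric id and registration users only a sessionId (both type names follow the article; the guard's body here is an illustration, not the production code):

```typescript
type AuthenticatedUser = { id: number };
type AppUser = { sessionId?: string | null };

// Type guard: narrows the union to AuthenticatedUser when a real id is present.
const isAuthenticated = (
  user: AppUser | AuthenticatedUser,
): user is AuthenticatedUser => "id" in user && user.id != null;

console.log(isAuthenticated({ id: 42 }));           // true
console.log(isAuthenticated({ sessionId: "abc" })); // false
```

Inside any branch where isAuthenticated(user) is true, TypeScript lets you access user.id without a cast, which is what keeps getEndpoint and getIdentifier type-safe.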
// libs/realtime/strategies/selectStrategy.ts
export const selectRealtimeStrategy = (
  route: AppRoute,
): RealtimeStrategy | null => {
  switch (route) {
    case "Dashboard":
      return dashboardStrategy;
    case "Registration":
    case "ResumeRegistration":
      return registrationStrategy;
    default:
      return null;
  }
};
A pure function — no hooks, no side effects. Called inside the hook via useMemo, so the strategy reference only changes when the user navigates to a different scope. TypeScript will warn if a new route is added and this switch isn't updated.
// libs/realtime/useRealtimeConnection.ts
export const useRealtimeConnection = () => {
  const token = useAuthToken();
  const user = useCurrentUser();
  const route = useCurrentRoute();
  const { isInternet, isPushEnabled, isInForeground } = useConnectionState();

  const strategy = useMemo(() => selectRealtimeStrategy(route), [route]);
  const handler = useNotificationHandler(strategy?.scope ?? null);
  const subRef = useRef(null);

  useEffect(() => {
    if (!strategy) return;

    const shouldConnect = strategy.shouldConnect({
      token,
      user,
      isPushEnabled,
      isInForeground,
      isInternet,
    });
    if (!shouldConnect || !token) return;

    configureAmplify();
    let channel;

    const connectAndSubscribe = async () => {
      try {
        const endpoint = strategy.getEndpoint(user);
        const identifier = strategy.getIdentifier(user);
        console.log(
          `[Realtime] Connecting to ${strategy.scope} – ${identifier}`,
        );
        channel = await events.connect(endpoint, { authToken: token });
        subRef.current = channel.subscribe({
          next: (data) => handler(data.event),
          error: (err) => console.error("[Realtime] Error:", err),
        });
      } catch (error) {
        console.error(`[Realtime] Connection failed for ${strategy.scope}`);
      }
    };

    connectAndSubscribe();

    return () => {
      subRef.current?.unsubscribe();
      subRef.current = null;
      channel?.close();
    };
  }, [
    token,
    user,
    isPushEnabled,
    isInForeground,
    isInternet,
    strategy,
    handler,
  ]);
};
Compare this to the original hook from the Problem section: the structure is identical. The hardcoded endpoint and inline guard condition moved into the strategy; event handling is now routed separately based on the selected scope via useNotificationHandler. The hook no longer knows anything about dashboards or registration — it just manages the connection.
This is worth pausing on, because the honest answer is: partially.
Why it is more than just configuration:
The strategies contain real decision logic. shouldConnect is not a static flag — it evaluates auth state, network state, foreground status, and push permission together. getEndpoint and getIdentifier encapsulate behavior that differs meaningfully between scopes. If you replaced them with a plain config object, that logic would have to move somewhere — most likely back into the hook, which is exactly what we were trying to avoid.
Why it is not full classical Strategy:
In the textbook GoF pattern, the strategy encapsulates the entire algorithm. Here, the hook still owns the lifecycle — connect, subscribe, clean up. The strategy only controls the connection policy: whether to connect, where, and who. Event handling is also routed externally via scope and useNotificationHandler, rather than being part of the strategy itself.
The honest label: this is a Strategy/policy hybrid — a pattern-inspired design that extracts variable policy from an invariant lifecycle, adapted to React's functional model rather than the class-based structure the original pattern assumed. That adaptation is intentional, not a shortcoming.
Before: one hook, one hardcoded scope, no clear path to extend without duplication or conditionals.
After: a clean system where adding a new scope means one new strategy file and one new case in the selector, nothing else changes, and each strategy is a pure object you can unit test with a single call: expect(dashboardStrategy.shouldConnect({...})).toBe(true).
When we later needed to add real-time connections for our customer support chat — different endpoint, different events, different auth — it took less than an hour. New strategy file, one new case in the selector, done.
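As a sketch of how little a new scope costs, here is roughly what such a strategy file could look like. Everything below is illustrative, not the production code: the types are abridged copies of the article's contract (with scope widened to string so the example is self-contained), and supportChatStrategy, its endpoint, and its connection rules are assumptions.

```typescript
// Abridged copies of the article's types, enough to make this file standalone.
type AuthenticatedUser = { id: number };
type AppUser = { sessionId?: string | null };

type ConnectionParams = {
  token: string | null;
  user: AppUser | AuthenticatedUser;
  isPushEnabled: boolean;
  isInForeground: boolean;
  isInternet: boolean;
};

type RealtimeStrategy = {
  scope: string; // the real code extends the "dashboard" | "registration" union instead
  shouldConnect: (params: ConnectionParams) => boolean;
  getEndpoint: (user: AppUser | AuthenticatedUser) => string;
  getIdentifier: (user: AppUser | AuthenticatedUser) => string | number;
};

const isAuthenticated = (
  user: AppUser | AuthenticatedUser,
): user is AuthenticatedUser => "id" in user && user.id != null;

// Hypothetical new scope: support chat needs a signed-in user, but unlike the
// dashboard it should stay connected even when push notifications are enabled.
export const supportChatStrategy: RealtimeStrategy = {
  scope: "supportChat",
  shouldConnect: ({ token, user, isInForeground, isInternet }) =>
    isAuthenticated(user) && token != null && isInForeground && isInternet,
  getEndpoint: (user) => {
    if (!isAuthenticated(user)) throw new Error("User not authenticated");
    return `/support/${user.id}/chat`;
  },
  getIdentifier: (user) => (isAuthenticated(user) ? user.id : "unknown"),
};
```

The hook never changes: it calls shouldConnect, getEndpoint, and getIdentifier through the shared contract, so a new scope is purely additive.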
The trade-off is real though: this design adds indirection. There's a selection layer, a type contract, and multiple files where one hook used to be. If you only ever have two scopes and no growth expected, this abstraction might cost more than it saves. Before reaching for this structure, ask: is this variation expected to grow? In our case the answer was clear, but it won't always be.
Recognize Duplication Early: When you're about to copy-paste a hook with "just a few changes," pause and consider whether there's a pattern that fits.
Strategy Still Shines: Despite being a "classic" pattern, Strategy remains useful in modern React for handling variations of the same algorithm.
Adapt Patterns to Your Context: In React, strategies can be plain objects with typed method signatures — no classes, no factories, no hooks inside the strategy itself.
Extract Invariants: The hook manages the connection lifecycle; useConnectionState owns the environmental signals; strategies encapsulate the variable connection policy. Each has one job.
Be Honest About What You've Built: This isn't a full Strategy — it's a Strategy-style policy extraction. Knowing the difference helps you explain the design and judge when the same approach fits elsewhere.
Type Safety Pays Off: TypeScript ensures all strategies follow the same contract. Adding a new strategy without satisfying the interface is a compile error, not a runtime surprise.
Thank you for your attention and happy hacking!