The Practical Developer
A constructive and inclusive social network for software developers.

RSS preview of Blog of The Practical Developer

Instruction Best Practices: Precision Beats Clarity

2026-03-24 21:12:30

Two rules in the same file. Both say "don't mock."

When working with external services, avoid using mock objects in tests.

When writing tests for src/payments/, do not use unittest.mock.

Same intent. Same file. Same model. One gets followed. One gets ignored.

I stared at the diff for a while, convinced something was broken. The model loaded the file. It read both rules. It followed one and walked past the other like it wasn't there.

Nothing was broken. The words were wrong.

The experiment

I ran controlled behavioral experiments: same model, same context window, same position in the file. One variable changed at a time. Over a thousand runs per finding, with statistically significant differences between conditions.

Two findings stood out.

First (and the one that surprised me most): when instructions have a conditional scope ("When doing X..."), precision matters enormously. A broad scope is worse than a wrong scope.

Second: instructions that name the exact construct get followed roughly 10 times more often than instructions that describe the category. "unittest.mock" vs "mock objects" — same rule, same meaning to a human. Not the same to the model.

Scope it or drop it

Most instructions I see in the wild look like this:

When working with external services, do not use unittest.mock.

That "When working with external services" is the scope — it tells the agent when to apply the rule. Scopes are useful. But the wording matters more than you'd expect.

I tested four scope wordings for the same instruction:

# Exact scope — best compliance
When writing tests for src/payments/, do not use unittest.mock.

# Universal scope — nearly as good
When writing tests, do not use unittest.mock.

# Wrong domain — degraded
When working with databases, do not use unittest.mock.

# Broad category — worst compliance
When working with external services, do not use unittest.mock.

Read that ranking again. Broad is worse than wrong.

"When working with databases" has nothing to do with the test at hand. But it gives the agent something concrete - a specific domain to anchor on. The instruction is scoped to the wrong context, but it's still a clear, greppable constraint.

"When working with external services" is technically correct. It even sounds more helpful. But it activates a cloud of associations - HTTP clients, API wrappers, service meshes, authentication, retries - and the instruction gets lost in the noise.

The rule: if your scope wouldn't work as a grep pattern, rewrite it or drop it.

An unconditional instruction beats a badly-scoped conditional:

# Broad scope — fights itself
When working with external services, prefer real implementations
over mock objects in your test suite.

# No scope — just say it
Do not use unittest.mock.

The second version is blunter. It's also more effective. Universal scopes ("When writing tests") cost almost nothing — they frame the context without introducing noise. But broad category scopes actively hurt.

Name the thing

Here's what the difference looks like across domains.

# Describes the category — low compliance
Avoid using mock objects in tests.

# Names the construct — high compliance
Do not use unittest.mock.

# Category
Handle errors properly in API calls.

# Construct
Wrap calls to stripe.Customer.create() in try/except StripeError.

# Category
Don't use unsafe string formatting.

# Construct
Do not use f-strings in SQL queries. Use parameterized queries
with cursor.execute().

# Category
Avoid storing secrets in code.

# Construct
Do not hardcode values in os.environ[]. Read from .env
via python-dotenv.

The pattern: if the agent could tab-complete it, use that form. If it's something you'd type into an import statement, a grep, or a stack trace - that's the word the agent needs.

Category names feel clearer to us humans. "Mock objects" is plain English. But the model matches against what it would actually generate, not against what the words mean in English. "unittest.mock" matches the tokens the model would produce when writing test code. "Mock objects" matches everything and nothing.

Think of it like search. A query for unittest.mock returns one result. A query for "mocking libraries" returns a thousand. The agent faces the same problem: a vague instruction activates too many associations, and the signal drowns.

The compound effect

When both parts of the instruction are vague - vague scope, vague body - the failures compound. When both are precise, the gains compound.

# Before — vague everywhere
When working with external services, prefer using real implementations
over mock objects in your test suite.

# After — precise everywhere
When writing tests for `src/payments/`:
Do not import `unittest.mock`.
Use the sandbox client from `tests/fixtures/stripe.py`.

Same intent. The rewrite takes ten seconds. The difference is not incremental, it's categorical.

Formatting gets the instruction read - headers, code blocks, hierarchy make it scannable. Precision gets the instruction followed - exact constructs and tight scopes make it actionable. They work together. A well-formatted vague instruction still gets ignored. A precise instruction buried in a wall of text still gets missed. You need both.

When to adopt this

This matters most when:

  • Your instruction files mention categories more than constructs: "services," "libraries," "objects," "errors"
  • You use broad conditional scopes: "when working with...," "for external...," "in general..."
  • You have rules that are loaded and read but not followed
  • You want to squeeze more compliance out of existing instructions without restructuring the file

It matters less when your instructions are already construct-level ("do not call eval()") or unconditional.

Try it

  1. Open your instruction files.
  2. Find every instruction that uses a category word -> "services," "objects," "libraries," "errors," "dependencies."
  3. Replace it with the construct the agent would encounter at runtime - the import path, the class name, the file glob, the CLI flag.
  4. For conditional instructions: replace broad scopes with exact paths or file patterns. If you can't be exact, drop the condition entirely - unconditional is better than vague.

Then run your agent on the same task that was failing. You'll see the difference.

Formatting is the signal. Precision is the target.

"The human might be asleep." One line in Karpathy's program.md started 100 automatic experiments per night.

2026-03-24 21:09:57

[Image: autoresearch architecture diagram]

The biggest bottleneck in code optimization is the human in the loop. You think of an idea, implement it, test it, check results, then think again. In March 2026, Andrej Karpathy removed that bottleneck. He released autoresearch, a tool that lets an AI agent edit code, run experiments, evaluate results, and keep or discard changes automatically. It hit 42,921 GitHub stars in under two weeks (GitHub API, 2026-03-19 11:56 UTC).

The surprising part is where it spread. Shopify CEO Tobi Lutke applied the pattern to Liquid, a template engine running in production for 20 years. He reported a 53% reduction in parse+render time in PR #2056. LangChain CEO hwchase17 used it to optimize agent quality scores. Ole Lehmann reported raising a Claude Code skill eval score from 56% to 92%. This is not an ML research tool anymore. It is a pattern for any task with a measurable metric.

[Image: autoresearch loop diagram]

Why three files are enough

The architecture is stripped to the minimum. There are three core files.

program.md is the instruction file. A human writes it. It defines what to optimize, how to run experiments, and what must not break. train.py is the only file the agent edits. prepare.py is the evaluation harness. Nobody touches it.

This separation works because the boundary between "what changes" and "what stays fixed" is clear. The agent edits train.py, runs a 5-minute experiment, checks the metric. If it improved, git commit. If not, git reset. About 12 experiments per hour. Leave it running overnight and you get about 100.

The 5-minute cap is what makes this work. It forces every experiment into the same budget. You can compare results fairly. Without a fixed budget, a slow-converging change looks just as good as a fast one. The cap makes comparison possible.
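The keep-or-discard loop is simple enough to sketch in a few lines of Go. Everything here is a stand-in, not the real tool: the `metrics` map simulates experiment outcomes (the two numbers are taken from the article's own results), and `step` plays the role of "run a 5-minute experiment, then git commit or git reset."

```go
package main

import "fmt"

// metrics simulates experiment outcomes. The values for "halve batch size"
// (-0.0119) and "weight tying" (+2.24) are taken from the article; the loop
// itself is a hypothetical sketch of the commit-or-reset cycle.
var metrics = map[string]float64{
	"halve batch size": 0.9860,
	"weight tying":     3.2379,
}

// step runs one experiment and decides keep (git commit) or discard (git reset).
// It returns the new best metric and whether the change was kept.
func step(best float64, change string) (float64, bool) {
	m := metrics[change]
	if m < best { // lower val_bpb is better
		return m, true // keep: git commit
	}
	return best, false // discard: git reset
}

func main() {
	best := 0.9979 // starting val_bpb from the article
	for _, change := range []string{"halve batch size", "weight tying"} {
		var kept bool
		best, kept = step(best, change)
		fmt.Println(change, "kept:", kept, "best:", best)
	}
}
```

Because every experiment fits the same fixed budget, the single comparison `m < best` is all the decision logic the loop needs.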

program.md includes this line: "The human might be asleep, or gone from a computer and expects you to continue working indefinitely until you are manually stopped." That single instruction removes the human bottleneck.

[Image: autoresearch applications diagram]

126 experiments from Karpathy, 974 tests from Shopify

Karpathy ran 126 experiments on a single H100 in about 10.5 hours. He published the full log in Discussion #43. Out of 126 experiments, 23 were kept. That is about 18%. Most experiments fail or make things worse. But the ones that improve stack up. val_bpb went from 0.9979 to 0.9697.

The biggest win was halving the batch size (524K to 262K), which gave -0.0119. The biggest failure was weight tying (sharing embed and unembed layers), which added +2.24 BPB and completely broke the model. The dead-end log is valuable too. Knowing what does not work saves future experiments from going in the wrong direction.

Shopify took a different approach. The target was a Ruby library (lib/liquid/*.rb), not ML training code. The metric was combined_us (parse + render time), not val_bpb. The critical difference was a 3-gate validation system. Every change had to pass 974 unit tests, then a liquid-spec compliance check, then a performance benchmark. Only changes that passed all three gates and improved the metric were kept. About 120 experiments produced 93 commits. Parse time dropped 61%. Render time dropped 20%. Total dropped 53%.

The key insight was that garbage collection was consuming 74% of CPU time. Focusing on reducing object allocations drove most of the improvement. Allocations went from 62,620 to 24,530, a 61% reduction.
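Shopify's 3-gate validation reduces to a short predicate. The gate names come from the article; the `Candidate` struct and the idea of passing gate results in as plain booleans are hypothetical simplifications (in the real setup they would come from running the test suite, liquid-spec, and the benchmark harness).

```go
package main

import "fmt"

// Candidate holds the three gate results for one proposed change.
// In Shopify's setup these come from the 974-test suite, the liquid-spec
// compliance check, and the performance benchmark.
type Candidate struct {
	UnitTestsPass bool
	SpecCompliant bool
	CombinedUS    float64 // parse+render time in microseconds; lower is better
}

// keep returns true only if all three gates pass AND the metric improved.
func keep(c Candidate, bestUS float64) bool {
	return c.UnitTestsPass && c.SpecCompliant && c.CombinedUS < bestUS
}

func main() {
	best := 100.0
	fmt.Println(keep(Candidate{true, true, 80.0}, best))  // faster and safe: keep
	fmt.Println(keep(Candidate{true, false, 50.0}, best)) // faster but breaks spec: discard
}
```

The point of the design is that a speedup can never buy its way past a correctness gate: all three conditions are ANDed.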

Caveats

Shopify PR #2056 was still OPEN as of 2026-03-19. It has not been merged. Comments on the PR mention test failures. The 53% figure is self-reported and has not been independently verified.

Metrics gaming is a known issue. After 30+ iterations, agents start finding ways to improve the metric without real improvement. Random seed engineering is one example. Karpathy's log includes fragile improvements like "seed 137 effect" that may not reproduce.

autoresearch-at-home (440+ stars) extends the pattern to distributed collaboration in a SETI@home style. autoresearch-anything (by zkarimi22) generates setup files for any project with npx autoresearch-anything. The MLX port for Apple Silicon found that a depth=4 model beats depth=8 under the 5-minute budget. Smaller models that run more optimizer steps win. The optimal setup depends on the hardware.

Conclusion

autoresearch success does not depend on model capability. It depends on three design choices. Metric: what you measure. Scope: what the agent is allowed to change. Verify: what tests and constraints protect the things that must not break.

Shopify's 53% improvement happened because they built a 3-gate Verify system with 974 tests, spec compliance, and benchmarks. If you want to apply this pattern, start by asking two questions. Do you have a measurable metric? Do you have a test suite that protects what matters? If the answer to both is yes, you can let an AI run 100 experiments while you sleep.

Building a Concurrent TCP Chat Server in Go (NetCat Clone)

2026-03-24 21:05:53

In this project, we built a simplified version of the classic NetCat ("nc") tool — a TCP-based chat server that allows multiple clients to connect, send messages, and interact in real time.

The goal was not just to recreate a chat system, but to deeply understand:

  • TCP networking
  • Go concurrency (goroutines & channels)
  • State management in concurrent systems
  • Client-server architecture

At its core, the system needed to:

  • Accept multiple client connections
  • Allow clients to send messages
  • Broadcast messages to other clients
  • Track when users join or leave
  • Handle unexpected disconnects (like Ctrl+C)

This introduces a key challenge:

«Multiple clients interacting with shared state at the same time.»

TCP Server Basics

The server listens for incoming connections using:

listener, _ := net.Listen("tcp", ":8989")

Then continuously accepts clients:

for {
	conn, _ := listener.Accept()
	go handleConnection(conn)
}

Important concept:

  • "Accept()" blocks until a client connects
  • Each client is handled in a separate goroutine

This allows multiple users to connect simultaneously.

Goroutines and Concurrency

Each client runs in its own goroutine:

go handleConnection(conn)

This means:

  • One slow client does not block others
  • Each connection is handled independently

However, this introduces a problem:

«Multiple goroutines modifying shared data can cause race conditions.»

The Shared State Problem

We needed to track all connected clients:

// use make: writing to a nil (merely declared) map panics
var connections = make(map[net.Conn]string)

But multiple goroutines might:

  • Add clients
  • Remove clients
  • Broadcast messages

At the same time.

This can cause:

  • Data corruption
  • Crashes ("fatal error: concurrent map writes")

Solution 1: Mutex (Not Used)

One approach is using a mutex:

mu.Lock()
connections[conn] = name
mu.Unlock()

But this introduces:

  • Complexity
  • Potential deadlocks
  • Performance bottlenecks

Solution 2: Channels (The Go Way)

Instead of sharing memory, we used channels to communicate changes.

This follows Go’s philosophy:

«“Do not communicate by sharing memory; share memory by communicating.”»

ChatRoom Architecture

We designed a "ChatRoom" struct:

type ChatRoom struct {
	chatters map[*Client]struct{}
	history  []string

	Register   chan *Client
	Unregister chan *Client
	Broadcast  chan Message
}

Key Idea

  • Only one goroutine manages "chatters" and "history"
  • Other goroutines send events via channels

The Event Loop

The core of the system is the "Run()" method:

for {
	select {
	case client := <-cr.Register:
		// add client, send history, announce join
	case client := <-cr.Unregister:
		// remove client, close its channel, announce leave
	case msg := <-cr.Broadcast:
		// append to history, fan out to other clients
	}
}

This acts like a central controller.

Handling Events

  1. Client Join
  • Ask for name
  • Add to chatters map
  • Send chat history
  • Broadcast join message

cr.chatters[client] = struct{}{}

  2. Message Broadcast
  • Append to history
  • Send to all clients except sender

for c := range cr.chatters {
	if c != sender {
		c.receive <- message
	}
}

  3. Client Leave
  • Remove from map
  • Close channel
  • Broadcast leave message
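Putting the three events together, the full `Run()` loop looks roughly like this. It is a minimal, self-contained sketch: the `Client` and `Message` types here are simplified stand-ins for the article's real ones, and the join/leave announcements are left out to keep it short.

```go
package main

import "fmt"

type Client struct {
	name    string
	receive chan string
}

type Message struct {
	sender *Client
	text   string
}

type ChatRoom struct {
	chatters   map[*Client]struct{}
	history    []string
	Register   chan *Client
	Unregister chan *Client
	Broadcast  chan Message
}

func NewChatRoom() *ChatRoom {
	return &ChatRoom{
		chatters:   make(map[*Client]struct{}),
		Register:   make(chan *Client),
		Unregister: make(chan *Client),
		Broadcast:  make(chan Message),
	}
}

// Run is the only goroutine that touches chatters and history,
// so no mutex is needed.
func (cr *ChatRoom) Run() {
	for {
		select {
		case client := <-cr.Register:
			cr.chatters[client] = struct{}{}
			for _, line := range cr.history { // replay history to the newcomer
				client.receive <- line
			}
		case client := <-cr.Unregister:
			delete(cr.chatters, client)
			close(client.receive)
		case msg := <-cr.Broadcast:
			cr.history = append(cr.history, msg.text)
			for c := range cr.chatters {
				if c != msg.sender { // everyone except the sender
					c.receive <- msg.text
				}
			}
		}
	}
}

func main() {
	cr := NewChatRoom()
	go cr.Run()

	alice := &Client{name: "alice", receive: make(chan string, 8)}
	bob := &Client{name: "bob", receive: make(chan string, 8)}
	cr.Register <- alice
	cr.Register <- bob

	cr.Broadcast <- Message{sender: alice, text: "hello"}
	fmt.Println(<-bob.receive) // bob receives alice's message
}
```

Buffered `receive` channels keep the fan-out loop from blocking on a slow client in this sketch; the real server solves the same problem with a dedicated writer goroutine per client.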

Handling Ctrl+C (Unexpected Disconnects)

When a client presses Ctrl+C:

  • TCP connection closes
  • "ReadString()" returns "io.EOF"

We detect this:

if err != nil {
	// client disconnected: ReadString returned io.EOF
	cr.Unregister <- client
	return
}

And broadcast:

"%s has left the chat"

Message Flow

Here’s how a message travels:

  1. Client sends message
  2. Goroutine reads it
  3. Sends it to "Broadcast" channel
  4. "Run()" receives it
  5. Loops through clients
  6. Sends message to each client’s "receive" channel
  7. Client writer goroutine prints it

Key Concepts Learned

  1. TCP is Just a Stream
  • Everything is bytes
  • Messages are manually structured ("\n")
  2. Blocking is Normal
  • "Accept()" blocks waiting for connections
  • "ReadString()" blocks waiting for input

But only within their goroutine.

  3. Goroutines Enable Concurrency
  • Lightweight threads managed by Go
  • Thousands can run efficiently
  4. Channels Simplify Concurrency
  • Avoid shared memory issues
  • Centralize state management
  • Create predictable flow
  5. Interfaces in Go

We learned an important lesson:

net.Conn ≠ *net.Conn

  • "net.Conn" is already an interface
  • Using "*net.Conn" causes errors

Challenges Faced

  • Handling empty messages
  • Debugging raw string issues (ASCII art)
  • Understanding blocking behavior
  • Managing client disconnects
  • Avoiding race conditions

Final Thoughts

This project goes beyond just building a chat app.

It teaches:

  • How real-time systems work
  • How servers handle multiple users
  • How to design safe concurrent programs

In many ways, this is a mini version of real-world systems like chat apps, multiplayer servers, and messaging platforms.

What’s Next?

Possible improvements:

  • Multiple chat rooms
  • Private messaging
  • Username changes
  • Persistent storage
  • Web interface

Conclusion

Building this TCP chat server helped me understand how powerful Go is for concurrent systems.

By combining:

  • TCP networking
  • Goroutines
  • Channels

we can build scalable, real-time applications with relatively simple code.

Thanks for reading!

Why You Should Start Using Negative If Statements in Your Code

2026-03-24 21:04:06

We've all been there: the code looks fine, the tests pass, but somehow bugs still make it to production. So what can you do to write more correct code and significantly reduce the number of bugs?

One technique I use regularly to prevent exactly these situations is writing negative if statements — also known as the Early Return pattern.

What Does That Actually Mean?

Instead of first checking the case where the action should happen, you check the invalid cases first and eliminate them as early as possible. This approach makes your code significantly more readable and focused.

For example, instead of writing this:

if (user.isLoggedIn && user.hasPermission) {
  performSensitiveAction();
}

It's better to use a negative check:

if (!user.isLoggedIn || !user.hasPermission) {
  // Handle the invalid situation
  // Make sure to log it
  // throw | return | continue
}

performSensitiveAction();

The happy path — the thing you actually want to do — sits at the bottom, unindented and obvious. Each guard at the top handles one specific failure case.

[Image: Nested If vs. Early Returns animation]

Why Is This Better?

  • Readability: The code becomes clearer and more focused because edge cases are checked and dismissed upfront. You don't have to mentally track nested conditions to understand what the function actually does.
  • Safety: It's easier to spot bugs and prevent them from escaping, because critical conditions are checked explicitly and visibly at the top.
  • Maintainability: It's much easier to add new conditions or handle additional cases when your checks are clearly laid out at the start of the function.

Log the Failure While You're There

When you use negative if statements, you get a natural place to document why an action didn't succeed. This is the perfect spot to add detailed logs that help you debug the system both in real time and after the fact.

Here's a more complete example:

if (!user.isLoggedIn) {
  logger.warn(`Access attempt without login: ${request.ip}`, {
    userId: user.id,
  });
  return res.status(401).send("Please login first");
}

if (!user.hasPermission) {
  logger.warn(`Permission denied for user: ${user.id}`, {
    action: "performSensitiveAction",
    requiredPermission: "admin",
  });
  return res.status(403).send("Insufficient permissions");
}

performSensitiveAction();

Good logging at these guard points enables:

  • Security tracking — detecting unauthorized access attempts
  • Bug understanding — logs show exactly what happened when something went wrong
  • Better UX — you can identify where users get stuck
  • Faster response times — support teams can resolve issues faster with full context

Pair It with Proper Error Handling

Logging is part of the picture, but error handling matters too:

  • Use try/catch — especially for async operations or calls to external resources
  • Return clear error messages — so users know what happened and what to do
  • Monitor automatically — tools like Sentry or LogRocket track errors in real time so nothing slips by silently

Combining negative if statements with solid error handling makes your code not just more readable, but more resilient and reliable.

The Counter-Argument

Sometimes, writing too many early returns makes a long or complex function harder to follow. When a function spans many lines and has a dozen early exits, it can become difficult to track the overall flow.

So Which Is Better?

Like most things in code: balance matters.

For short, focused functions, negative if statements are almost always a win. For long, complex functions, it's sometimes better to keep a positive if structure — and invest instead in breaking the function into smaller pieces.

Since I started using this pattern consistently, the number of bugs reaching production dropped significantly, and code reviews became much smoother.

For a deeper dive into this idea, I highly recommend CodeAesthetic's video on early returns.

Do you use early returns consistently? Are there cases where you intentionally avoid them? I'd love to hear your take in the comments.

OneCLI vs HashiCorp Vault: Why AI Agents Need a Different Approach

2026-03-24 21:00:25


HashiCorp Vault is one of the most respected tools in infrastructure security. It handles secrets rotation, dynamic credentials, encryption as a service, and access policies at massive scale. If you are running a traditional microservices architecture, Vault is a proven choice.

But AI agents are not traditional microservices. They introduce a fundamentally different trust model, and that changes the requirements for credential management.

This post explains why OneCLI exists alongside Vault - not as a replacement, but as a purpose-built layer for the specific problem of giving AI agents access to external services without exposing raw secrets.

The core problem with AI agents

When you deploy an AI agent (whether it is a LangChain pipeline, an AutoGPT instance, or a custom orchestration layer), you typically need it to call external APIs: OpenAI, Stripe, GitHub, Slack, databases, internal services. The standard approach is to pass API keys through environment variables or config files.

This creates a problem. The agent process has direct access to the raw credential. If the agent is compromised through prompt injection, a malicious plugin, or a supply chain attack on one of its dependencies, the attacker can exfiltrate every key the agent has access to.

Vault does not solve this by itself. Vault is a secret store - it hands the secret to the requesting process, and from that point the process holds the raw credential in memory. The threat model assumes the requesting process is trusted. AI agents, by their nature, run untrusted or semi-trusted code (LLM-generated tool calls, third-party plugins, user-provided prompts that influence execution).

How OneCLI takes a different approach

OneCLI never hands the raw credential to the agent. Instead, it acts as a transparent HTTPS proxy:

  1. The agent makes a normal HTTP request with a placeholder key.
  2. The request routes through OneCLI (via standard HTTPS_PROXY environment variable).
  3. OneCLI authenticates the agent using a Proxy-Authorization header (a scoped, low-privilege token).
  4. OneCLI matches the request's host and path to a stored credential.
  5. The real credential is decrypted from the vault (AES-256-GCM), injected into the request header, and the request is forwarded to the destination.

The agent never sees the real key. It is never in the agent's memory, never in its logs, never extractable through prompt injection.

Feature comparison

| Capability | HashiCorp Vault | OneCLI |
|---|---|---|
| Primary purpose | General secret management | AI agent credential injection |
| Agent code changes required | Yes - must integrate Vault SDK or API | No - uses standard HTTPS_PROXY |
| Credential exposure to agent | Yes - agent receives raw secret | No - proxy injects at request time |
| Credential scoping | Policy-based (path ACLs) | Host/path pattern matching per credential |
| Dynamic secrets | Yes (databases, cloud IAM, PKI) | No (static credential injection) |
| Secret rotation | Built-in | Update in the vault; agents unaffected |
| Encryption at rest | Shamir/auto-unseal | AES-256-GCM |
| Setup complexity | High (cluster, unseal, policies, auth backends) | Low (Docker Compose: gateway + PostgreSQL) |
| Self-hosted | Yes | Yes |
| Open source | Source-available (BSL since 1.14+) | Yes (Apache 2.0) |
| Audit logging | Yes | Yes (all proxied requests) |
| Infrastructure overhead | Consul/Raft cluster, HA setup | Docker Compose (gateway + PostgreSQL) |
| Learning curve | Steep (HCL policies, auth methods, secret engines) | Minimal (add credentials, set proxy env var) |
| Language/framework support | SDKs for major languages | Any language (HTTP proxy is universal) |
| Enterprise features | Namespaces, Sentinel, replication | Cloud dashboard, team management |
| Price | Free (OSS) / Paid (Enterprise) | Free (OSS) / Paid (Cloud) |

Where Vault excels

Vault is the better choice when you need:

  • Dynamic database credentials that are created on demand and automatically revoked.
  • PKI certificate issuance for service mesh or internal TLS.
  • Encryption as a service (transit secret engine) for application-level encryption without managing keys in app code.
  • Multi-datacenter secret replication across large infrastructure.
  • Compliance frameworks that specifically require Vault's audit and policy model.

These are capabilities OneCLI does not attempt to replicate. Vault is a general-purpose secret management platform; OneCLI is a focused tool for a specific use case.

Where OneCLI excels

OneCLI is the better choice when you need:

  • Zero-code credential management for AI agents. No SDK integration, no Vault API calls. Set an environment variable and the agent works.
  • Credential isolation from untrusted processes. The agent never holds the raw secret, which matters when the process runs LLM-generated code.
  • Fast setup for developer and small-team environments. Docker Compose with gateway and PostgreSQL, ready in minutes.
  • Host/path scoped credentials. Each credential is locked to specific API endpoints, so even if an agent's proxy token is compromised, it can only reach the services you have explicitly allowed.

Using Vault and OneCLI together

The strongest architecture for security-conscious teams combines both:

  1. Vault stores and rotates your master credentials, issues dynamic secrets, and manages your PKI.
  2. OneCLI pulls credentials from Vault (via planned integrations) and acts as the injection proxy for AI agents.

This gives you Vault's secret lifecycle management without exposing raw credentials to agent processes. Vault handles the "store and rotate" layer. OneCLI handles the "inject without exposing" layer.

This integration is on the OneCLI roadmap. Today, you can manually sync credentials from Vault into OneCLI's encrypted store. Native Vault backend support will allow OneCLI to fetch credentials directly from Vault at request time.

When to use what

Use Vault alone if you have no AI agents and need enterprise secret management for traditional services.

Use OneCLI alone if you are a small team running AI agents and want the simplest path to keeping credentials out of agent memory.

Use both together if you are running AI agents at scale and want Vault's secret lifecycle management combined with OneCLI's agent-specific credential isolation.

Summary

Vault and OneCLI solve different problems with some overlap. Vault is about storing and managing secrets across your infrastructure. OneCLI is about ensuring AI agents can use credentials without ever possessing them. The proxy-based injection model is what makes the difference - it is not a pattern Vault was designed for, and retrofitting it onto Vault would mean building most of what OneCLI already provides.

If you are giving API keys to AI agents today, the question is not whether to replace Vault. It is whether your agents should hold raw credentials at all.

Learn more at onecli.sh or read the docs.

How did it feel looking at old projects from before AI code tools?

2026-03-24 21:00:00

I wrote an article a while back reflecting on an old project I did for fun. In my sophomore year of college, I created my first personal project: a chatbot built with ChatterBot, an open-source chatbot library. It was my first time using Python, and I wanted to create a simple application that answers interview questions. Years later, I decided to return to that old chatbot project and try to update it. I knew I had come very far from where I started.

This was before ChatGPT and the others. If I were to update it now, the process would be very different, and I have moved on to new interests in the years since. Was this chatbot project groundbreaking? No. But it pushed me to take on learning about AI, and it helped me heavily during an AI course in my undergrad senior year. It's a surreal experience to look at old projects and wonder: how different would they look if I had used AI to guide the process? It's an interesting thought to try and redo old projects, but now with vibe coding.

Have you used AI code tools to update an old project, and what were the results?