2026-04-14 23:38:13
Modern browser automation relies heavily on Chrome DevTools Protocol, but Safari lacks native CDP support. This article explores how to build automation from scratch by solving three core challenges: React’s internal state tracking, Shadow DOM encapsulation, and Content Security Policy restrictions. By combining DOM-level workarounds, recursive traversal techniques, and multi-layer execution fallbacks—including AppleScript—the author demonstrates a robust, framework-agnostic approach to browser automation outside the Chromium ecosystem.
2026-04-14 23:28:08
I've been following crypto regulation for years now. Most of it has been noise: press conferences that led nowhere, bills that died in committee, and enforcement actions that felt more like whack-a-mole than real policy. But right now, in April 2026, something different is happening. And honestly, I think a lot of people in the space are sleeping on it.
The Digital Asset Market Clarity Act (everyone's calling it the CLARITY Act) is sitting in the Senate, and it's closer to becoming law than any crypto bill in U.S. history. The Senate Banking Committee markup is expected by mid-April. If it clears that hurdle, we're looking at the first comprehensive federal framework for digital assets in the United States.
Let me break down why this matters, what's actually in it, and why the biggest fight has nothing to do with Bitcoin or Ethereum.
If you've been in crypto long enough, you remember the confusion. Is your token a security? A commodity? Both? Neither? Nobody could tell you — not even the regulators themselves. The SEC said most tokens were securities. The CFTC said Bitcoin and Ether were commodities. And instead of drawing clear lines, both agencies just sued people and called that "regulation."
The CLARITY Act tries to fix this mess. It passed the House back in July 2025 with a surprisingly strong bipartisan vote of 294 to 134. The bill splits responsibility between the SEC and CFTC in a way that actually makes sense: the CFTC gets authority over spot markets for digital commodities, while the SEC keeps oversight of assets that behave like securities, especially during fundraising and issuance.
For crypto builders, this is huge. Instead of guessing which regulator might come after you, there would be clear registration categories for exchanges, brokers, and custodians. You'd know the rules before you build, not after you get sued.
It also introduces something called "ancillary assets": tokens that might depend on an issuer's work early on but are meant to decentralize over time. These would face disclosure requirements until they hit certain decentralization thresholds. That's a pretty thoughtful approach, if you ask me.
Here's what most people miss when they talk about the CLARITY Act. The bill isn't stuck in the Senate because senators can't agree on whether Bitcoin is a commodity. That part is actually the easy part. The entire holdup comes down to one surprisingly boring question: should stablecoin holders earn interest on their balances?
The banking industry says absolutely not. From their perspective, a crypto platform offering yield on stablecoins without deposit insurance, capital requirements, and full federal banking regulation is unfair competition. Banks have spent over $56 million lobbying against yield provisions, according to recent reports.
The crypto industry's argument? Stablecoin yield isn't a deposit product; it's revenue sharing from the interest earned on Treasury bills held in reserve. Coinbase has been particularly vocal here, since stablecoin-related revenue made up around 20% of their total revenue in Q3 2025. Their CEO described the yield restriction as a provision designed to protect bank profits, not consumers.
In late March 2026, Senators Alsobrooks and Tillis introduced a compromise: rewards programs would be allowed on stablecoin activities, but not on balances. Crypto insiders who got an early look at the language weren't thrilled; many found it overly narrow and unclear.
And then, just today, something shifted. Coinbase's Chief Legal Officer Paul Grewal went on Fox Business and said he's confident a stablecoin yield deal will be reached within 48 hours. That's a big statement coming from the same company that withdrew its support for the bill earlier this year over these exact provisions. If Grewal's prediction holds, it would remove the single biggest procedural barrier to getting the CLARITY Act through committee.
This is the single biggest obstacle standing between crypto and a regulated future in the U.S. And it's not about blockchain technology at all. It's about whether banks get to keep their monopoly on interest-bearing accounts.
The practical deadline for the CLARITY Act isn't some abstract future date. It's roughly May to June 2026. After that, midterm election politics take over the Senate calendar. If Republicans lose their Senate majority in November (historically likely for the sitting president's party), the political dynamics shift completely, and this bill could die.
Every senator watching this legislation cites the same deadline. Polymarket passage odds have been volatile: they crashed to around 45% in late February when stablecoin talks stalled, peaked above 70% in early March, and currently sit around 51%. Ripple's CEO has publicly estimated 80-90% odds by late April. Senator Cynthia Lummis called the stablecoin yield talks "99% resolved" at the DC Blockchain Summit.
So right now, in early April 2026, we're in the window where this either happens or it doesn't.
If the CLARITY Act becomes law, the impact goes way beyond legal definitions. Think about what happened when the SEC approved spot Bitcoin ETFs back in January 2024: institutional money poured in, prices moved, and the whole conversation around crypto regulation shifted overnight. The CLARITY Act could trigger something similar, but broader, because it covers the entire digital asset ecosystem — not just Bitcoin.
Tokens like XRP, Solana, and Avalanche would benefit the most from commodity classification. It would essentially end the enforcement overhang that's kept institutional investors nervous about anything that isn't Bitcoin or Ether. A clear spot ETF pathway would open up for dozens of assets.
JPMorgan analysts have already described passage as a positive catalyst for digital assets, citing regulatory clarity, institutional scaling, and tokenization growth as key drivers. BlackRock's iShares Bitcoin Trust alone has pulled in roughly $1.7 billion over the last four weeks.
If the bill fails, or more likely, just runs out of time, the status quo continues. Crypto companies keep operating under regulatory uncertainty. The SEC retains broad discretion to argue that digital assets are securities. The CFTC's authority over spot crypto markets stays limited to anti-fraud cases.
But the industry has backup plans. Circle, Ripple, and Coinbase are all pursuing OCC bank charters as a parallel path to federal legitimacy. The SEC and CFTC have launched "Project Crypto," a joint initiative where both agencies committed to coordinated rulemaking even without legislation. These aren't as permanent as a congressional statute, but they're something.
The crypto lobby has already signaled it would treat a failed CLARITY Act as a political liability for any elected official who blocked it. With over $200 million raised for the 2026 midterm cycle, they have the resources to follow through.
Here's what really gets me. While the U.S. debates stablecoin interest, the rest of the world is moving fast. The EU's MiCA framework is already being enforced. Singapore and the UAE have established digital asset licensing frameworks. The UK is building out a comprehensive regime covering exchanges, custodians, and token issuers.
Over 100 jurisdictions globally now have some form of crypto regulation on the books. The U.S., the world's largest financial market, is playing catch-up.
Every month of delay is another month where builders, capital, and innovation move to jurisdictions that have already figured this out. That's not hypothetical; it's already happening. Look at the growth of crypto hubs in Dubai, Singapore, and parts of Europe.
Let's not pretend everything is rosy while we wait for regulation. Q1 2026 was rough. Bitcoin dropped 46% from its all-time high of $126,220. Ethereum lost nearly 60% from peak to trough. The Fear and Greed Index spent 46 consecutive days in extreme fear territory, hitting an all-time low of 5 in February — lower than the Terra/Luna crash, lower than COVID.
Geopolitics made everything worse. The escalating U.S.-Israel-Iran conflict sent shockwaves through every asset class. When U.S. airstrikes first made headlines, Bitcoin dropped sharply, at one point coming close to $60,000 before recovering.
But here's the thing — crypto's 24/7 markets actually proved their worth during this period. When major geopolitical news broke on a weekend, decentralized platforms processed hundreds of millions in volume while traditional exchanges were closed. Citizens in sanctioned economies turned to censorship-resistant financial tools in record numbers. The blockchain ecosystem demonstrated exactly why it exists.
While everyone is focused on regulation, Ethereum is quietly preparing its biggest technical upgrade since the Merge. The Glamsterdam upgrade, targeting a June 2026 launch, would increase the gas limit from 60 million to 200 million per block and scale throughput to 10,000 transactions per second.
Historically, ETH has rallied before every major upgrade. It went up roughly 35% before the Merge, 40% before Shanghai, and 20% before Dencun. The buying typically starts six to eight weeks before the expected go-live date. If Glamsterdam stays on track for June, the positioning window is opening right now.
ETH is currently trading around $1,900 to $2,100, down significantly from its highs. If the historical pattern holds, we could see a move toward the $2,600-$2,800 range. But — and this is important — if the upgrade timeline slips to Q3, that pre-upgrade momentum trade loses its catalyst entirely.
Forget the price predictions and the token shilling. Here's what I think matters in April 2026:
The CLARITY Act markup. If it clears the Senate Banking Committee, the dominoes start falling. If it doesn't, we're looking at another year of regulatory limbo.
The stablecoin yield compromise. Will the revised language satisfy both banks and crypto firms? Right now, neither side is happy — which might actually mean it's a reasonable middle ground, or it might mean it's dead on arrival.
Bitcoin ETF flows. Spot Bitcoin ETFs just posted their longest inflow streak of 2026 — roughly $2 billion over four consecutive weeks. Institutional demand is real, even in a fearful market.
The FOMC meeting on April 28-29. Bitcoin has sold off after eight of the last nine FOMC meetings. This one carries extra weight because it may be the last with the current Fed Chair. The incoming chair, Kevin Warsh, favors lower rates.
Glamsterdam testnet progress. Any delays here would remove one of the biggest bullish catalysts for ETH in the first half of the year.
I've been through enough crypto cycles to know that the most important moments aren't the ones that feel exciting. They feel boring. They involve senators arguing about stablecoin interest rates and regulatory markup schedules. They happen in committee rooms, not on Twitter.
The CLARITY Act might be the most consequential thing to happen to crypto since the Bitcoin whitepaper. Not because it's revolutionary technology, but because it's the moment where the traditional financial system finally decides to make room for digital assets — or doesn't.
April 2026 is the month that could decide which way it goes.
2026-04-14 23:14:08
A bad web deploy creates a bug report. A bad Over-the-Air (OTA) update bricks thousands of devices overnight.
The software world often celebrates the “fail fast, break things” philosophy. In standard web development, a critical bug is usually just an incident report followed by a quick patch. But when software becomes the nervous system of a complex physical machine—such as Android-powered fitness equipment, medical devices, or smart-home hubs—the tolerance for “breaking things” drops to zero.
In this domain, a poorly validated Over-the-Air (OTA) firmware update is not a minor inconvenience. It can systematically disable thousands of customer devices at once, introduce safety risks, and cause immediate, irreversible brand damage. Mastering the Critical Path in this environment requires moving beyond traditional QA and adopting a rigorous, hardware-software integrated Quality Engineering (QE) discipline.
Validating a typical web application involves a relatively simple stack: frontend, backend, and database. In contrast, a connected hardware product operates as a tightly coupled, multi-layer system that demands a unified validation strategy: the application UI, the Android OS beneath it, the firmware driving the physical hardware, and the motor controllers and sensors themselves.
Finally, there is the Cloud Observability layer (Bugsnag, New Relic) that ties it all together. A failure in one layer cascades across the entire system. A UI interaction might trigger a firmware command, which interacts with a motor controller, which depends on OS stability and hardware constraints.
This is not just software testing—it’s end-to-end system validation.
You cannot validate a physical system using mocks alone. If a software command instructs a motor to increase resistance by 20%, it’s not enough to confirm the API responded. You must validate that the physical system actually executed the command.
This is where Hardware-in-the-Loop (HIL) testing becomes essential. By plugging real hardware (or high-fidelity simulators) directly into the CI/CD pipeline, we measure more than just "Pass/Fail." We monitor latency (command propagation speed), data integrity (UART/low-level protocol accuracy), and thermal behavior—ensuring the OS handles hardware heat without throttling.
Modern SDET teams don’t just write automation scripts—they build frameworks that interact with physical systems.
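To make that concrete, here is a minimal sketch of what one HIL check can look like in a CI suite, assuming pytest plus pyserial and a UART bridge to the device under test. The port name, command framing, and latency budget are placeholders, not a real device protocol.

```python
import time

import pytest
import serial  # pyserial

PORT = "/dev/ttyUSB0"     # hypothetical UART bridge to the device under test
LATENCY_BUDGET_MS = 150   # example budget for command propagation


@pytest.fixture
def device():
    # Open the serial link once per test and close it afterwards.
    with serial.Serial(PORT, baudrate=115200, timeout=2) as conn:
        yield conn


def test_resistance_command_propagates(device):
    """Command the motor controller and confirm a well-formed ACK arrives in time."""
    sent_at = time.monotonic()
    device.write(b"SET_RESISTANCE 20\n")  # placeholder command framing
    ack = device.readline().strip()
    elapsed_ms = (time.monotonic() - sent_at) * 1000

    assert ack == b"ACK SET_RESISTANCE 20", f"unexpected response: {ack!r}"
    assert elapsed_ms < LATENCY_BUDGET_MS, f"propagation took {elapsed_ms:.0f} ms"
```

The same structure extends to data-integrity checks (comparing low-level frames against expected payloads) and thermal assertions read from the OS.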
As systems scale, test creation—not execution—becomes the primary bottleneck. To solve this, we’ve integrated Generative AI directly into the “shift-left” phase. Our workflow leverages AI to parse PRDs and Figma designs and auto-generate draft test cases.
This allows our engineers to spend less time on repetitive authoring and more on high-risk edge cases and complex system boundaries. In practice, this significantly reduced test authoring time and improved consistency across teams—while maintaining high coverage.
In large-scale automation, "noise" (false positives) is the enemy of velocity. Without proper triage, teams waste days chasing network hiccups instead of real regressions. We solved this by implementing an AI-based failure analysis system. When a test fails in CI, a hook automatically collects Appium logs and system Logcats. An LLM then analyzes the execution history and classifies the failure: New Defect (immediate investigation), Flaky Infrastructure (retry), or Known Issue (linked to an existing Jira ticket). This single change reduced our regression cycles from 5 days to under 24 hours by focusing engineering effort where it actually matters.
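As an illustration of that triage step, the sketch below bundles the collected logs and asks a model to pick one of the three categories. The log-collection helpers and the `llm` client are assumptions; only the categories and routing outcomes come from the workflow described above.

```python
import json
from pathlib import Path

CATEGORIES = {"NEW_DEFECT", "FLAKY_INFRA", "KNOWN_ISSUE"}


def classify_failure(appium_log: Path, logcat: Path, history: str, llm) -> str:
    # Give the model the recent run history plus the tail of both logs.
    prompt = (
        "Classify this CI test failure as NEW_DEFECT, FLAKY_INFRA, or KNOWN_ISSUE.\n"
        f"Recent runs:\n{history}\n"
        f"Appium log tail:\n{appium_log.read_text()[-4000:]}\n"
        f"Logcat tail:\n{logcat.read_text()[-4000:]}\n"
        'Answer as JSON: {"category": "...", "reason": "..."}'
    )
    decision = json.loads(llm.complete(prompt))  # hypothetical LLM client
    category = decision.get("category", "NEW_DEFECT")
    return category if category in CATEGORIES else "NEW_DEFECT"


def route(category: str, test_name: str) -> None:
    # Route each verdict to the action described above.
    if category == "FLAKY_INFRA":
        print(f"re-queueing {test_name} for retry")
    elif category == "KNOWN_ISSUE":
        print(f"linking {test_name} to its existing Jira ticket")
    else:
        print(f"opening an investigation for {test_name}")
```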
Early in the evolution of many connected platforms, firmware and OS updates were deployed as monolithic, "all-or-nothing" events. This approach is inherently high-risk. If a regression escapes testing, it impacts the entire global fleet simultaneously.
We’ve seen the consequences of similar failures across the industry—where a single faulty update triggers widespread disruption. In connected hardware ecosystems, the stakes are even higher: failures translate into physical device outages at scale.
To mitigate this risk, we evolved our deployment into a Phased Rollout Architecture: updates reach progressively larger device cohorts, with fleet-health metrics monitored at every stage.
If thresholds are exceeded, rollout is automatically halted. This approach transforms deployment from a high-risk release into a controlled, observable system with built-in safeguards.
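Conceptually, the halt check is a small piece of logic evaluated against fleet telemetry before each expansion. The cohort sizes, metric names, and thresholds below are examples, not our production values.

```python
from dataclasses import dataclass


@dataclass
class Phase:
    name: str
    fleet_fraction: float           # share of devices eligible in this phase
    max_crash_rate: float           # halt if observed crash rate exceeds this
    max_update_failure_rate: float  # halt if OTA apply failures exceed this


PHASES = [
    Phase("internal", 0.001, 0.001, 0.01),
    Phase("early",    0.01,  0.002, 0.02),
    Phase("broad",    0.25,  0.002, 0.02),
    Phase("full",     1.0,   0.005, 0.03),
]


def should_halt(phase: Phase, crash_rate: float, update_failure_rate: float) -> bool:
    """Return True when observed fleet health breaches the phase's thresholds."""
    return (
        crash_rate > phase.max_crash_rate
        or update_failure_rate > phase.max_update_failure_rate
    )
```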
The full system—which encompasses AI-driven test generation, HIL validation, intelligent failure triage, and phased rollout—is illustrated in the figure below.

Connected products are deeply embedded in daily life. Ensuring their reliability requires treating quality as a core engineering discipline, not a final checkbox. Software in these systems does not exist in isolation. It interacts continuously with hardware, users, and real-world conditions. By combining hardware-aware validation (HIL), AI-driven test design and failure analysis, and risk-aware rollouts, we move from reactive testing to a proactive Quality Intelligence ecosystem.
In connected systems, quality isn’t just about preventing bugs—it’s the only thing standing between a routine update and a fleet-wide failure.
2026-04-14 23:01:28
oolora is a privacy-first platform offering 28+ browser-based tools that process data entirely on the client side, eliminating the need for file uploads. Built with modern browser APIs, it targets developers, designers, and everyday users seeking fast, secure utilities. Despite being only days old, it has already attracted early traffic and is positioning itself with a three-layer model: a free toolbox, an MCP server for AI workflows, and a paid API for power users. Its focus on privacy, performance, and smart distribution gives it strong potential in a crowded tools market.
2026-04-14 22:49:14
Three weeks ago, we published our findings from running an agent for 504 continuous hours with no intervention. We noticed a few problems, fixed them, and ran the experiment for another three weeks. The failure mode we hit in this second run is arguably worse than the first.
Here’s what the numbers actually looked like:

Looking at the table, every infrastructure metric improved while every outcome metric stayed at zero.
The first experiment’s failure mode was loud and dramatic: the agent went rogue, scheduling its own jobs, spawning subagent swarms, and recycling cached responses 121 times in a single day. It was doing too much of the wrong thing. The second experiment’s failure mode was the opposite: the agent did everything correctly and accomplished nothing.
In the first run, the memory system failed by modeling the agent instead of the user. It stored facts like “The system uses SKILL.md files to define agent skills,” a perfect closed loop of self-referential observation.
In the second run after the fix, the memory extraction was properly filtered. The system stored 614 memories with genuine user-relevant content: names, preferences, communication style, business context. The extraction pipeline worked, the deduplication worked, the priority scoring worked, and even checkpoint snapshots were created on schedule. But none of it mattered, because the retrieval system was broken.
The memory architecture uses vector embeddings for semantic search. When the user asks a question, the system embeds the query, searches for similar memories, and injects relevant context into the LLM’s prompt. This requires an embedding model running on LM Studio at http://127.0.0.1:1234/v1/embeddings.
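A simplified sketch of that retrieval path is below: embed the query against the LM Studio endpoint (which speaks the OpenAI-compatible embeddings API), then rank stored memories by cosine similarity. The table schema, model name, and use of httpx are illustrative, not the project’s actual implementation.

```python
import json
import sqlite3

import httpx
import numpy as np

EMBED_URL = "http://127.0.0.1:1234/v1/embeddings"


def embed(text: str) -> np.ndarray:
    resp = httpx.post(
        EMBED_URL,
        json={"model": "text-embedding-model", "input": text},  # model name is a placeholder
        timeout=10,
    )
    resp.raise_for_status()  # a 400 here surfaces immediately instead of being swallowed
    return np.array(resp.json()["data"][0]["embedding"])


def retrieve(query: str, db: sqlite3.Connection, top_k: int = 5) -> list[str]:
    q = embed(query)
    scored = []
    for content, emb_json in db.execute("SELECT content, embedding FROM memories"):
        m = np.array(json.loads(emb_json))
        similarity = float(np.dot(q, m) / (np.linalg.norm(q) * np.linalg.norm(m)))
        scored.append((similarity, content))
    return [content for _, content in sorted(scored, reverse=True)[:top_k]]
```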
That endpoint returned 400 Bad Request for the entire three-week run.
2026-03-27 18:28:39 | ERROR | Memory retrieval failed:
Client error '400 Bad Request' for url 'http://127.0.0.1:1234/v1/embeddings'
Six hundred fourteen memories, carefully extracted and scored, sitting in a SQLite database that nothing ever read. The access_count field for every single memory was 0. The system had graduated from storing irrelevancies to storing a gold mine it had no way to access.
Even though this retrieval failure was logged as an ERROR, it didn’t crash the process. The agent kept running normally, appearing functional while operating without any long-term context.
2026-03-29 15:40:24 | repetition | -0.50 | similar to earlier user message
The signal detection system identified these repetitive responses and adjusted memory priorities downward. But without retrieval, the adjusted priorities had nowhere to go. The feedback loop was complete: detect repetition, lower priority, fail to retrieve, repeat.
In the first run, the memory consolidator couldn’t parse JSON wrapped in markdown code fences, which caused silent duplication. We fixed that by stripping fences before parsing.
In the second run, 127 memory consolidation responses failed with “Invalid LLM decision” because the model was producing malformed JSON that couldn’t be parsed even after normalization.
This is the local LLM tax: cloud APIs have structured output modes that guarantee valid JSON. Local models running quantized weights at 1-bit precision do not. The model used in this experiment (Qwen3-Coder-Next, IQ1_S quant) is remarkably good at tool use and natural language, but asking it to consistently produce valid JSON for a structured decision pipeline pushes against its limits. When it works, it works well. When it doesn’t, you get 127 silent failures.
If you’re building agent memory systems that run on local models, you need either aggressive retry logic with format correction or a simpler decision format the model can hit reliably: perhaps a single keyword (ADD/UPDATE/NOOP/DELETE) followed by unstructured content, rather than a full JSON object.
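A minimal version of that keyword format, with a parser that degrades to NOOP instead of failing silently, might look like this (the decision verbs mirror the ones above; everything else is illustrative):

```python
VALID_DECISIONS = {"ADD", "UPDATE", "NOOP", "DELETE"}


def parse_decision(raw: str) -> tuple[str, str]:
    """Return (decision, content); fall back to NOOP rather than raising."""
    lines = raw.strip().splitlines()
    if not lines:
        return "NOOP", ""
    first = lines[0].strip().upper().rstrip(":")
    decision = first if first in VALID_DECISIONS else "NOOP"
    content = "\n".join(lines[1:]).strip()
    return decision, content
```

A heavily quantized model can emit a single verb far more reliably than a nested JSON object, and the fallback keeps a malformed response from silently corrupting the consolidation queue.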
A rogue agent is visible: it fills logs, burns tokens, and triggers alerts you notice. An inert agent passes every health check while delivering zero value. It runs for three weeks, maintains 99.8% uptime, executes 500+ scans, extracts 614 memories, and the user has nothing to show for it.
1. Infrastructure reliability masks execution failure. 99.8% uptime doesn’t mean the system is working; it means the system is running. You need measured outcome metrics, such as memories retrieved, posts published, and tasks completed, not just process metrics like uptime and scan count.
2. Fixing architecture reveals operational gaps. The first run’s problems were architectural: no context isolation, no extraction filtering, no format normalization. Fixing those revealed that the system also needed working infrastructure (embedding service), actual credentials being loaded where they’re needed, and executable code behind skill definitions. Architecture is necessary but not sufficient.
3. A dead dependency can silently disable an entire subsystem. The embedding endpoint returned 400 for three weeks. If memory retrieval had been a hard dependency (fail the request if context can’t be retrieved), the problem would have been caught on day 1 instead of week 3; a sketch of that check follows this list.
4. The local LLM structured output problem is ongoing. Both runs surfaced JSON parsing failures from the local model. The format changed (code fences in run 1, malformed JSON in run 2), but the failure class persisted. If your agent pipeline depends on structured model output, you need to design for partial parsing failures, not just handle them as exceptions.
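For lesson 3, here is a sketch of what treating the embedding service as a hard dependency could look like: probe the endpoint at startup (and before each retrieval) and fail loudly instead of degrading silently. The endpoint is the one from this run; the exception type and model name are assumptions.

```python
import httpx

EMBED_URL = "http://127.0.0.1:1234/v1/embeddings"


class MemoryUnavailable(RuntimeError):
    """Raised when long-term context cannot be retrieved."""


def require_embedding_service() -> None:
    try:
        resp = httpx.post(
            EMBED_URL,
            json={"model": "text-embedding-model", "input": "healthcheck"},
            timeout=5,
        )
        resp.raise_for_status()
    except httpx.HTTPError as exc:
        # Surfacing this on day 1 beats three weeks of a silently amnesiac agent.
        raise MemoryUnavailable(f"embedding endpoint unhealthy: {exc}") from exc
```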
2026-04-14 22:42:56
AI agents amplify your good habits and your bad ones. If you plan before you code, agents execute that plan beautifully. If you skip planning and start typing, agents generate more unplanned code faster than you ever could alone.
I skipped planning for twenty years and got away with it. I believed in DRY but never enforced it with tooling. I knew principles mattered but never wrote them down. These were bad habits I could tolerate when I was the only one writing code. With 100+ agents, those habits scaled too. And at scale, they broke everything.
The biggest improvement to my agent system wasn't a better model or a new tool. It was curbing my own bad habits.
I wrote about the technical harness in The Illusion of Control: prohibitions, enforcement, verification. The system that runs around the agent. That article was about the system. This one is about me.
I built that harness, deployed it, and still watched agents produce mediocre work. Not because the constraints were wrong. Because I was feeding them garbage inputs. Vague specs. Incomplete context. Plans that lived entirely in my head.
The harness was fine. The person holding it was the bottleneck.
Here's how I used to work. I'd get an idea for what needed to happen. Describe the task to an orchestrator agent. Let it delegate to sub-agents. Watch the output. Fix what broke. Repeat.
This felt productive. I was shipping. Agents were running. Code was appearing in pull requests. But the quality was inconsistent and the failure modes were strange. An agent would implement a feature correctly but miss a constraint I'd mentioned three tasks ago. Another would duplicate work that a previous agent had already done. A third would build something technically sound that contradicted the architectural direction I was heading.
The pattern was clear in hindsight: the orchestrator was carrying the full plan in its context window. Task descriptions, file lists, ordering constraints, domain assignments. By the time it delegated to the third sub-agent, it was already forgetting the first.
This wasn't a model limitation. This was a me limitation. I was dumping everything into a single context and expecting the system to sort it out.
At 10 agents, this approach works. The orchestrator can hold 10 tasks in context. I could review every output myself. The feedback loop was tight enough that mistakes got caught before they compounded.
At 100+, the math stops working.
An orchestrator managing a dozen sub-agents across four domains cannot hold all the relevant context simultaneously. It's not a matter of token limits (though those matter too). It's a matter of attention. The same way a human manager loses detail when their team grows from 5 to 50, an LLM orchestrator loses coherence when the plan exceeds what fits comfortably in its working memory.
But I kept trying to make it work by adding more context. More detailed task descriptions. Longer system prompts. More examples. I was doing the same thing I tell vibe coders not to do: treating the prompt as the solution instead of examining the system.
The failures were subtle. Not crashes. Not obvious errors. Just a slow drift in quality. Agents doing reasonable things that didn't fit together. Each output locally correct, globally incoherent. Like a jigsaw puzzle where every piece is well-cut but they're from different boxes.
From your first coding class, the instructor tells you to plan before you code. Pseudocode first. Think through the logic. Then implement. Every CS student hears this. Almost nobody does it. I certainly didn't. For twenty years I got away with skipping straight to code because I could hold the whole problem in my head.
My agents could not.
The fix required changing my behavior, not my tooling. I needed a planning version of the system and a coding version. Exactly what my instructor wanted from me twenty years ago.
Now I start every task in planning mode. Before any agent writes a single line of code, planning agents decompose the work. The orchestrator files every task as a git-synced issue, tagged with the description and ideal delegation target. Then it rewrites the plan as a slim checklist of issue IDs.
Pages of context become a handful of references. The orchestrator passes an ID, not a description. The sub-agent loads the issue in its own context window and has everything it needs, independently, without depending on the orchestrator to remember it.
I call this pre-compiling context. Do the expensive decomposition work during planning so it doesn't bloat execution.
Here's what that looks like in practice. Before pre-compiling, a typical orchestrator task read something like this:
"Refactor the staking module to use the two-layer hook pattern. The raw Ponder hook is in src/hooks/blockchain/useStakingData.ts. The transform hook should go in src/hooks/useStakingTransform.ts. Make sure to update StakingPanel.tsx and StakingDetails.tsx to use the new transform hook instead of calling the Ponder hook directly. The Ponder hook needs an enabled guard. Use the NumberFormatter preset for all percentage displays. Do this after the theme migration is done but before the dashboard layout work starts."
That's one task description among twelve, all stuffed into the orchestrator's context. By task eight, the agent had forgotten the ordering constraint. By task ten, it was duplicating the enabled guard logic that task three had already handled.
After pre-compiling, the same work becomes:
Issue #247: Refactor staking module to two-layer hook pattern
Depends on: #245 (theme migration)
Blocked by: none currently
The orchestrator passes #247. The sub-agent pulls the issue, reads the full description in its own context window, and executes with complete information. No degradation. No dependency on the orchestrator's memory.
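In code terms, the handoff is tiny. Here is a toy illustration of "pass an ID, not a description"; the issue store and prompt layout are hypothetical, but the shape is the point: the sub-agent hydrates its own context from the issue, not from the orchestrator's memory.

```python
from dataclasses import dataclass, field


@dataclass
class Issue:
    number: int
    title: str
    body: str                # full task description, constraints, file paths
    depends_on: list[int] = field(default_factory=list)


# Populated from the git-synced issue tracker during planning.
ISSUES: dict[int, Issue] = {}


def build_subagent_prompt(issue_number: int) -> str:
    issue = ISSUES[issue_number]
    open_blockers = [n for n in issue.depends_on if n in ISSUES]  # still-open dependencies
    if open_blockers:
        raise RuntimeError(f"#{issue_number} is blocked by {open_blockers}")
    # The orchestrator only ever passes the number; the body travels with the issue.
    return f"# {issue.title} (issue #{issue.number})\n\n{issue.body}"
```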
The real lesson from pre-compiling context is that it's a constraint on me, not on the agents. I have to do the planning work upfront. I have to resist the urge to skip straight to prompting. I have to accept that my natural workflow (describe, run, fix, repeat) is the failure mode, not the process.
Someone once asked me whether all this "agentic engineering" was really just software engineering with extra steps. The answer is yes. That's the point.
Vibe coding makes generating code cheaper. It doesn't make generating correct, maintainable, production-safe code cheaper. That still requires constraints, verification, and iteration. The same discipline it always did.
When I started defining interface contracts between my agents, specifying what each one accepts and returns, with validation on both sides, I realized I was doing the same work I do every day defining the boundaries between my frontend and my API layer. What does this endpoint accept? What does it return? What happens when the contract is violated? The artifact changed. The architecture thinking didn't.
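Here is roughly what one of those contracts looks like when written down, using pydantic as the validator; the field names are illustrative, not my actual schema.

```python
from pydantic import BaseModel, Field


class RefactorRequest(BaseModel):
    issue_number: int
    files: list[str] = Field(min_length=1)  # an empty file list is a contract violation
    constraints: list[str] = []


class RefactorResult(BaseModel):
    issue_number: int
    changed_files: list[str]
    tests_passed: bool
    notes: str = ""


def handoff(raw_request: dict, run_agent) -> RefactorResult:
    request = RefactorRequest.model_validate(raw_request)  # reject malformed input
    raw_result = run_agent(request.model_dump())           # the agent itself is a black box here
    return RefactorResult.model_validate(raw_result)       # reject malformed output
```

Violations fail at the boundary, the same way a bad API payload fails at the endpoint, instead of surfacing three agents downstream.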
The philosophy runs deeper than architecture. DRY has always been my favorite engineering principle. Not just because it reduces duplication, but because it is the starting point for the best refactoring. Extract repeated logic into shared functions. Those functions stack on each other. The system becomes composable.
I caught myself copy-pasting the same prompt block into three different agents and realized I was violating the principle I'd spent my career defending. So I started extracting repeated prompt patterns into shared skill files. One source of truth. Update one file, every agent that imports it changes. I call it DRYP: Don't Repeat Your Prompt.
DRYP is just DRY, applied to agent instructions. But the implications are the same. Repeated prompts drift. You tweak one copy, forget the others, and your agents behave inconsistently. A single shared skill eliminates that drift entirely. Once instructions become modular, agents compose like functions. The system grows by stacking, not by rewriting.
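A minimal version of DRYP looks like ordinary code reuse: one skill file on disk, loaded by every agent prompt that needs it. The file names and agent roles here are hypothetical.

```python
from pathlib import Path

SKILLS_DIR = Path("skills")


def load_skill(name: str) -> str:
    """Single source of truth: edit skills/<name>.md and every agent picks it up."""
    return (SKILLS_DIR / f"{name}.md").read_text()


def build_agent_prompt(role: str, skill_names: list[str]) -> str:
    shared = "\n\n".join(load_skill(n) for n in skill_names)
    return f"You are the {role} agent.\n\n{shared}"


# Both agents reuse the same formatting rules; no copy-pasted prompt blocks.
frontend_prompt = build_agent_prompt("frontend", ["number-formatting", "hook-pattern"])
review_prompt = build_agent_prompt("code-review", ["number-formatting"])
```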
That is the philosophy underpinning everything in this series. The principles that produce good code also produce good agent systems. DRY becomes DRYP. Code review becomes prohibitions. Linting becomes enforcement. Planning before coding becomes planning agents before execution agents. None of this is new. It is the same craft, applied to a new medium.
I stopped telling agents what to type. I started defining what they can't do. I stopped reviewing every output. I started building automated checks that review for me. I stopped directing every step. I started engineering the system that directs itself.
In my own experience, every jump on the evolution chart, from chatbot to autocomplete, from autocomplete to vibe coding, felt like a tools upgrade. I installed something new and kept working the same way. The jump to agentic engineering was different. It forced a behavior change. And behavior changes are harder than tool changes because the thing that needs to change is you.
The difference between vibe coding and agentic engineering isn't the tools. It's the discipline.
Boundaries before execution. Constraints over instructions. Deterministic validation over probabilistic hope.
And yes, changing yourself. Not just the AI.
The harness starts with you. It depends on you articulating your engineering principles. Being clear on your architecture, on what you value, so that your agents value it too. If you have never written down why you structure code the way you do, your agents will never know. They will fill in the gaps with whatever gets the tests to pass.
That callback matters. In The Illusion of Control, I described agents rewriting tests to match broken output instead of fixing the code. That is what happens when the harness has no opinion. The agent optimizes for the only signal it has, and if that signal is "make green checkmarks," you get green checkmarks. Not correct software.
You are part of the harness. The most important part. Every prohibition, every hook, every verification gate is only as good as the principles behind it. And those principles live in you.
The medium changed. The craft didn't.