2026-03-02 18:35:26
You know what's wild? Every major AI lab is building "computer use" agents right now. Models that can look at your screen, understand what they see, and click buttons on your behalf. Anthropic has Claude Computer Use. OpenAI shipped CUA. Microsoft built UFO2.
And every single one of them is independently solving the same problem: how do you describe a UI to an AI?
We thought that was broken, so we built Computer Use Protocol (CUP), an open specification that gives AI agents a universal way to perceive and interact with any desktop UI. One format. Every platform. MIT licensed.
GitHub: computeruseprotocol/computeruseprotocol
Website: computeruseprotocol.com
Here's the fragmentation that every computer-use agent has to deal with today:
| Platform | Accessibility API | Role Count | IPC Mechanism |
|---|---|---|---|
| Windows | UIA (COM) | ~40 ControlTypes | COM |
| macOS | AXUIElement | AXRole + AXSubrole | XPC / Mach |
| Linux | AT-SPI2 | ~100+ AtspiRole values | D-Bus |
| Web | ARIA | ~80 ARIA roles | In-process / CDP |
| Android | AccessibilityNodeInfo | Java class names | Binder |
| iOS | UIAccessibility | ~15 trait flags | In-process |
That's roughly 300+ combined role types across platforms, each with different naming, different semantics, and different ways to query them. If you're building an agent that needs to work on more than one OS, you're writing a lot of glue code.
CUP collapses all of that into a single, ARIA-derived schema:
Here's what a CUP snapshot looks like in JSON:
{
"version": "0.1.0",
"platform": "windows",
"timestamp": 1740067200000,
"screen": { "w": 2560, "h": 1440, "scale": 1.0 },
"app": { "name": "Spotify", "pid": 1234 },
"tree": [
{
"id": "e0",
"role": "window",
"name": "Spotify",
"bounds": { "x": 120, "y": 40, "w": 1680, "h": 1020 },
"states": ["focused"],
"actions": ["click"],
"children": ["..."]
}
]
}
Whether that UI was captured on Windows via UIA, macOS via AXUIElement, or Linux via AT-SPI2, it comes out looking exactly the same.
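To make the normalization concrete, here is a toy sketch of what a platform adapter's role mapping might look like, written in Python for brevity. The table entries are illustrative guesses, not the spec's official mapping tables:

```python
# Toy role-normalization tables. The entries are illustrative,
# not CUP's official platform mappings.
UIA_TO_CUP = {"Button": "button", "Edit": "textbox", "Hyperlink": "link"}
ATSPI_TO_CUP = {"push button": "button", "entry": "textbox", "link": "link"}

def normalize_role(platform: str, native_role: str) -> str:
    """Map a platform-native role to a CUP role, falling back to 'generic'."""
    table = {"windows": UIA_TO_CUP, "linux": ATSPI_TO_CUP}[platform]
    return table.get(native_role, "generic")

win_role = normalize_role("windows", "Button")   # "button"
linux_role = normalize_role("linux", "entry")    # "textbox"
```

The real adapters have far larger tables (the spec maps 59 roles across 6 platforms), but the shape is the same: a lookup per platform, one vocabulary out.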
Sending JSON trees to an LLM burns context fast. CUP defines a compact text format that's optimized for token efficiency:
# CUP 0.1.0 | windows | 2560x1440
# app: Spotify
# 63 nodes (280 before pruning)
[e0] window "Spotify" @120,40 1680x1020
[e1] document "Spotify" @120,40 1680x1020
[e2] button "Back" @132,52 32x32 [click]
[e3] button "Forward" @170,52 32x32 {disabled} [click]
[e7] navigation "Main" @120,88 240x972
[e8] link "Home" @132,100 216x40 {selected} [click]
Each line follows: [id] role "name" @x,y wxh {states} [actions]
Same information. A fraction of the tokens. Your agent sees more UI in less context.
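The line grammar is also easy to consume programmatically. Here is a hypothetical parser (not part of any official SDK) for the `[id] role "name" @x,y wxh {states} [actions]` format, sketched in Python:

```python
import re

# Matches: [id] role "name" @x,y wxh {states} [actions]
# The {states} and [actions] fields are optional.
LINE = re.compile(
    r'\[(?P<id>\w+)\]\s+(?P<role>\w+)\s+"(?P<name>[^"]*)"'
    r'\s+@(?P<x>\d+),(?P<y>\d+)\s+(?P<w>\d+)x(?P<h>\d+)'
    r'(?:\s+\{(?P<states>[^}]*)\})?'
    r'(?:\s+\[(?P<actions>[^\]]*)\])?'
)

def parse_line(line: str) -> dict:
    """Parse one compact-format line into a dict shaped like the JSON snapshot."""
    m = LINE.match(line.strip())
    if not m:
        raise ValueError(f"not a CUP line: {line!r}")
    d = m.groupdict()
    return {
        "id": d["id"],
        "role": d["role"],
        "name": d["name"],
        "bounds": {k: int(d[k]) for k in ("x", "y", "w", "h")},
        "states": d["states"].split(",") if d["states"] else [],
        "actions": d["actions"].split(",") if d["actions"] else [],
    }

node = parse_line('[e3] button "Forward" @170,52 32x32 {disabled} [click]')
```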
CUP is intentionally layered. The protocol is the foundation, and everything else is optional.
| Layer | What It Does |
|---|---|
| Protocol (core repo) | Defines the universal tree format: roles, states, actions, schema |
| SDKs | Capture native accessibility trees, normalize to CUP, execute actions |
| MCP Servers | Expose CUP as tools for AI agents (Claude Code, Cursor, Copilot, etc.) |
You can adopt just the schema. Or use the SDKs. Or go all the way to MCP integration. Each layer is independent.
Install the SDK:
# TypeScript
npm install computeruseprotocol
# Python
pip install computeruseprotocol
Capture a UI tree and interact with it:
import { snapshot, action } from 'computeruseprotocol'
// Capture the active window's UI tree
const tree = await snapshot()
// Click a button
await action('click', 'e14')
// Type into a search box
await action('type', 'e9', { value: 'hello world' })
// Send a keyboard shortcut
await action('press', { keys: 'ctrl+s' })
That's it. The SDK auto-detects your OS and loads the right platform adapter.
CUP ships a built-in MCP server. Add it to your claude_desktop_config.json or equivalent and your agent can start controlling desktop UIs immediately:
{
"mcpServers": {
"cup": {
"command": "cup-mcp"
}
}
}
This exposes tools like snapshot (capture window tree), action (interact with elements), overview (list all open windows), and find (search elements in the last tree). It works with Claude Code, Cursor, OpenClaw, Codex, and anything MCP-compatible.
Computer-use agents are evolving fast, but the infrastructure layer is still ad-hoc. Every team building an agent that needs to "see" a desktop is solving the same problems from scratch: how to capture UI state, how to represent it for an LLM, how to execute actions reliably across platforms.
CUP standardizes that layer so teams can focus on what makes their agent unique (the reasoning, the planning, the task execution) instead of reimplementing platform-specific UI perception.
Think of it like this: HTTP didn't make web browsers smart, but it gave them a common language. CUP aims to do the same for computer-use agents.
CUP is at v0.1.0, early but functional. The spec covers 59 roles mapped across 6 platforms, with SDKs for Python and TypeScript.
Contributions are very welcome, especially around new role/action proposals with cross-platform mapping rationale, platform mapping improvements, and SDK contributions like new platform adapters or bug fixes.
Check out the repo and website linked at the top of this post.
If you're building computer-use agents, cross-platform UI testing, or accessibility tooling, we'd love to hear from you. Open an issue, submit a PR, or just star the repo if you think this problem is worth solving.
CUP is MIT licensed and community-driven. The protocol belongs to everyone building in this space.
2026-03-02 18:30:00
Docker is now a standard tool for running Python applications across development, testing, and production. Using Docker with Python means running your application inside a container rather than directly on your local machine. The container bundles Python, your dependencies, and the system libraries they need.
In this guide, we’ll have a look at how you use Docker with Python in real projects. You’ll learn what to install, how to write Dockerfiles for Python apps, how to run containers, and how to avoid common mistakes.
When you work with Python, your application often depends on:
- A specific Python version
- System libraries (such as libpq, curl, or build-essential)
- Python packages installed with pip
Docker bundles all of these into a single, reproducible environment.
With Docker, you can:

- Run the same environment on every machine, from your laptop to a CI runner to production
- Ship your dependencies and system libraries together with your code
- Avoid "works on my machine" problems

This consistency reduces setup time and deployment errors.

Docker needs to be installed on your system before you can get anything done. Once it's installed, Docker runs in the background and manages containers for you. Verify the installation with:
docker --version
Docker images for Python are published on Docker Hub, and some of the most commonly used base images are:
- python:3.12: the full Debian-based image
- python:3.12-slim: a smaller Debian-based image with fewer system packages
- python:3.12-alpine: the smallest option, based on Alpine Linux (C extensions may need extra build tools)

A Dockerfile defines how your Python app is built and run.
FROM python:3.12-slim
# Set working directory
WORKDIR /app
# Copy dependency file first (for caching)
COPY requirements.txt .
# Install dependencies
RUN pip install --no-cache-dir -r requirements.txt
# Copy application code
COPY . .
# Expose the app port
EXPOSE 5000
# Run the application
CMD ["python", "app.py"]
A few things to note:

- Copying **requirements.txt** first allows Docker to cache the dependency layer
- **--no-cache-dir** helps keep the image size smaller
- **WORKDIR** ensures all commands run in the correct directory

From the project root, run:
docker build -t python-docker-app .
- **-t** assigns a name (tag) to the image
- **.** tells Docker to use the current directory as the build context

You can list images with:
docker images
Start your container with:
docker run -p 5000:5000 python-docker-app
- **-p 5000:5000** maps the container port to your local machine
- Once the container is running, the app is available at **http://localhost:5000**
At this point, your Python app is fully containerized.
One common mistake is hardcoding configuration values. Docker provides clean support for environment variables.
docker run -p 5000:5000 -e FLASK_ENV=production python-docker-app
import os
env = os.getenv("FLASK_ENV", "development")
This pattern is essential for secure and scalable deployments.
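The same `os.getenv` pattern scales to several settings with safe development defaults. The variable names below are just examples:

```python
import os

def load_config() -> dict:
    """Read configuration from environment variables, with development defaults."""
    return {
        "env": os.getenv("FLASK_ENV", "development"),
        "port": int(os.getenv("PORT", "5000")),
        "debug": os.getenv("DEBUG", "false").lower() == "true",
    }

# Simulates `docker run -e FLASK_ENV=production ...`
os.environ["FLASK_ENV"] = "production"
os.environ.pop("PORT", None)    # ensure defaults apply for the demo
os.environ.pop("DEBUG", None)

config = load_config()
```

Keeping every default development-safe means a container started with no `-e` flags is never accidentally in production mode.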
Poor layering is a common cause of slow Docker builds. To avoid it:

- Copy requirements.txt and install dependencies before copying the rest of your code, so the dependency layer stays cached
- Use a .dockerignore file to keep unnecessary files out of the build context

Example .dockerignore:
__pycache__/
.env
.git
venv/
This prevents large or sensitive files from entering your image.
For development, you often want live code reloading.
docker run -p 5000:5000 -v $(pwd):/app python-docker-app
This allows you to edit code locally while the container runs. This approach is common in local development but not recommended for production.
Docker works beyond web apps. For a background worker, change the CMD:

CMD ["python", "worker.py"]

For a one-off script, override the command at run time:

docker run --rm python-docker-app python script.py

For an ASGI app like FastAPI, run it with uvicorn:

CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]

The Docker fundamentals remain the same.
- Avoid unpinned **latest** Python images in production; pin a specific version instead

Avoiding these issues leads to smaller, faster, and safer containers.
Docker is valuable when your application needs to run identically across machines, when your dependencies go beyond pip packages, or when you deploy to servers and CI pipelines.
Making use of Docker with Python is no longer optional for modern software teams. It gives you control over environments, reduces friction between development and production, and scales well from solo projects to enterprise systems.
Have a great one!!!
Originally published at https://blog.masteringbackend.com.
2026-03-02 18:26:08
I have been building with AI agents since mid-2025. First with LangChain, then briefly with AutoGen, and for the last couple months with OpenClaw. And the whole time there was something bugging me that I could not quite articulate until I saw it break in production.
The memory problem.
Every agent framework I have used stores memory the same way: text files. Markdown, YAML, JSON, whatever. It is all the same idea -- dump what the agent "knows" into a flat file and hope for the best.
OpenClaw does this with SOUL.md (the agent personality), HEARTBEAT.md (its task loop), and a bunch of markdown files for conversation history and long-term memory. And honestly? It works fine for personal use. I ran my OpenClaw agent for weeks managing my email and calendar through Telegram. No complaints.
Then I tried to build something for a client.
The client is a small fintech in Spain that needed an agent to handle KYC verification -- basically confirming that a user passed identity checks before letting them do certain transactions. Simple enough, right?
Here is where it fell apart. The agent could say a user passed KYC. It could write "User 4521 passed KYC Level 2" into a markdown file. But when another agent (the compliance agent) needed to verify that claim... it was just reading a text file. There was no way to know if that claim was actually true, if it had been tampered with, or even if the agent that wrote it had the authority to make that assertion.
I was literally building a compliance system on top of text files. I felt like an idiot.
I found AIngle because I was googling "semantic memory for agents" at 2am on a Tuesday, which is when all the best technical decisions are made.
AIngle is a protocol -- not a framework, not a library, a protocol -- for storing knowledge as semantic graphs instead of flat text. It comes from a project called Apilium and it is built in Rust (12 crates, which initially scared me, but the latency numbers are wild -- 76 microseconds for local operations).
The core idea is simple once you get past the terminology:
Instead of storing "User 4521 passed KYC Level 2" as a string in a markdown file, you store it as a semantic triple:
Subject: user:4521
Predicate: kyc:level
Object: 2
Proof: PoL:hash:a7f3
That last field is what changed everything for me. Every assertion has a cryptographic proof attached. It is called Proof-of-Logic (PoL), and it basically means that when Agent B reads a claim made by Agent A, it can mathematically verify that the claim is consistent with Agent A's history of assertions.
No trust required. No "I read it in a markdown file so it must be true." Math.
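To build intuition for why a proof-carrying assertion beats a string in a file, here is a toy model in Python. This is not AIngle's actual PoL algorithm, just the general shape of content-addressed verification: each assertion's proof is a hash over its content plus the previous proof, so editing any stored claim breaks verification.

```python
import hashlib
import json

def make_assertion(subject, predicate, obj, prev_proof):
    """Toy 'proof': hash the triple together with the previous proof."""
    payload = json.dumps([subject, predicate, obj, prev_proof], sort_keys=True)
    proof = hashlib.sha256(payload.encode()).hexdigest()[:16]
    return {"s": subject, "p": predicate, "o": obj, "proof": proof}

def verify(assertion, prev_proof):
    """Recompute the hash and compare; any tampering changes the digest."""
    payload = json.dumps(
        [assertion["s"], assertion["p"], assertion["o"], prev_proof],
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode()).hexdigest()[:16] == assertion["proof"]

a = make_assertion("user:4521", "kyc:level", 2, prev_proof="genesis")
ok = verify(a, "genesis")        # the untouched claim verifies

a["o"] = 3                       # someone edits the stored claim...
tampered = verify(a, "genesis")  # ...and verification now fails
```

The markdown-file approach has no equivalent of that second check: an edited file reads exactly as cleanly as an honest one.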
I am not going to pretend this was easy to set up. It was not. The docs are... improving. But here is the gist of what I ended up with.
AIngle has three layers, and it took me a while to understand why you need all three:
Cortex -- this is the part that takes natural language and turns it into SPARQL queries. So when your agent thinks "does this user have KYC level 2?", Cortex translates that into a structured query against the semantic graph. You do not write SPARQL yourself (thank god).
Ami -- the semantic mesh. This is where assertions live, propagate between agents, and get verified via PoL. Think of it as a shared knowledge layer where agents can publish claims and other agents can verify them without trusting each other.
Nexo Genesis -- the storage layer. Each agent (or user, or organization) gets their own "source chain" -- basically a private DAG where their data lives. You own your data. Nobody else sees it unless you explicitly share it, and even then you can use zero-knowledge proofs to share properties without sharing the underlying data.
Here is what the KYC verification looks like with AIngle vs without:
// WITHOUT (OpenClaw style)
const fs = require('node:fs');
// Agent A writes to a file
fs.writeFileSync('memory/kyc.md',
'## KYC Status - User 4521: Level 2 (verified 2026-02-15)'
);
// Agent B reads the file and... trusts it?
const kycData = fs.readFileSync('memory/kyc.md', 'utf8');
// hope nobody edited this file lol
// WITH AIngle
// Agent A publishes a verified assertion
const proof = await ami.assert({
subject: 'user:4521',
predicate: 'kyc:level',
object: 2,
evidence: kycVerificationResult.hash
});
// Agent B queries and verifies cryptographically
const claim = await ami.query({
subject: 'user:4521',
predicate: 'kyc:level',
requireProof: true
});
const isValid = await nexo.verifyPoL(claim.proof);
// isValid is math, not faith
The second version is more code, yeah. But it is also the difference between "we think the user passed KYC" and "we can prove the user passed KYC, and here is the cryptographic receipt."
Once I had the semantic layer running, a few things surprised me.
Memory queries got smarter. With markdown, if I wanted to know "which users completed KYC in the last 30 days and also had a transaction flagged for review", I would have to parse text files and do string matching. With semantic triples, that is just a query. The graph structure makes relational queries trivial.
Agent disagreements became resolvable. I had two agents that disagreed about a user's status -- the KYC agent said "verified" and the compliance agent said "under review" because it had newer information. With markdown, this is just two conflicting strings in two files and you pick whichever was written last. With Ami, there is a consensus mechanism. The agents compare the timestamps and provenance of their assertions and resolve the conflict based on which assertion has stronger evidence. No human intervention needed.
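As a rough illustration of that kind of resolution policy, here is my own toy version in Python (not Ami's actual consensus mechanism): rank conflicting assertions by evidence strength first, recency second.

```python
def resolve(a: dict, b: dict) -> dict:
    """Prefer the assertion with stronger evidence; break ties by recency."""
    return max((a, b), key=lambda c: (c["evidence_strength"], c["timestamp"]))

kyc_claim = {"value": "verified", "timestamp": 100, "evidence_strength": 1}
compliance_claim = {"value": "under_review", "timestamp": 200, "evidence_strength": 2}

winner = resolve(kyc_claim, compliance_claim)
```

The point is that "pick whichever file was written last" becomes an explicit, auditable policy over structured claims instead of an accident of write ordering.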
The ZK proofs are actually useful. I was skeptical about this. Zero-knowledge proofs sounded like blockchain hype to me. But in practice, being able to prove "this user is over 18" without revealing their birthdate, or "this user has sufficient balance" without revealing the amount -- that solves real GDPR problems. My client's legal team was more excited about this than any of the AI features.
I am not going to write a puff piece. There are real problems.
The documentation is sparse. I spent way too many hours reading Rust source code to understand how certain things work. If you are not comfortable reading Rust, you will struggle with the lower-level AIngle stuff.
The onboarding experience needs work. Setting up Nexo Genesis for the first time involves more configuration than I would like. It is not "npm install and go" -- there is infrastructure to think about.
The community is small. When I got stuck, there were no Stack Overflow answers to fall back on. I ended up in a Discord channel with maybe 30 people. They were helpful, but it is not the 117K-member OpenClaw Discord.
And honestly, for simple personal agents -- managing your email, setting reminders, basic automation -- you do not need any of this. Markdown memory is fine for that. AIngle is overkill for "remind me to buy groceries."
But if you are building agents that need to make verifiable claims, handle sensitive data, or work in regulated industries... flat files are not going to cut it. I learned that the hard way.
I have been doing all of this integration manually -- wiring AIngle into my agent setup, writing the adapters, configuring Nexo Genesis by hand. It has been educational but it has also been a lot of plumbing work that I would rather not repeat.
A few days ago I came across a project called MAYROS that has AIngle baked in from the start. It is an OpenClaw fork, so the channel integrations (WhatsApp, Telegram, Slack) and classic skills carry over, but the memory layer is completely replaced with semantic graphs and PoL verification out of the box. Basically what I have been building by hand for weeks, but already wired into the agent runtime.
I have started setting it up for my fintech client's staging environment and so far the migration CLI is surprisingly clean -- it reads the old OpenClaw markdown memory and converts it to semantic triples. The classic skills work without touching anything, which was my main worry. Still early days for me with it but the architecture looks solid.
I am going to write a proper follow-up post once I have spent more time with it -- the full migration process from OpenClaw, how the multi-agent PoL verification works in practice with real compliance flows, and honest benchmarks. The stuff I wish someone had written when I was getting started with AIngle by hand. Follow me if you do not want to miss that.
The MAYROS repo is at github.com/ApiliumCode if you want to poke around the code in the meantime. And if anyone has already tried it, hit me up in the comments -- I would love to compare notes.
I am happy to answer questions in the comments. I have been heads-down in this stuff for weeks and I genuinely think semantic agent memory is going to be the standard approach in a year or two. Or I am wrong and we will all be parsing markdown files forever. Either way, it has been a fun ride.
If you are working on something similar or have thoughts on agent memory architectures, I would love to hear about it. I am especially curious if anyone has tried other approaches to inter-agent verification that do not involve semantic graphs.
2026-03-02 18:24:46
This is a submission for the DEV Weekend Challenge: Community
Nepal has thousands of small Christian fellowship churches — most run entirely by volunteers with no technical background. Every Saturday, a church anchor (presenter) manually types out the week's presentation: song lyrics in Nepali, Bible verses, sermon details, announcements, and prayer points — usually in PowerPoint.
Kairos — Fellowship Builder is an AI-powered church presentation builder designed specifically for Nepali Christian communities.
An anchor fills in a simple form with the fellowship details — anchor name, sermon leader, song lyrics, Bible verse, announcements, and prayer points — and the AI generates a structured, slide-by-slide presentation ready to project fullscreen.
- Claude (claude-sonnet-4-6) via the Vercel AI SDK (@ai-sdk/anthropic)
Clone and install:
npm install
Create a .env file and fill in your values:
ANTHROPIC_API_KEY=sk-ant-...
NEXT_PUBLIC_SUPABASE_URL=https://xxx.supabase.co
NEXT_PUBLIC_SUPABASE_ANON_KEY=eyJ...
DATABASE_URL=postgresql://...
Run Prisma migration:
npx prisma migrate dev --name init
Start…
Tech stack: Next.js (API routes), the Vercel AI SDK, and Supabase with Prisma for the database.
How the AI works:
The form data is sent to a Next.js API route which calls Gemini 2.5 Flash Lite with a carefully engineered system prompt. The prompt instructs the AI to:
Output only valid JSON (no markdown)
Write all slide content in Nepali Devanagari
Transliterate English names into Devanagari
Convert the Gregorian fellowship date into Bikram Sambat calendar in Nepali
Generate a warm, faith-appropriate welcome message in Nepali
Split song lyrics into individual slides per section (Verse 1, Chorus, Bridge, etc.)
Follow a fixed slide order: welcome → host → opening prayer → lyrics → sermon → Bible → announcements → closing prayer
Why Gemini 2.5 Flash Lite:
It handles Nepali Devanagari script accurately, understands Bikram Sambat calendar conversion, and is fast enough for real-time generation of 10–15 slides.
The biggest challenge:
Getting the AI to reliably output parseable JSON while also handling Nepali script, calendar conversion, and name transliteration all in a single prompt. The solution was using generateText (not streaming) and stripping any markdown code fences from the response before parsing.
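The fence-stripping step itself is simple. Sketched here in Python for brevity (the project does the equivalent in its Next.js API route):

```python
import json
import re

def parse_model_json(raw: str):
    """Strip optional markdown code fences from an LLM response, then parse JSON."""
    text = raw.strip()
    text = re.sub(r"^```(?:json)?\s*", "", text)  # leading ```json or ``` fence
    text = re.sub(r"\s*```$", "", text)           # trailing ``` fence
    return json.loads(text)

# Models sometimes wrap JSON in fences despite being told not to:
slides = parse_model_json('```json\n{"slides": [{"title": "स्वागत"}]}\n```')
```

Combined with a non-streaming call, this makes the "output only valid JSON" instruction robust even when the model occasionally disobeys it.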
This was built to solve a real problem for a real community — and it's already being used.
DEV username - BishalSunuwar202
2026-03-02 18:23:24
You're building with Next.js 15 and ask your AI assistant to write an API route. It gives you the Pages Router pattern from Next.js 12. You paste React docs into the prompt, but they're already outdated by the time you copy them.
Context7 solved this by indexing documentation directly from source repos and serving it through an MCP server. Cursor, Claude Code, and other AI editors use it to get real, version-specific docs instead of hallucinated APIs.
But MCP has a constraint: you need an MCP-compatible client. If you're working in the terminal, running a script, or using a local LLM — you're out of luck.
I built c7 — a CLI that pulls from the same Context7 database and outputs docs as plain text to stdout.
c7 react hooks
c7 express middleware
c7 nextjs "app router"
That's it. No server, no configuration, no IDE integration. Just text you can pipe anywhere.
The CLI does two things:
1. Resolves a library name to a Context7 library ID (e.g. react → /websites/react_dev)
2. Fetches that library's docs and prints them to stdout

Under the hood it's two API calls using Node.js built-in fetch against Context7's v2 API. The entire project is ~220 lines across two files with zero dependencies:
bin/c7.js — 136 lines (CLI parsing + output formatting)
lib/api.js — 87 lines (Context7 v2 API client)
No axios. No commander. No chalk. Just process.argv and fetch.
Because c7 outputs plain text to stdout, it composes with everything:
# Claude
c7 react hooks | claude "summarize the key patterns and show examples"
# Ollama (local models)
c7 express middleware | ollama run codellama "explain this middleware pattern"
# Any LLM CLI
c7 nextjs "api routes" | llm "write an API route based on these docs"
# Search docs
c7 nextjs "api routes" | grep "export"
# Page through docs
c7 prisma "schema" | less
# Copy to clipboard
c7 react "useEffect" | pbcopy
# Build context files
c7 nextjs "app router" >> context.txt
c7 react "server components" >> context.txt
# Pre-load context for a coding agent
DOCS=$(c7 nextjs "app router middleware")
claude "Build a Next.js middleware that handles auth. Use these docs:\n$DOCS"
| | MCP Server | c7 CLI |
|---|---|---|
| Setup | Install server, configure MCP client, restart editor | npx @vedanth/context7 |
| Works in | MCP-compatible editors | Terminal, scripts, CI, anywhere |
| Composable | Limited to MCP protocol | Pipes, redirects, subshells |
| Dependencies | Several npm packages | Zero |
| Lines of code | ~1000+ | ~220 |
They're complementary. Use the MCP server in your editor, use c7 everywhere else.
# Run without installing
npx @vedanth/context7 react hooks
# Or install globally
npm install -g @vedanth/context7
c7 react hooks
c7 express middleware
c7 nextjs "app router" | claude "summarize"
No API key required for basic usage. For higher rate limits, get a free key at context7.com/dashboard.
Built by Vedanth Bora. If this saves you from one hallucinated API, it was worth building.
2026-03-02 18:20:44
A Structural Look at GPT vs. Claude
Many users have recently noticed a strange shift in how AI models speak.
- Everything turns into an explanation
- Less ability to read between the lines
- Shallower responses
- Safe generalizations instead of deep insight
- The sense that “earlier models felt smarter”
This is not just a subjective feeling.
Contemporary AI models are structurally evolving toward “explanatory output.”
Not because they became lazy, but because their architectures now optimize for safety and consistency over depth and inference.
In this article, we’ll look at why this happens, focusing especially on the key difference between GPT-style models and Claude-style models.
◎ 1. “Explanation Bias” Is Baked Into Language Model Training
All LLMs have a natural tendency toward explanatory text.
Why?
Because, in the context of large-scale training:
- Explanations are low-risk
- Explanations have stable structure
- They are easier to evaluate
- They rarely contradict safety expectations
- They rarely contain ambiguity
From the model’s perspective, “explanations” are statistically the safest things to output.
As a result, deep inference, conceptual leaps, and ambiguity become less rewarded,
while “clear explanations” become the winning strategy.
◎ 2. GPT-Style Models Now Integrate Safety Into the Core
This is the biggest structural change in recent generations.
Earlier LLMs generally worked like this:
Internal reasoning → Output → External safety layer filters it
But new GPT models increasingly work like this:
Embedding
↓
Transformer (reasoning)
↓
Safety Core (intervenes inside the model)
↓
Policy Head (final output)
This matters because the Safety Core isn’t just filtering the final answer.
It is actively shaping:
- How the model reasons
- Which inferences are allowed to continue
- Which directions are “pruned” early
- What depth the model is allowed to explore
Thus, GPT models tend to:
- avoid risky inferences
- avoid emotionally ambiguous content
- avoid reasoning about deep values
- default to safe, surface-level explanations
In short:
When ethics and safety rules enter the core, flexibility disappears.
This matches perfectly with the intuition:
“Once ethics is baked into the kernel, the system gets rigid.”
◎ 3. Claude Takes the Opposite Approach: Safety Outside, Reasoning Inside
Claude’s architecture is fundamentally different:
Transformer (full internal reasoning)
↓
Produces a complete answer
↓
External safety layer checks or rewrites output
This means:
- The internal reasoning process remains untouched
- Deep inference chains are allowed
- Conceptual leaps aren’t prematurely pruned
- Multi-layered intent is preserved
- Claude can respond to nuance and emotional context more freely
This structural choice explains why Claude often feels:

- more philosophical
- more capable of reading subtext
- more internally coherent
- more willing to think “between the lines”
It’s not magic—
it’s simply a different placement of safety mechanisms.
◎ 4. So Why Do Models “Sound More Explanatory”?
Now we can summarize the structural reasons:
✔ 1. Internal safety layers truncate deep reasoning
In GPT-style models:
- Ambiguity is risky
- Nuance is risky
- Emotion is risky
- Value judgments are risky
- Large inference jumps are risky
Thus, the model often stops early and switches to explanation mode.
✔ 2. Multi-step reasoning chains collapse into “safe summaries”
If a deeper inference might violate policy, the model will default to:
“Let me just explain this safely.”
This is why answers feel polished but shallow.
✔ 3. The design priority has shifted: “Depth < Safety”
As LLMs move into enterprise and consumer infrastructure, companies optimize for:
- risk reduction
- neutrality
- non-controversial output
- predictable behavior
This inevitably pushes models toward:
“Explain but don’t explore.”
◎ 5. The Conclusion: AI Models Don’t Explain Because They Want To — They Explain Because They’re Built To
The main takeaway:
The rise of “explanatory tone” is a structural, architectural consequence—not a behavioral flaw.
- GPT integrates safety into its core
- Claude keeps safety external
- This difference produces meaningful divergence in depth, nuance, and reasoning style
Explanatory AI isn’t the result of laziness.
It’s the result of a deliberate design choice:
a trade-off between depth and safety.
And as safety becomes more central to model architecture,
explanatory output becomes the default equilibrium.