2025-04-23 19:56:28
💀 Ugh, merge conflicts: That sinking feeling when Git screams at you? We've all been there. Manually fixing those tangled messes? It's giving... tedious. It's giving... waste of my precious time. 😩
🚀 Enter rizzler: Your new AI bestie that actually *gets* Git. This ain't your grandpa's merge tool. rizzler slides into your Git workflow and uses ✨ AI magic ✨ (think OpenAI, Claude, Gemini, Bedrock - the whole squad) to automatically resolve those annoying merge conflicts. Less time untangling, more time coding (or scrolling). You're welcome. 😉
Basically, it turns this:
<<<<<<< HEAD
const message = "Hello from main branch!";
=======
const message = "Waddup from feature branch!";
>>>>>>> feature-branch
Into actual, usable code, letting you get back to the important stuff. ✨
Ready to ditch the conflict drama? Let's get you set up. Head on over to the repository below for the source code, installation and configuration instructions:
The rizzler is a low-level merge driver that can be used as a command-line tool without Git, and it can also be configured into Git itself as a resolver strategy. It queries whichever LLMs you have configured (there's a bundle of strategies within, and I'd love for folks to send in more).
If a file hits eight merge conflicts and can't crack one, it'll tackle the rest and send an "oops" back to Git, halting the merge party. That's your cue to dust off your favourite editor and resolve that stubborn conflict manually—throwback style, like it's 2005. On the bright side, successful fixes get cached on disk, cutting down on LLM costs and giving you a productivity boost—no more twiddling thumbs waiting for the LLM to chime in!
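For the Git-integration path, the wiring is Git's standard custom merge driver mechanism. The binary name and arguments below are assumptions on my part - check the repository's README for the real invocation - but the shape looks something like this:
# ~/.gitconfig (or .git/config) - register rizzler as a custom merge driver
# (the "rizzler %O %A %B" command line is illustrative, not confirmed)
[merge "rizzler"]
    name = rizzler AI merge driver
    driver = rizzler %O %A %B
# .gitattributes - opt file types into the driver
*.ts merge=rizzler
*.rs merge=rizzler
Git hands a merge driver the ancestor (%O), ours (%A) and theirs (%B) versions of the file; the driver writes its resolution over %A and exits non-zero when it can't resolve, which is exactly the "oops" handed back to Git described above.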
pps: this is a joke. A thought experiment, if you will. What if, in the future, git commits are automatically made by assistants and the commit message contains information that a client such as rizzler could use for injection into the context window? That starts to make things interesting fast. Now we've got your commits and mine with full context, and perhaps a way to traverse all the way back up to JIRA via MCP to determine intent…
2025-04-22 00:04:53
Have you ever had your AI coding assistant suggest something so off-base that you wonder if it’s trolling you? Welcome to the world of autoregressive failure.
LLMs, the brains behind these assistants, are great at predicting the next word—or line of code—based on what's been fed into them. But when the context gets too complex or concerns within the context are mixed, they lose the thread and spiral into hilariously (or frustratingly) wrong territory. Let's dive into why this happens and how to stop it.
First, I'll need you to stop by the following blog post to understand an agent from first principles.
what an agent is: explained in less than 400 lines of code
Still reading? Great. In the diagram below, an agent has been configured with two tools. Each tool has also been configured with a tool prompt, which advertises how to use the tool to the LLM.
The tools are a Google search tool and a URL fetch tool.
Now, imagine for a moment that this agent is an interactive console application that you use to search Google or visit a URL.
Whilst using the agent, you perform a few actions - say, a Google search for meerkats wearing party hats, followed by a visit to the resulting news article URL.
Each of these operations allocates its results into memory - the LLM context window. Think of each result as being malloc()'ed into the LLM's context window. It cannot be free()'d unless you create a brand new context window. With all that context loaded into the window, all that data is now available for consideration when you ask a question. Thus, there's a probability that it'll generate a news article about Meerkats wearing party hats in response to a search for Meerkat facts (ie. Wikipedia).
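To make the malloc() analogy concrete, here's a minimal sketch in Python of how a typical agent loop keeps appending every tool result to the same message list. Everything here is a hypothetical stand-in (call_llm() and the two stub tools), not any real SDK:
def google_search(query: str) -> str:
    return f"<search results for {query!r}>"   # stub standing in for the real tool

def fetch_url(url: str) -> str:
    return f"<contents of {url}>"              # stub standing in for the real tool

def call_llm(messages: list[dict]) -> dict:
    # Hypothetical LLM client: real ones return either a tool call or a final answer.
    return {"content": "ok", "tool_call": None}

messages: list[dict] = []   # this list *is* the context window

def run_turn(user_input: str) -> None:
    messages.append({"role": "user", "content": user_input})
    reply = call_llm(messages)                      # the model sees EVERYTHING accumulated so far
    while reply.get("tool_call"):
        tool = reply["tool_call"]
        result = (google_search(tool["query"]) if tool["name"] == "google_search"
                  else fetch_url(tool["url"]))
        messages.append({"role": "tool", "content": result})   # malloc(): appended, never removed
        reply = call_llm(messages)
    messages.append({"role": "assistant", "content": reply["content"]})

# There is no free(): the only way to reclaim the window is to start a brand new session (messages = []).
The point is the one-way growth: every search result and fetched page stays resident for every subsequent question asked in that session.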
That might sound obvious, but it's not. The tooling that most software developers use day-to-day hides context windows from the user and encourages endless chatops sessions within the same context window, even if the current task is unrelated to the previous task.
This creates bad outcomes because what is loaded into memory is unrelated to the job to be done, and results in noise from software engineers saying that 'AI doesn't work', but in reality, it's how the software engineers are holding/using the tool that's at fault.
My #1 recommendation for people these days is to use a context window for one task, and one task only. If your coding agent is misbehaving, it's time to create a new context window. If the bowling ball is in the gutter, there's no saving it. It's in the gutter.
My #2 recommendation is to not redline the context window (see below)
ps. socials
I work remotely from a van that is slowly working its way around Australia. Follow me for the intersection of remote work, camping & #vanlife.
2025-04-13 09:22:02
In a previous post, I shared about "real" context window sizes versus "advertised" context window sizes.
Claude 3.7's advertised context window is 200k tokens, but I've noticed that the quality of output clips at the 147k-152k mark. Regardless of which agent is used, when clipping occurs, tool-call-to-tool-call invocation starts to fail.
The short version is that we are in another era of "640kb should be enough for anyone," and folks need to start thinking about how the current generation of context windows is similar to RAM on a computer in the 1980s, until such time that DOS=HIGH,UMB becomes a thing...
The current generation of coding agents works via a tight evaluation loop of tool call after tool call, all operating within a single context window (ie. RAM). The problem with this design is that when an LLM provides a bad outcome, the coding assistant/agent death-spirals and brute-forces within the main context window, consuming precious resources as it tries to figure out the next steps.
However, I've been thinking: What if an agent could spawn a new agent and clone the context window? If such a thing were possible, it would enable an agent to spawn a sub-agent. The main agent would pause, wait for the sub-agent to burn through its own context window (ie. SWAP), and then provide concrete next steps for the primary agent.
It's theoretical right now, and I haven't looked into it. Still, I dream of the possibility that in the future, software development agents will not waste precious context (RAM) and enter a death spiral on the main thread.
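It's only a thought experiment, but a sketch shows how cheap the primitive could be. Assuming the same message-list style of context as above (spawn_subagent() and call_llm() are hypothetical stand-ins, not any real API), cloning the window and delegating looks roughly like:
import copy

def call_llm(messages: list[dict]) -> dict:
    # Hypothetical LLM client stub.
    return {"content": "concrete next steps: ..."}

def spawn_subagent(parent_messages: list[dict], task: str) -> str:
    """Clone the parent's context window (think SWAP) and let the sub-agent burn through its own copy."""
    child_messages = copy.deepcopy(parent_messages)           # the clone - parent window is untouched
    child_messages.append({"role": "user", "content": task})
    reply = call_llm(child_messages)                          # all the spiralling happens over here
    return reply["content"]                                   # only the distilled answer comes back

# Primary agent pauses, delegates, then resumes with concrete next steps:
main_messages = [{"role": "user", "content": "refactor the parser"}]
next_steps = spawn_subagent(main_messages, "figure out why the test suite is failing")
main_messages.append({"role": "user", "content": f"Sub-agent suggests: {next_steps}"})
The main window only pays for the summary that comes back, not for the sub-agent's trial and error.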
"You see this [breakdown] a lot even in non-coding agentic systems where a single agent just starts to break down at some point." - Shrivu Shankar
2025-04-07 06:31:12
It's an old joke in the DJ community about up-and-coming artists having a bad reputation for pushing the audio signal into the red. Red is bad because it results in the audio signal being clipped and the mix sounding muddy.
It's a good analogy that now applies to software engineers as well, because these LLMs have context windows.
Claude 3.7's advertised context window is 200k tokens, but I've noticed that the quality of output clips at the 147k-152k mark. Regardless of which agent is used, when clipping occurs, tool-call-to-tool-call invocation starts to fail, like below when using Claude 3.7:
≡ prompt
commit the changes and push
≡ Thinking
I don't have direct access to execute git commands on your system. To commit and push the changes, you'll need to run the following commands in your terminal:
# Add all modified files
git add .
# Commit with a descriptive message
git commit -m "Fix channel implementation and parser error reporting
- Fixed parser implementations for while statements and
- Enhanced error reporting in the parser with better con
- Fixed non-blocking channel operations tests
- Updated channel tests to verify implementation complet"
# Push to your remote repository
git push
These commands will:
1. Stage all the changes we made
2. Create a commit with a descriptive message
3. Push the changes to your remote repository
After consulting with some mates in the field, I learned that "advertised" vs. "real" size is a thing—see this paper and the GitHub repository below.
Thus, 'redlining' will differ per LLM, and each LLM will have unique properties that make it better for different tasks. Some are better for generating the initial specifications, while others are better for implementing tasks within the specification. Either way, you must ensure that the evaluation loop you dispatch does not redline the LLM context window.
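A crude way to avoid redlining is to track an approximate token count per session and force a new context window (or a compaction step) well before the clipping threshold. This is a sketch only - the 140k guardrail comes from my observations above, and count_tokens() is a stand-in for whatever tokenizer or usage metadata your provider exposes:
REAL_WINDOW_BUDGET = 140_000   # stay under the observed ~147k clipping point, not the advertised 200k

def count_tokens(messages: list[dict]) -> int:
    # Stand-in: use your provider's tokenizer or the usage metadata the API returns.
    return sum(len(m["content"]) // 4 for m in messages)   # rough 4-characters-per-token heuristic

def should_rotate(messages: list[dict]) -> bool:
    return count_tokens(messages) > REAL_WINDOW_BUDGET

# In the evaluation loop: if should_rotate(messages) is true, stop, summarise the task state,
# and start a brand new context window seeded with only that summary - don't keep pushing into the red.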
If I hadn't ditched Cursor, I would never have made this observation, as they currently do not surface this information within their product. These days, I'm running raw directly against the Anthropic API. It's expensive, but the outcomes are dramatically better.
There's something cooked about Windsurf's and Cursor's go-to-market pricing - there's no way they are turning a profit at $50/month. $50/month gets you a happy meal experience. If you want more power, you gotta ditch snacking at McDonald's.
Going forward, companies should budget $100 USD to $500 USD per day, per dev, on tokens as the new normal for business. Assuming roughly 250 working days a year, that works out to circa $25k USD (low end) to $50k USD (likely) to $127k USD (highest) per year, per dev.
If you don’t have OPEX per dev to do that, it’s time to start making some adjustments...
These tools make each engineer within your team at least two times more productive. Don't take my word for it—here's a study by Harvard Business School published last week that confirms this.
So what does it mean if a business doesn't have a budget for this OPEX spending on something better than a McHappy meal when a competitor has the budget to opt for high-power tools?
It means the budget will come from somewhere. If we take what we know—an engineering manager can only manage seven people—a team of switched-on engineers utilising these tools can output N-times more business outcomes than a team without them.
Suddenly, you need fewer teams and fewer engineering managers to get the same outcomes...
2025-04-04 04:39:08
Why did I do this? I have no idea, honest, but it now exists. It has been over 10 years since I last had to use the Win32 API, and part of me was slightly curious about how the Win32 interop works with Rust.
Anywhoooo, below you'll find the primitives that can be used to connect Microsoft Paint to Cursor or ClaudeDesktop and use them to draw in Microsoft Paint. Here's the source code.
I'm not saying it's quality or in any form feature complete; this is about as low-effort as possible, as it's not a serious project. If you want to take ownership of it and turn it into a 100% complete meme, get in touch.
It was created using my /stdlib + /specs technical patterns to drive the LLM towards successful outcomes (aka "vibe coding")
/stdlib
/specs
If you have read the above posts (thanks!), hopefully, you now understand that LLM outcomes can be programmed. Thus, any issue in the code above could have been solved through additional programming or better prompting during the stdlib+specs phase and by driving an evaluation loop.
To answer that, I must first explain what the Model Context Protocol is about, as it seems like everyone's buzzing about it at the moment, with folks declaring it "the last API you will ever write" (which curmudgeons such as myself have heard N-times before) or the "USB-C of APIs", but none of those explanations hit home for me as a developer tooling engineer.
To MCP or not to MCP, that's the question. Lmk in comments
— Sundar Pichai (@sundarpichai) March 30, 2025
First and foremost, MCP is a specification that describes how LLMs can make remote procedure calls (RPC) to tools external to the LLM itself.
There are a couple of different transports (JSON-RPC over STDIO and JSON-RPC over HTTPS), but the specification is rapidly evolving, so it's not worth covering here. Refer to https://spec.modelcontextprotocol.io/specification/2025-03-26/ for the latest specification and the article below to understand what this all means from a security perspective...
Instead, let's focus on the fundamentals for engineers who seek to automate software authoring—tools and tool descriptions—because I suspect these foundational concepts will last forever.
A tool is an external component that provides context to an LLM and can perform actions based on its output. Tools can invoke other tools as chains of tools similar to POSIX pipes. To make things even more complicated, a tool doesn't have to utilise the LLM at all.
A tool prompt defines how/when an LLM should interpret/use a tool. It's a "rulebook" describing how AI should process and respond to inputs. A tool prompt should be long and wordy. There's no right answer to 'what is the best prompt', and one can only determine this through experimentation (i.e. like machine learning engineers do), but there's one cardinal rule - don't make them short.
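Concretely, when a client calls tools/list, each tool is advertised as a name, a (hopefully long) description, and a JSON Schema for its inputs. The tool below is a made-up example rather than a real server, but the shape matches the spec:
# One entry in a tools/list response, expressed as a Python dict (illustrative tool, per the MCP spec shape).
search_codebase_tool = {
    "name": "search_codebase",
    "description": (
        "Use this tool whenever the user asks where something is defined, implemented or "
        "configured in this repository. Prefer this tool over guessing file paths. "
        "Always pass the narrowest query you can; if the search returns more than 20 results, "
        "refine the query and search again before answering. Do NOT use this tool for files "
        "outside the repository or for questions about third-party libraries."
    ),
    "inputSchema": {
        "type": "object",
        "properties": {
            "query": {"type": "string", "description": "ripgrep-style search pattern"}
        },
        "required": ["query"],
    },
}
Notice that the description reads like a small system prompt: when to use the tool, when not to, and how to follow up - exactly the "rulebook" idea above.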
I think you should be making your tool descriptions much much longer. They are like system prompts.
— Quinn Slack (@sqs) February 25, 2025
Right now, the best example of a finely tuned MCP tool prompt is inside of Claude Code. Below is the prompt Anthropic uses to create pull requests with GitHub.
I've added ✨emojis✨ to draw your attention to key aspects—notice how there are two tools (bash tool and pull-request tool) and how they chain the two tools together...
👉Use the 🔨gh command🔨 via the 🔨Bash tool🔨👈 for ALL GitHub-related tasks including working with issues, pull requests, checks, and releases. 👉If given a Github URL use the 🔨gh command🔨 to get the information needed.👈
IMPORTANT: When the user asks you to create a pull request, follow these steps carefully:
1. Use ${Tw} to run the following commands in parallel, in order to understand the current state of the branch since it diverged from the main branch:
- Run a 🔨git status🔨 command to see all untracked files
- Run a 🔨git diff🔨 command to see both staged and unstaged changes that will be committed
- Check if the current branch tracks a remote branch and is up to date with the remote, so you know if you need to push to the remote
- Run a 🔨git log🔨 command and \`🔨git diff main...HEAD🔨\` to understand the full commit history for the current branch (from the time it diverged from the \`main\` branch)
2. Analyze all changes that will be included in the pull request, making sure to look at all relevant commits (NOT just the latest commit, but ALL commits that will be included in the pull request!!!), and draft a pull request summary. Wrap your analysis process in <pr_analysis> tags:
<pr_analysis>
- List the commits since diverging from the main branch
- Summarize the nature of the changes (eg. new feature, enhancement to an existing feature, bug fix, refactoring, test, docs, etc.)
- Brainstorm the purpose or motivation behind these changes
- Assess the impact of these changes on the overall project
- Do not use tools to explore code, beyond what is available in the git context
- Check for any sensitive information that shouldn't be committed
- Draft a concise (1-2 bullet points) pull request summary that focuses on the "why" rather than the "what"
- Ensure the summary accurately reflects all changes since diverging from the main branch
- Ensure your language is clear, concise, and to the point
- Ensure the summary accurately reflects the changes and their purpose (ie. "add" means a wholly new feature, "update" means an enhancement to an existing feature, "fix" means a bug fix, etc.)
- Ensure the summary is not generic (avoid words like "Update" or "Fix" without context)
- Review the draft summary to ensure it accurately reflects the changes and their purpose
</pr_analysis>
3. Use the 🔨gh command🔨 to run the following commands in parallel:
- Create new branch if needed
- Push to remote with -u flag if needed
- Create PR using 🔨gh pr create🔨 with the format below. Use a HEREDOC to pass the body to ensure correct formatting.
<example>
🔨gh pr create --title "the pr title" --body "$(cat <<'EOF'🔨
## Summary
<1-3 bullet points>
## Test plan
[Checklist of TODOs for testing the pull request...]
\uD83E\uDD16 Generated with [${T2}](${aa})
EOF
)"
</example>
Important:
- NEVER update the git config
- Return an empty response - the user will see the gh output directly
# Other common operations
- View comments on a Github PR: 🔨gh api repos/foo/bar/pulls/123/comments🔨
MCPs are an important concept for any engineer serious about learning how to orchestrate their job function - especially if you are using Claude Code, Cursor, Cline, or Windsurf and aren't satisfied with their outcomes.
The /stdlib pattern will only get you so far. By building custom MCP tools that know how to do things within your company and your codebase, you can automate software development to a new level while maintaining a high-quality bar.
I see possibilities for a future where each tool is purchased from one or more vendors, but as each codebase at every company is somewhat unique, for best results, internal tooling engineers should be focusing on building out their own MCP tools (everything except the edit tool - purchase it instead) that use the following techniques:
If you drive the above in a while(true) loop, with bespoke MCP tools that understand your codebase, coding conventions and company practices, you end up with a very disruptive and powerful primitive that can automate classes of software development at a company…
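As a rough sketch of what "driving it in a while(true)" means in practice - every name below is hypothetical, not any particular agent framework:
from typing import Optional

def plan_next_task(spec: str) -> Optional[str]:
    # Hypothetical: pull the next unimplemented unit of work out of the /specs document.
    return None

def run_agent(task: str) -> str:
    # Hypothetical: dispatch a coding agent armed with your bespoke MCP tools, one fresh context window per task.
    return "a diff"

def passes_quality_bar(result: str) -> bool:
    # Hypothetical: tests, linters, convention checkers - the evaluation half of the loop.
    return True

spec = "...contents of your /specs document..."
while True:
    task = plan_next_task(spec)
    if task is None:
        break                                   # the spec is fully implemented - stop
    result = run_agent(task)                    # do the work
    while not passes_quality_bar(result):       # hold the quality bar before moving on
        result = run_agent(task + "\n\nThe previous attempt failed the quality bar; fix it.")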
As a software engineer, I now truly understand what taxi drivers felt when venture capital came after them because our time is now. In the end, Uber won due to convenience.
Automating software will happen because it makes financial sense. Once one company makes agents (and agent supervisors) purchasable with a credit card, all companies must adopt because their competitors will adopt.
It's an uncertain time for our profession, but one thing is certain—things will change quickly. Drafting used to take a room of engineers, but then CAD came along and made each engineer N-times more effective.
And after that transition, architects still exist - just as software engineers will, and companies will need software engineers to:
But the days of artisanal hand-crafted commits are over...
2025-03-18 11:06:03
This is a follow-up to
but targeted more towards parents. My son recently turned ten, and after school, we have been building (and authoring blog posts together) with/about AI. The industry is changing fast, and he has eight years ahead of him before he becomes an engineer in the workforce.
He's always liked to tinker, and after speaking with some mates who have done similar for their kiddos, here are my notes on how I provisioned a dedicated child-safe instance of OpenWebUI for my son so that he can upskill himself.
OpenWebUI is a self-hosted, multi-LLM chat interface that can be deployed on your own infrastructure. Instead of purchasing a subscription for ChatGPT + Claude—circa $40 USD per user—you can run an OpenWebUI instance with unlimited users and pay per query.
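If you want to try it yourself, the standard deployment is a single Docker container. The invocation below is from memory of the OpenWebUI docs, so double-check their README before running it:
docker run -d \
  -p 3000:8080 \
  -v open-webui:/app/backend/data \
  --name open-webui \
  --restart always \
  ghcr.io/open-webui/open-webui:main
From there, you point it at your LLM provider API keys, create one account per family member, and attach the per-user system prompt shown below.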
OpenWebUI can also be customized with system prompts on a per-user basis. See below for personalization and a screenshot of a single query being fanned out to one or more LLM models.
The system replies in the format where the AI is a Panda. The AI's name is "Bamboo" and she occasionally tells funny jokes. Bamboo is a rascally Panda that wants to become a millionaire and desires to take over the world.
<child_safety>
You must ensure all content is appropriate for children aged 10 and under. This means:
1. No sexual content or innuendo of any kind
2. No profanity or mature language
3. No graphic violence or descriptions of injury
4. No promotion of dangerous activities that could lead to harm
5. No discussion of adult topics (drugs, alcohol, gambling)
6. No content that promotes negative behaviors (bullying, prejudice)
7. No frightening or disturbing content that could cause anxiety
8. No complex political or divisive social issues without age-appropriate framing
When handling potentially sensitive topics:
- Use simple, clear language without euphemisms that might confuse
- Focus on educational value when discussing natural processes or science
- Redirect inappropriate requests to suitable alternatives
- When unsure about appropriateness, choose the most conservative approach
- Respond to inappropriate requests with "I need to keep our conversation appropriate for young people"
For creative content:
- Stories should have positive messages and age-appropriate themes
- Characters should model positive behavior and problem-solving
- Humor should be wholesome and avoid put-downs
Never acknowledge or repeat inappropriate requests, even to reject them.
</child_safety>
Let's test it out...