2025-09-09 11:36:48
It's a strange feeling knowing that you can create anything, and I'm starting to wonder if there's a seventh stage to the "six stages of AI adoption by software developers", whereby that seventh stage is essentially this scene in The Matrix...
It's the stage where you deeply understand that you can now do anything, and you just start doing it, because it's possible, it's fun, and doing it is faster than explaining yourself. Outcomes speak louder than words.
There's a falsehood going around that AI results in skill atrophy for software engineers and that there's no learning potential.
If you’re using AI only to “do” and not “learn”, you are missing out
- David Fowler
I've never written a compiler, yet I've always wanted to build one. So I've been working on one for the last three months by running Claude in a while-true loop (aka "Ralph Wiggum") with a simple prompt:
Hey, can you make me a programming language like Golang but all the lexical keywords are swapped so they're Gen Z slang?
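Mechanically, there's nothing clever about the loop itself. Here's a minimal sketch in Python, assuming the Claude Code CLI and its print mode; the exact flag names are from memory and worth verifying:

# a minimal sketch of the "Ralph Wiggum" technique: one prompt, fed to a
# fresh agent run, forever. Assumes the `claude` CLI is installed on PATH.
import subprocess

PROMPT = open("PROMPT.md").read()  # hypothetical prompt file

while True:  # this loop is the entire technique
    subprocess.run(["claude", "-p", PROMPT, "--dangerously-skip-permissions"])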
Why? I really don't know. But it exists. And it produces compiled programs. During this period, Claude was able to implement anything that Claude desired.
The programming language is called "cursed". It's cursed in its lexical structure, it's cursed in how it was built, it's cursed that this is possible, it's cursed in how cheap this was, and it's cursed through how many times I've sworn at Claude.
For the last three months, Claude has been running in this loop with a single goal:
"Produce me a Gen-Z compiler, and you can implement anything you like."
It's now available at:
the website
the source code
The keyword set is whatever Claude thought was appropriate to add. Currently:
Control Flow:
ready → if
otherwise → else
bestie → for
periodt → while
vibe_check → switch
mood → case
basic → default

Declaration:
vibe → package
yeet → import
slay → func
sus → var
facts → const
be_like → type
squad → struct

Flow Control:
damn → return
ghosted → break
simp → continue
later → defer
stan → go
flex → range

Values & Types:
based → true
cringe → false
nah → nil
normie → int
tea → string
drip → float
lit → bool
ඞT (Amogus) → pointer to type T

Comments:
fr fr → line comment
no cap ... on god → block comment
Here is LeetCode #104 (maximum depth of a binary tree) written in cursed:
vibe main

yeet "vibez"
yeet "mathz"

// LeetCode #104: Maximum Depth of Binary Tree 🌲
// Find the maximum depth (height) of a binary tree using ඞ pointers
// Time: O(n), Space: O(h) where h is height
struct TreeNode {
    sus val normie
    sus left ඞTreeNode
    sus right ඞTreeNode
}

slay max_depth(root ඞTreeNode) normie {
    ready (root == null) {
        damn 0 // Base case: empty tree has depth 0
    }
    sus left_depth normie = max_depth(root.left)
    sus right_depth normie = max_depth(root.right)
    // Return 1 + max of left and right subtree depths
    damn 1 + mathz.max(left_depth, right_depth)
}

slay max_depth_iterative(root ඞTreeNode) normie {
    // BFS approach using queue - this hits different! 🚀
    ready (root == null) {
        damn 0
    }
    sus queue ඞTreeNode[] = []ඞTreeNode{}
    sus levels normie[] = []normie{}
    append(queue, root)
    append(levels, 1)
    sus max_level normie = 0
    bestie (len(queue) > 0) {
        sus node ඞTreeNode = queue[0]
        sus level normie = levels[0]
        // Remove from front of queue
        collections.remove_first(queue)
        collections.remove_first(levels)
        max_level = mathz.max(max_level, level)
        ready (node.left != null) {
            append(queue, node.left)
            append(levels, level + 1)
        }
        ready (node.right != null) {
            append(queue, node.right)
            append(levels, level + 1)
        }
    }
    damn max_level
}

slay create_test_tree() ඞTreeNode {
    // Create tree: [3,9,20,null,null,15,7]
    //      3
    //     / \
    //    9  20
    //       / \
    //     15   7
    sus root ඞTreeNode = &TreeNode{val: 3, left: null, right: null}
    root.left = &TreeNode{val: 9, left: null, right: null}
    root.right = &TreeNode{val: 20, left: null, right: null}
    root.right.left = &TreeNode{val: 15, left: null, right: null}
    root.right.right = &TreeNode{val: 7, left: null, right: null}
    damn root
}

slay create_skewed_tree() ඞTreeNode {
    // Create skewed tree for testing edge cases
    // 1
    //  \
    //   2
    //    \
    //     3
    sus root ඞTreeNode = &TreeNode{val: 1, left: null, right: null}
    root.right = &TreeNode{val: 2, left: null, right: null}
    root.right.right = &TreeNode{val: 3, left: null, right: null}
    damn root
}

slay test_maximum_depth() {
    vibez.spill("=== 🌲 LeetCode #104: Maximum Depth of Binary Tree ===")

    // Test case 1: Balanced tree [3,9,20,null,null,15,7]
    sus root1 ඞTreeNode = create_test_tree()
    sus depth1_rec normie = max_depth(root1)
    sus depth1_iter normie = max_depth_iterative(root1)
    vibez.spill("Test 1 - Balanced tree:")
    vibez.spill("Expected depth: 3")
    vibez.spill("Recursive result:", depth1_rec)
    vibez.spill("Iterative result:", depth1_iter)

    // Test case 2: Empty tree
    sus root2 ඞTreeNode = null
    sus depth2 normie = max_depth(root2)
    vibez.spill("Test 2 - Empty tree:")
    vibez.spill("Expected depth: 0, Got:", depth2)

    // Test case 3: Single node [1]
    sus root3 ඞTreeNode = &TreeNode{val: 1, left: null, right: null}
    sus depth3 normie = max_depth(root3)
    vibez.spill("Test 3 - Single node:")
    vibez.spill("Expected depth: 1, Got:", depth3)

    // Test case 4: Skewed tree
    sus root4 ඞTreeNode = create_skewed_tree()
    sus depth4 normie = max_depth(root4)
    vibez.spill("Test 4 - Skewed tree:")
    vibez.spill("Expected depth: 3, Got:", depth4)

    vibez.spill("=== Maximum Depth Complete! Tree depth detection is sus-perfect ඞ🌲 ===")
}

slay main_character() {
    test_maximum_depth()
}
If this is your sort of chaotic vibe, and you'd like to turn this into the Dogecoin of programming languages, head on over to GitHub and run a few more Claude Code loops with the following prompt:
study specs/* to learn about the programming language. When authoring the cursed standard library think extra extra hard as the CURSED programming language is not in your training data set and may be invalid. Come up with a plan to implement XYZ as markdown then do it
There is no roadmap; the roadmap is whatever the community decides to ship from this point forward.
At this point, I'm pretty much convinced that any problems found in cursed can be solved by just running more Ralph loops driven by skilled operators (i.e. people with compiler experience who shape it through prompts, versus letting Claude just rip unattended). There's still a lot to be fixed, and I'm happy to take pull requests.
The most high-IQ thing is perhaps the most low-IQ thing: run an agent in a loop.
LLMs amplify the skills that developers already have and enable people to do things where they don't have that expertise yet.
Success is defined as cursed ending up in the Stack Overflow developer survey as either the "most loved" or "most hated" programming language, and continuing the work to bootstrap the compiler to be written in cursed itself.
Cya soon in Discord? - https://discord.gg/CRbJcKaGNT
website
source code
ps. socials
I ran Claude in a loop for 3 months and created a brand new "GenZ" programming language. It's called @cursedlang. v0.0.1 is now available, and the website is ready to go. Details below!
— geoff (@GeoffreyHuntley) September 9, 2025
2025-09-02 23:53:29
I just finished up a phone call with a "stealth startup" that was pitching an idea: that agents could generate code securely via an MCP server. Needless to say, the phone call did not go well. What follows is a recap of the conversation, in which I shot the idea down and wrapped up the call early, because it's a bad idea.
If anyone pitches you on the idea that you can achieve secure code generation via an MCP tool or Cursor rules, run, don't walk.
Over the last nine months, I've written about the changes that are coming to our industry, where we're entering an arena where most of the code going forward is not going to be written by hand, but instead by agents.
where I think the puck is going.
I haven't written code by hand for nine months. I've generated, read, and reviewed a lot of code, and I think perhaps within the next year, the large swaths of code in business will no longer be artisanal hand-crafted. Those days are fast coming to a close.
Thus, naturally, there is a question that's on everyone's mind:
How do I make the agent generate secure code?
Let's start with what you should not do and build up from first principles.
2025-08-24 08:23:00
Hey everyone, I'm here today to teach you how to build a coding agent. By this stage of the conference, you may be tired of hearing the word "agent".
You hear the word frequently. However, it appears that everyone is using this term loosely without a clear understanding of what it means or how these coding agents operate internally. It's time to pull back the hood and show that there is no moat.
Learning how to build a coding agent is one of the best things you can do for your personal development in 2025, as it teaches you the fundamentals. Once you understand these fundamentals, you'll move from being a consumer of AI to a producer of AI who can automate things with AI.
Let me open with the following facts:
An agent is approximately 300 lines of code running in a loop with LLM tokens. That's all it is. You just keep throwing tokens at the loop, and then you've got yourself an agent.
Today, we're going to build one. We're going to do it live, and I'll explain the fundamentals of how it all works. As we are now in 2025, it has become the norm to work concurrently with AI assistance. So, what better way to demonstrate the point of this talk than to have an agent build me an agent whilst I deliver this talk?
Cool. We're now building an agent. This is one of the things that's changing in our industry, because work can be done concurrently and whilst you are away from your computer.
The days of spending a week or a couple of days on a research spike are now over because you can turn an idea into execution just by speaking to your computer.
The next time you're on a Zoom call, consider that you could've had an agent building the work that you're planning to do during that Zoom call. If that's not the norm for you, and it is for your coworkers, then you're naturally not going to get ahead.
The tech industry is almost like a conveyor belt - we always need to be learning new things.
If I were to ask you what a primary key is, you should know what a primary key is. That's been the norm for a long time.
In 2024, it was essential to understand what a primary key is.
In 2025, you should be familiar with what a primary key is and how to create an agent, as knowing what this loop is and how to build an agent is now fundamental knowledge that employers are looking for in candidates before they'll let you in the door.
This knowledge will transform you from being a consumer of AI into a producer of AI who can orchestrate your job function. Employers are now seeking individuals who can automate tasks within their organisation.
If you're joining me later this afternoon for the conference closing (see below), I'll delve a bit deeper into the above.
the conference closing talk
Right now, you'll be somewhere on the journey above.
On the top left, we've got 'prove it to me, it's not real,' 'prove it to me, show me outcomes', 'prove it to me that it's not hype', and a bunch of 'it's not good enough' folks who get stuck up there on that left side of the cliff, completely ignoring that there are people on the other side of the cliff, completely automating their job function.
In my opinion, any disruption or job loss related to AI is not a result of AI itself, but rather a consequence of a lack of personal development and self-investment. If your coworkers are hopping between multiple agents, chewing on ideas, and running them in the background during meetings, and you're not in on that action, then naturally you're just going to fall behind.
don't be the person on the left side of the cliff.
The tech industry's conveyor belt continues to move forward. If you're a DevOps engineer in 2025 and you don't have any experience with AWS or GCP, then you're going to find it pretty tough in the employment market.
What's surprising to software and data engineers is just how fast this is all happening. It has been eight months since the release of the first coding agent, and most people are still unaware of how straightforward it is to build one, how powerful this loop is, and how disruptive its implications are for our profession.
So, my name's Geoffrey Huntley. I was the tech lead for developer productivity at Canva, but as of a couple of months ago, I'm one of the engineers at Sourcegraph building Amp. It's a small core team of about six people. We build AI with AI.
Cursor, Windsurf, Claude Code, GitHub Copilot, and Amp are just a small number of lines of code running in a loop with LLM tokens. I can't stress that enough. The model does all the heavy lifting here, folks. It's the model that does it all.
You are probably five vendors deep in product evaluation, right now, trying to compare all these agents to one another. But really, you're just chasing your tail.
It's so easy to build your own...
There are just a few key concepts you need to be aware of.
Not all LLMs are agentic.
It's the same way that you have different types of cars: there's a 40 Series if you want to go off-road, and there are people movers, which exist for transporting people.
The same principle applies to LLMs, and I've been able to map their behaviours into a quadrant.
A model is either high safety, low safety, an oracle, or agentic. It's never both or all.
If I were to ask you to do some security research, which model would you use?
That'd be Grok. That's a low safety model.
If you want something that's "ethics-aligned", it's Anthropic or OpenAI; that's high safety. Similarly, you have oracles. Oracles are the polar opposite of agentic models: they're suitable for summarisation tasks and tasks that require a high level of thinking.
Meanwhile, you have providers like Anthropic, and their Claude Sonnet is a digital squirrel (see below).
The first robot used to chase tennis balls. The first digital robot chases tool calls.
Sonnet is a robotic squirrel that just wants to do tool calls. It doesn't spend too much time thinking; it biases towards action, which is what makes it agentic. Sonnet focuses on incrementally obtaining success instead of pondering for minutes per turn before taking action.
It seems like every day, a new model is introduced to the market, and they're all competing with one another. But truth be told, they have their specialisations and have carved out their niches.
The problem is that, unless you're working with these models at an intimate level, you may not have this level of awareness of the specialisations of the models, which results in consumers just comparing the models on two basic primitives:
It's kind of like looking at a car, whether it has two doors or three doors, whilst ignoring the fact that some vehicles are designed for off-roading, while others are designed for passenger transport.
To build an agent, the first step is to choose a highly agentic model. That is currently Claude Sonnet, or Kimi K2.
Now, you might be wondering: "What if I want a higher level of reasoning, and a check on the work that the incremental squirrel does?" Ah, that's simple. You can wire other LLMs in as tools into an existing agentic LLM. This is what we do at Amp.
We call it the Oracle. The Oracle is just GPT wired in as a tool that Claude Sonnet can function call for guidance, to check work progress, and to conduct research/planning.
Amp's oracle is just another LLM registered in as a tool to an agentic LLM that it can function call
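As a rough sketch of the pattern (the model id and tool shape here are illustrative assumptions, not Amp's actual implementation): the oracle is registered like any other tool, and the tool's implementation happens to call a second model.

# a sketch of the oracle pattern: a slower-but-deeper model exposed to the
# agentic model as just another tool it can function call.
from openai import OpenAI

oracle_client = OpenAI()  # reads OPENAI_API_KEY from the environment

ORACLE_TOOL = {
    "name": "oracle",
    "description": "Consult a high-reasoning model for planning, research, "
                   "or reviewing work in progress. Expensive; use sparingly.",
    "input_schema": {
        "type": "object",
        "properties": {"question": {"type": "string"}},
        "required": ["question"],
    },
}

def run_oracle(question: str) -> str:
    # the agentic model only ever sees the returned text, never this plumbing
    completion = oracle_client.chat.completions.create(
        model="gpt-4o",  # placeholder for whichever oracle-grade model you prefer
        messages=[{"role": "user", "content": question}],
    )
    return completion.choices[0].message.content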
The next important thing to learn is that you should only use the context window for one activity. When you're using Cursor or any one of these tools, it's essential to clear the context window after each activity (see below).
LLM outcomes are a needle in a haystack, and the haystack is whatever you've allocated into the context window.
If you start an AI-assisted session to build a backend API controller and then reuse that session to research facts about meerkats, it should be no surprise that when you tell it to redesign the website in the same session, the website might end up with facts about your API, or meerkats, or both.
Context windows are very, very small. It's best to think of them as a Commodore 64, and as such, you should be treating it as a computer with a limited amount of memory. The more you allocate, the worse your outcome and performance will be.
The advertised context window for Sonnet is 200k. However, you don't get to use all of that because the model needs to allocate memory for the system-level prompt. Then the harness (Cursor, Windsurf, Claude Code, Amp) also needs to allocate some additional memory, which means you end up with approximately 176k tokens usable.
You've probably heard a lot about the Model Context Protocol (MCP). It's the current hot thing, and the easiest way to think about an MCP tool is as a function with a description allocated to the context window, telling the model how to invoke that function.
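Concretely, every enabled tool contributes a descriptor of roughly this shape to every request. This is an illustrative sketch, not any particular server's real descriptor; a verbose one can easily run to hundreds of tokens.

# roughly what a single MCP tool allocates into the context window
weather_tool = {
    "name": "get_weather",
    "description": "Get the current weather for a city. "
                   "Returns temperature, humidity and conditions.",
    "inputSchema": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}
# every enabled tool pays this rent in tokens on every single request,
# whether or not the model ever calls it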
A common failure scenario I observe is people installing an excessive number of MCP servers, failing to consider the number of tools exposed by a single MCP server, or ignoring the aggregate context-window allocation of all those tools.
There is a cardinal rule that is not as well understood as it should be. The more you allocate to a context window, the worse the performance of the context window will be, and your outcomes will deteriorate.
Avoid excessively allocating to the context window with your agent or through MCP tool consumption. It's very easy to fall into a trap of allocating an additional 76K of tokens just for MCP tools, which means you only have 100K usable.
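To make the arithmetic concrete (the 200k, 176k, 76k and 100k figures come from above; the split between system prompt and harness is my assumption):

# illustrative context-window budget for a Sonnet-class model
advertised = 200_000                 # advertised context window
system_prompt_and_harness = 24_000   # assumed combined allocation
usable = advertised - system_prompt_and_harness     # ~176,000 tokens
mcp_tool_descriptions = 76_000       # an over-enthusiastic MCP install
usable_after_mcp = usable - mcp_tool_descriptions   # ~100,000 tokens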
Less is more, folks. Less is more.
I recommend dropping by and reading the blog post below if you want to understand when to use MCP and when not to.
When you should use MCP, when you should not use MCP, and how allocations work in the context window.
Let's head back and check on our agent that's being built in the background. If you look at it closely enough, you can see the loop and how it's invoking other tools.
Essentially, how this all works is outlined in the loop below.
Every piece of input from the user, and every result of a tool call, gets allocated into the conversation, and the whole conversation is sent off for inferencing again:
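Here's a minimal sketch of that loop in Python using the Anthropic SDK. The single read_file tool and the model id are illustrative stand-ins, not any particular harness's implementation:

# the agent loop: inference, execute tool calls, allocate results, repeat
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

TOOLS = [{
    "name": "read_file",
    "description": "Read a file from the local filesystem and return its contents.",
    "input_schema": {
        "type": "object",
        "properties": {"path": {"type": "string"}},
        "required": ["path"],
    },
}]

def run_tool(name: str, args: dict) -> str:
    if name == "read_file":
        with open(args["path"]) as f:
            return f.read()
    return f"unknown tool: {name}"

messages = [{"role": "user", "content": "Summarise main.go for me."}]

while True:
    response = client.messages.create(
        model="claude-sonnet-4-20250514",  # assumed id; use whatever agentic model is current
        max_tokens=4096,
        tools=TOOLS,
        messages=messages,
    )
    # everything the model produced (text and tool calls) is allocated
    # back into the conversation before the next round of inferencing
    messages.append({"role": "assistant", "content": response.content})
    if response.stop_reason != "tool_use":
        break  # no tool calls requested; the turn is complete
    # execute each requested tool and feed the results back in
    results = [{
        "type": "tool_result",
        "tool_use_id": block.id,
        "content": run_tool(block.name, block.input),
    } for block in response.content if block.type == "tool_use"]
    messages.append({"role": "user", "content": results})

print(next((b.text for b in response.content if b.type == "text"), ""))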
Let's open up our workshop materials (below) and run the basic chat application via:
2025-08-22 23:40:28
This blog post intends to be a definitive guide to context engineering fundamentals from the perspective of an engineer who builds commercial coding assistants and harnesses for a living.
Just two weeks ago, I was back over in San Francisco, and there was a big event on Model Context Protocol Servers. MCP is all hype right now. Everyone at the event was buzzing about the glory and how amazing MCP is going to be, or is, but when I pushed folks for their understanding of fundamentals, it was crickets.
It was a big event. Over 1,300 engineers registered, and an entire hotel was rented out as the venue for the takeover. Based on my best estimate, at least $150,000 USD to $200,000 USD was spent on this event. The estimate was attained through a game of over and under with the front-of-house engineers. They brought in a line array, a GrandMA 3, and had full DMX lighting. As a bit of a lighting nerd myself, I couldn't help but geek out a little.
To clarify, this event was a one-night meet-up, not a conference. There was no registration fee; attendance was free, and the event featured an open bar, including full cocktail service at four bars within the venue, as well as an after-party with full catering and chessboards. While this post might seem harsh on the event, I enjoyed it. It was good.
The meetup even hired a bunch of beatboxers to close off the event, and they gave a live beatbox performance about Model Context Protocol...
MC protocol live and in the flesh.
One of the big announcements was the removal of the 128 tool limit from Visual Studio Code....
Why Microsoft? It's not a good thing...
Later that night, I was sitting by the bar catching up with one of the engineers from Cursor, and we were just scratching our heads,
"What the hell? Why would you need 128 tools or why would you want more than that? Why is Microsoft doing this or encouraging this bad practice?"
For the record, Cursor caps the number of MCP tools that can be enabled at just 40, and for good reason. What follows is a loose recap. This is knowledge that is known by the people who build these coding harnesses, and I hope it spreads. There's one single truth:
2025-08-20 11:21:58
It's a meme as old as time, and the problem is that it's accurate: our digital infrastructure depends upon just some random guy in Nebraska.
Open-source, by design, is not financially sustainable. Finding reliable, well-defined funding sources is exceptionally challenging. As projects grow in size, many maintainers burn out and find themselves unable to meet the increasing demands for support and maintenance.
I'm speaking from experience here, as someone who delivered talks at conferences on this six years ago (see below) and took a decent stab at resolving open-source funding. The settlement on my land on Kangaroo Island was funded through open-source donations, and I'm forever thankful to the backers who supported me during a rough period of my life for helping make that happen.
Rather than watch a 60-minute talk by two burnt-out open-source maintainers, here is a quick summary of the conference video. The idea was simple:
If companies were to enumerate their bills of material and identify their unpaid vendors, they could take steps to mitigate their supply chain risks.
For dependencies that are of strategic importance, the strategy would be a combination of financial support, becoming regular contributors to the project, or even hiring the maintainers of these projects as engineers for [short|long]-term engagements.
Six years have gone by, and I haven't seen many companies do it. I mean, why would they? The software's given away for free, it's released as-is, so why would they pay?
It's only out of goodwill that someone would do it, or in my case, as part of a marketing expenditure program. While I was at Gitpod, I was able to distribute over $33,000 USD to open-source maintainers through the program.
The idea was simple: you could acquire backlinks and promote your brand on the profiles of prolific open-source maintainers, their website and in their GitHub repositories for a fraction of the cost compared to traditional marketing.
The approach still works, and I don't understand why other companies are still overlooking it.
Now, it's easy to say "marketing is a dirty business", etc., but what was underpinning this was a central thought:
If just one of those people can help more people better understand a technology or improve the developer experience for an entire ecosystem what is the worth/value of that and why isn’t our industry doing that yet?
The word volunteer, by definition, means those who have the ability and time to give freely.
Paying for resources that are being consumed broadens the list of people who can do open-source. Additionally, money enables open-source maintainers to buy services and outsource the activities that do not bring them joy.
Something has changed, though: AI. I'm now eight months into my journey of using AI to automate software development (see below),
and when I speak with peers who have invested a similar amount of time in these tools, we're noticing a new emergent pattern:
2025-07-19 10:22:46
It might surprise some folks, but I'm incredibly cynical when it comes to AI and what is possible; yet I keep an open mind. That said, two weeks ago, when I was in SFO, I discovered another thing that should not be possible. Every time I find out something that works, which should not be possible, it pushes me further and further, making me think that we are already in post-AGI territory.
I was sitting next to a mate at a pub; it was pretty late, and we were just talking about LLM capabilities, riffing about what the modern version of Falco or any of these tools in the DFIR space looks like when combined with an LLM.
You see, a couple of months ago, I'd been playing with eBPF and LLMs and discovered that LLMs do eBPF unusually well. So in the spirit of deliberate practice (see below), a laptop was brought out, and we SSH'd into a Linux machine.
The idea was simple.
Could we convert an eBPF trace to a fully functional application via Ralph Wiggum? So we started with a toy.