2026-04-18 11:15:36
Agentic Engineering Patterns >
Here's an example of a deceptively short prompt that got a lot of work done in a single shot.
First, some background. I send out a free Substack newsletter around once a week containing content copied-and-pasted from my blog. I'm effectively using Substack as a lightweight way to allow people to subscribe to my blog via email.
I generate the newsletter with my blog-to-newsletter tool - an HTML and JavaScript app that fetches my latest content from this Datasette instance and formats it as rich text HTML, which I can then copy to my clipboard and paste into the Substack editor. Here's a detailed explanation of how that works.
I recently added a new type of content to my blog to capture content that I post elsewhere, which I called "beats". These include things like releases of my open source projects, new tools that I've built, museums that I've visited (from niche-museums.com) and other external content.
I wanted to include these in the generated newsletter. Here's the prompt I ran against the simonw/tools repository that hosts my blog-to-newsletter tool, using Claude Code on the web.
Clone simonw/simonwillisonblog from github to /tmp for reference Update blog-to-newsletter.html to include beats that have descriptions - similar to how the Atom everything feed on the blog works Run it with python -m http.server and use `uvx rodney --help` to test it - compare what shows up in the newsletter with what's on the homepage of https://simonwillison.netThis got me the exact solution I needed. Let's break down the prompt.
Clone simonw/simonwillisonblog from github to /tmp for reference
I use this pattern a lot. Coding agents can clone code from GitHub, and the best way to explain a problem is often to have them look at relevant code. By telling them to clone to /tmp I ensure they don't accidentally end up including that reference code in their own commit later on.
The simonw/simonwillisonblog repository contains the source code for my Django-powered simonwillison.net blog. This includes the logic and database schema for my new "beats" feature.
Update blog-to-newsletter.html to include beats that have descriptions - similar to how the Atom everything feed on the blog works
Referencing blog-to-newsletter.html is all I need here to tell Claude which of the 200+ HTML apps in that simonw/tools repo it should be modifying.
Beats are automatically imported from multiple sources. Often they aren't very interesting - a dot-release bug fix for one of my smaller open source projects, for example.
My blog includes a way for me to add additional descriptions to any beat, which provides extra commentary but also marks that beat as being more interesting than those that I haven't annotated in some way.
I already use this as a distinction to decide which beats end up in my site's Atom feed. Telling Claude to imitate that saves me from having to describe the logic in any extra detail.
Run it with python -m http.server and use `uvx rodney --help` to test it - compare what shows up in the newsletter with what's on the homepage of https://simonwillison.net
Coding agents always work best if they have some kind of validation mechanism they can use to test their own work.
In this case I wanted Claude Code to actively check that the changes it made to my tool would correctly fetch and display the latest data.
I reminded it to use python -m http.server as a static server because I've had issues in the past with applications that fetch data and break when served as a file from disk instead of a localhost server. In this particular case that may not have been necessary, but my prompting muscle memory has python -m http.server baked in at this point!
I described the uvx rodney --help trick in the agentic manual testing chapter. Rodney is browser automation software that can be installed using uvx, and that has --help output designed to teach an agent everything it needs to know in order to use the tool.
I figured that telling Claude to compare the results in the newsletter to the content of my blog's homepage would be enough for it to confidently verify that the new changes were working correctly, since I had recently posted content that matched the new requirements.
You can see the full session here, or if that doesn't work I have an alternative transcript showing all of the individual tool calls.
The resulting PR made exactly the right change. It added an additional UNION clause to the SQL query that fetched the blog's content, filtering out draft beats and beats that have nothing in their note column:
...
union all
select
id,
'beat' as type,
title,
created,
slug,
'No HTML' as html,
json_object(
'created', date(created),
'beat_type', beat_type,
'title', title,
'url', url,
'commentary', commentary,
'note', note
) as json,
url as external_url
from blog_beat
where coalesce(note, '') != '' and is_draft = 0
union all
...
const beatTypeDisplay = {
release: 'Release',
til: 'TIL',
til_update: 'TIL updated',
research: 'Research',
tool: 'Tool',
museum: 'Museum'
};
Tags: ai, llms, prompt-engineering, coding-agents, ai-assisted-programming, generative-ai, agentic-engineering, github
2026-04-18 07:59:03
This year's PyCon US is coming up next month from May 13th to May 19th, with the core conference talks from Friday 15th to Sunday 17th and tutorial and sprint days either side. It's in Long Beach, California this year, the first time PyCon US has come to the West Coast since Portland, Oregon in 2017 and the first time in California since Santa Clara in 2013.
If you're based in California this is a great opportunity to catch up with the Python community, meet a whole lot of interesting people and learn a ton of interesting things.
In addition to regular PyCon programming we have two new dedicated tracks at the conference this year: an AI track on Friday and a Security track on Saturday.
The AI program was put together by track chairs Silona Bonewald (CitableAI) and Zac Hatfield-Dodds (Anthropic). I'll be an in-the-room chair this year, introducing speakers and helping everything run as smoothly as possible.
Here's the AI track schedule in full:
(And here's how I scraped that as a Markdown list from the schedule page using Claude Code and Rodney.)
I've been going to PyCon for over twenty years now - I first went back in 2005. It's one of my all-time favourite conference series. Even as it's grown to more than 2,000 attendees PyCon US has remained a heavily community-focused conference - it's the least corporate feeling large event I've ever attended.
The talks are always great, but it's the add-ons around the talks that really make it work for me. The lightning talks slots are some of the most heavily attended sessions. The PyLadies auction is always deeply entertaining. The sprints are an incredible opportunity to contribute directly to projects that you use, coached by their maintainers.
In addition to scheduled talks, the event has open spaces, where anyone can reserve space for a conversation about a topic - effectively PyCon's version of an unconference. I plan to spend a lot of my time in the open spaces this year - I'm hoping to join or instigate sessions about both Datasette and agentic engineering.
I'm on the board of the Python Software Foundation, and PyCon US remains one of our most important responsibilities - in the past it's been a key source of funding for the organization, but it's also core to our mission to "promote, protect, and advance the Python programming language, and to support and facilitate the growth of a diverse and international community of Python programmers".
If you do come to Long Beach, we'd really appreciate it if you could book accommodation in the official hotel block, for reasons outlined in this post on the PSF blog.
Tags: conferences, open-source, pycon, python, ai, psf
2026-04-17 12:01:56
Release: datasette 1.0a28
I was upgrading Datasette Cloud to 1.0a27 and discovered a nasty collection of accidental breakages caused by changes in that alpha. This new alpha addresses those directly:
- Fixed a compatibility bug introduced in 1.0a27 where
execute_write_fn()callbacks with a parameter name other thanconnwere seeing errors. (#2691)- The database.close() method now also shuts down the write connection for that database.
- New datasette.close() method for closing down all databases and resources associated with a Datasette instance. This is called automatically when the server shuts down. (#2693)
- Datasette now includes a pytest plugin which automatically calls
datasette.close()on temporary instances created in function-scoped fixtures and during tests. See Automatic cleanup of Datasette instances for details. This helps avoid running out of file descriptors in plugin test suites that were written before theDatabase(is_temp_disk=True)feature introduced in Datasette 1.0a27. (#2692)
Most of the changes in this release were implemented using Claude Code and the newly released Claude Opus 4.7.
Tags: datasette
2026-04-17 04:37:12
Release: llm-anthropic 0.25
- New model:
claude-opus-4.7, which supportsthinking_effort:xhigh. #66- New
thinking_displayandthinking_adaptiveboolean options.thinking_displaysummarized output is currently only available in JSON output or JSON logs.- Increased default
max_tokensto the maximum allowed for each model.- No longer uses obsolete
structured-outputs-2025-11-13beta header for older models.
2026-04-17 01:16:52
For anyone who has been (inadvisably) taking my pelican riding a bicycle benchmark seriously as a robust way to test models, here are pelicans from this morning's two big model releases - Qwen3.6-35B-A3B from Alibaba and Claude Opus 4.7 from Anthropic.
Here's the Qwen 3.6 pelican, generated using this 20.9GB Qwen3.6-35B-A3B-UD-Q4_K_S.gguf quantized model by Unsloth, running on my MacBook Pro M5 via LM Studio (and the llm-lmstudio plugin) - transcript here:

And here's one I got from Anthropic's brand new Claude Opus 4.7 (transcript):

I'm giving this one to Qwen 3.6. Opus managed to mess up the bicycle frame!
I tried Opus a second time passing thinking_level: max. It didn't do much better (transcript):

A lot of people are convinced that the labs train for my stupid benchmark. I don't think they do, but honestly this result did give me a little glint of suspicion. So I'm burning one of my secret backup tests - here's what I got from Qwen3.6-35B-A3B and Opus 4.7 for "Generate an SVG of a flamingo riding a unicycle":


I'm giving this one to Qwen too, partly for the excellent <!-- Sunglasses on flamingo! --> SVG comment.
The pelican benchmark has always been meant as a joke - it's mainly a statement on how obtuse and absurd the task of comparing these models is.
The weird thing about that joke is that, for the most part, there has been a direct correlation between the quality of the pelicans produced and the general usefulness of the models. Those first pelicans from October 2024 were junk. The more recent entries have generally been much, much better - to the point that Gemini 3.1 Pro produces illustrations you could actually use somewhere, provided you had a pressing need to illustrate a pelican riding a bicycle.
Today, even that loose connection to utility has been broken. I have enormous respect for Qwen, but I very much doubt that a 21GB quantized version of their latest model is more powerful or useful than Anthropic's latest proprietary release.
If the thing you need is an SVG illustration of a pelican riding a bicycle though, right now Qwen3.6-35B-A3B running on a laptop is a better bet than Opus 4.7!
Tags: ai, generative-ai, local-llms, llms, anthropic, claude, qwen, pelican-riding-a-bicycle, llm-release, lm-studio
2026-04-16 08:18:03
Tool: datasette.io news preview
The datasette.io website has a news section built from this news.yaml file in the underlying GitHub repository. The YAML format looks like this:
- date: 2026-04-15
body: |-
[Datasette 1.0a27](https://docs.datasette.io/en/latest/changelog.html#a27-2026-04-15) changes how CSRF protection works in a way that simplifies form and API integration, and introduces a new `RenameTableEvent` for when a table is renamed by a SQL query.
- date: 2026-03-18
body: |-
...
This format is a little hard to edit, so I finally had Claude build a custom preview UI to make checking for errors have slightly less friction.
I built it using standard claude.ai and Claude Artifacts, taking advantage of Claude's ability to clone GitHub repos and look at their content as part of a regular chat:
Clone https://github.com/simonw/datasette.io and look at the news.yaml file and how it is rendered on the homepage. Build an artifact I can paste that YAML into which previews what it will look like, and highlights any markdown errors or YAML errors

Tags: vibe-coding, claude, tools, datasette