Simon Willison

Creator of Datasette and Lanyrd, co-creator of the Django Web Framework.

Adding a new content type to my blog-to-newsletter tool

2026-04-18 11:15:36

Agentic Engineering Patterns

Here's an example of a deceptively short prompt that got a lot of work done in a single shot.

First, some background. I send out a free Substack newsletter around once a week containing content copied-and-pasted from my blog. I'm effectively using Substack as a lightweight way to allow people to subscribe to my blog via email.

I generate the newsletter with my blog-to-newsletter tool - an HTML and JavaScript app that fetches my latest content from this Datasette instance and formats it as rich text HTML, which I can then copy to my clipboard and paste into the Substack editor. Here's a detailed explanation of how that works.
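
The underlying flow - fetch rows from a Datasette JSON endpoint, render them as rich-text HTML - can be sketched in a few lines of Python. The endpoint URL shape and column names below are illustrative assumptions, not the tool's actual schema:

```python
import json
from urllib.request import urlopen  # only needed for the live fetch at the bottom

def rows_to_html(rows):
    """Render a list of row dicts as rich-text HTML ready for pasting.

    The "title", "url" and "html" keys are assumptions for illustration.
    """
    parts = []
    for row in rows:
        parts.append('<h3><a href="%s">%s</a></h3>' % (row["url"], row["title"]))
        parts.append(row.get("html", ""))
    return "\n".join(parts)

# Live fetch against a hypothetical Datasette JSON endpoint:
# rows = json.load(urlopen("https://example.com/db/table.json?_shape=array"))
# print(rows_to_html(rows))
```

The real app does this client-side in JavaScript, but the shape of the work is the same: one JSON request, then string templating into clipboard-ready HTML.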

I recently added a new content type to my blog, called "beats", to capture content that I post elsewhere. These include things like releases of my open source projects, new tools that I've built, museums that I've visited (from niche-museums.com) and other external content.

I wanted to include these in the generated newsletter. Here's the prompt I ran against the simonw/tools repository that hosts my blog-to-newsletter tool, using Claude Code on the web.

Clone simonw/simonwillisonblog from github to /tmp for reference

Update blog-to-newsletter.html to include beats that have descriptions - similar to how the Atom everything feed on the blog works

Run it with python -m http.server and use `uvx rodney --help` to test it - compare what shows up in the newsletter with what's on the homepage of https://simonwillison.net

This got me the exact solution I needed. Let's break down the prompt.

Clone simonw/simonwillisonblog from github to /tmp for reference

I use this pattern a lot. Coding agents can clone code from GitHub, and the best way to explain a problem is often to have them look at relevant code. By telling them to clone to /tmp I ensure they don't accidentally end up including that reference code in their own commit later on.

The simonw/simonwillisonblog repository contains the source code for my Django-powered simonwillison.net blog. This includes the logic and database schema for my new "beats" feature.

Update blog-to-newsletter.html to include beats that have descriptions - similar to how the Atom everything feed on the blog works

Referencing blog-to-newsletter.html is all I need here to tell Claude which of the 200+ HTML apps in that simonw/tools repo it should be modifying.

Beats are automatically imported from multiple sources. Often they aren't very interesting - a dot-release bug fix for one of my smaller open source projects, for example.

My blog includes a way for me to add additional descriptions to any beat, which provides extra commentary but also marks that beat as being more interesting than those that I haven't annotated in some way.

I already use this as a distinction to decide which beats end up in my site's Atom feed. Telling Claude to imitate that saves me from having to describe the logic in any extra detail.
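
That distinction boils down to a simple filter: a beat qualifies if it has a non-empty note and isn't a draft. As a Python sketch (the dict keys are assumptions made for illustration, mirroring the SQL rule `coalesce(note, '') != '' and is_draft = 0`):

```python
def newsletter_beats(beats):
    # A beat qualifies if it has a non-empty note and is not a draft -
    # the same rule the SQL query uses:
    #   coalesce(note, '') != '' and is_draft = 0
    return [
        b for b in beats
        if (b.get("note") or "") != "" and not b.get("is_draft")
    ]
```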

Run it with python -m http.server and use `uvx rodney --help` to test it - compare what shows up in the newsletter with what's on the homepage of https://simonwillison.net

Coding agents always work best if they have some kind of validation mechanism they can use to test their own work.

In this case I wanted Claude Code to actively check that the changes it made to my tool would correctly fetch and display the latest data.

I reminded it to use python -m http.server as a static server because I've had issues in the past with applications that fetch data and break when served as a file from disk instead of a localhost server. In this particular case that may not have been necessary, but my prompting muscle memory has python -m http.server baked in at this point!

I described the uvx rodney --help trick in the agentic manual testing chapter. Rodney is browser automation software that can be installed using uvx, and that has --help output designed to teach an agent everything it needs to know in order to use the tool.

I figured that telling Claude to compare the results in the newsletter to the content of my blog's homepage would be enough for it to confidently verify that the new changes were working correctly, since I had recently posted content that matched the new requirements.

You can see the full session here, or if that doesn't work I have an alternative transcript showing all of the individual tool calls.

The resulting PR made exactly the right change. It added an additional UNION clause to the SQL query that fetched the blog's content, filtering out draft beats and beats that have nothing in their note column:

...
union all
select
  id,
  'beat' as type,
  title,
  created,
  slug,
  'No HTML' as html,
  json_object(
    'created', date(created),
    'beat_type', beat_type,
    'title', title,
    'url', url,
    'commentary', commentary,
    'note', note
  ) as json,
  url as external_url
from blog_beat
where coalesce(note, '') != '' and is_draft = 0
union all
...

And it figured out a mapping of beat types to their formal names, presumably derived from the Django ORM definition that it read while it was exploring the reference codebase:
const beatTypeDisplay = {
  release: 'Release',
  til: 'TIL',
  til_update: 'TIL updated',
  research: 'Research',
  tool: 'Tool',
  museum: 'Museum'
};

Telling agents to use another codebase as reference is a powerful shortcut for communicating complex concepts with minimal additional information needed in the prompt.

Tags: ai, llms, prompt-engineering, coding-agents, ai-assisted-programming, generative-ai, agentic-engineering, github

Join us at PyCon US 2026 in Long Beach - we have new AI and security tracks this year

2026-04-18 07:59:03

This year's PyCon US is coming up next month from May 13th to May 19th, with the core conference talks from Friday 15th to Sunday 17th and tutorial and sprint days either side. It's in Long Beach, California this year, the first time PyCon US has come to the West Coast since Portland, Oregon in 2017 and the first time in California since Santa Clara in 2013.

If you're based in California this is a great opportunity to catch up with the Python community, meet a whole lot of interesting people and learn a ton of interesting things.

In addition to regular PyCon programming we have two new dedicated tracks at the conference this year: an AI track on Friday and a Security track on Saturday.

The AI program was put together by track chairs Silona Bonewald (CitableAI) and Zac Hatfield-Dodds (Anthropic). I'll be an in-the-room chair this year, introducing speakers and helping everything run as smoothly as possible.

Here's the AI track schedule in full:

(And here's how I scraped that as a Markdown list from the schedule page using Claude Code and Rodney.)

You should come to PyCon US!

I've been going to PyCon for over twenty years now - I first went back in 2005. It's one of my all-time favourite conference series. Even as it's grown to more than 2,000 attendees PyCon US has remained a heavily community-focused conference - it's the least corporate feeling large event I've ever attended.

The talks are always great, but it's the add-ons around the talks that really make it work for me. The lightning talks slots are some of the most heavily attended sessions. The PyLadies auction is always deeply entertaining. The sprints are an incredible opportunity to contribute directly to projects that you use, coached by their maintainers.

In addition to scheduled talks, the event has open spaces, where anyone can reserve space for a conversation about a topic - effectively PyCon's version of an unconference. I plan to spend a lot of my time in the open spaces this year - I'm hoping to join or instigate sessions about both Datasette and agentic engineering.

I'm on the board of the Python Software Foundation, and PyCon US remains one of our most important responsibilities - in the past it's been a key source of funding for the organization, but it's also core to our mission to "promote, protect, and advance the Python programming language, and to support and facilitate the growth of a diverse and international community of Python programmers".

If you do come to Long Beach, we'd really appreciate it if you could book accommodation in the official hotel block, for reasons outlined in this post on the PSF blog.

Tags: conferences, open-source, pycon, python, ai, psf

datasette 1.0a28

2026-04-17 12:01:56

Release: datasette 1.0a28

I was upgrading Datasette Cloud to 1.0a27 and discovered a nasty collection of accidental breakages caused by changes in that alpha. This new alpha addresses those directly:

  • Fixed a compatibility bug introduced in 1.0a27 where execute_write_fn() callbacks with a parameter name other than conn were seeing errors. (#2691)
  • The database.close() method now also shuts down the write connection for that database.
  • New datasette.close() method for closing down all databases and resources associated with a Datasette instance. This is called automatically when the server shuts down. (#2693)
  • Datasette now includes a pytest plugin which automatically calls datasette.close() on temporary instances created in function-scoped fixtures and during tests. See Automatic cleanup of Datasette instances for details. This helps avoid running out of file descriptors in plugin test suites that were written before the Database(is_temp_disk=True) feature introduced in Datasette 1.0a27. (#2692)
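
The execute_write_fn() fix is a classic example of the hazards of calling user-supplied callbacks by keyword. This isn't Datasette's actual code, just a minimal sketch of the failure mode and the fix:

```python
def call_fragile(fn, conn):
    # Calling by keyword assumes every callback names its parameter "conn";
    # a callback written as `def fn(db): ...` raises TypeError here.
    return fn(conn=conn)

def call_robust(fn, conn):
    # Passing the connection positionally works whatever the parameter is called.
    return fn(conn)
```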

Most of the changes in this release were implemented using Claude Code and the newly released Claude Opus 4.7.

Tags: datasette

llm-anthropic 0.25

2026-04-17 04:37:12

Release: llm-anthropic 0.25

  • New model: claude-opus-4.7, which supports thinking_effort: xhigh. #66
  • New thinking_display and thinking_adaptive boolean options. The summarized thinking_display output is currently only available in JSON output or JSON logs.
  • Increased default max_tokens to the maximum allowed for each model.
  • No longer uses the obsolete structured-outputs-2025-11-13 beta header for older models.

Tags: llm, anthropic, claude

Qwen3.6-35B-A3B on my laptop drew me a better pelican than Claude Opus 4.7

2026-04-17 01:16:52

For anyone who has been (inadvisably) taking my pelican riding a bicycle benchmark seriously as a robust way to test models, here are pelicans from this morning's two big model releases - Qwen3.6-35B-A3B from Alibaba and Claude Opus 4.7 from Anthropic.

Here's the Qwen 3.6 pelican, generated using this 20.9GB Qwen3.6-35B-A3B-UD-Q4_K_S.gguf quantized model by Unsloth, running on my MacBook Pro M5 via LM Studio (and the llm-lmstudio plugin) - transcript here:

The bicycle frame is the correct shape. There are clouds in the sky. The pelican has a dorky looking pouch. A caption on the ground reads Pelican on a Bicycle!

And here's one I got from Anthropic's brand new Claude Opus 4.7 (transcript):

The bicycle frame is entirely the wrong shape. No clouds, a yellow sun. The pelican is looking behind itself, and has a less pronounced pouch than I would like.

I'm giving this one to Qwen 3.6. Opus managed to mess up the bicycle frame!

I tried Opus a second time passing thinking_level: max. It didn't do much better (transcript):

The bicycle frame is entirely the wrong shape but in a different way. Lines are more bold. Pelican looks a bit more like a pelican.

I don't think Qwen are cheating

A lot of people are convinced that the labs train for my stupid benchmark. I don't think they do, but honestly this result did give me a little glint of suspicion. So I'm burning one of my secret backup tests - here's what I got from Qwen3.6-35B-A3B and Opus 4.7 for "Generate an SVG of a flamingo riding a unicycle":

Qwen3.6-35B-A3B (transcript): The unicycle spokes are a bit too long. The flamingo has sunglasses, a bowtie and appears to be smoking a cigarette. It has two heart emoji surrounding the caption Flamingo on a Unicycle. It has a lot of charisma.

Opus 4.7 (transcript): The unicycle has a black wheel. The flamingo is a competent if slightly dull vector illustration of a flamingo. It has no flair.

I'm giving this one to Qwen too, partly for the excellent <!-- Sunglasses on flamingo! --> SVG comment.

What can we learn from this?

The pelican benchmark has always been meant as a joke - it's mainly a statement on how obtuse and absurd the task of comparing these models is.

The weird thing about that joke is that, for the most part, there has been a direct correlation between the quality of the pelicans produced and the general usefulness of the models. Those first pelicans from October 2024 were junk. The more recent entries have generally been much, much better - to the point that Gemini 3.1 Pro produces illustrations you could actually use somewhere, provided you had a pressing need to illustrate a pelican riding a bicycle.

Today, even that loose connection to utility has been broken. I have enormous respect for Qwen, but I very much doubt that a 21GB quantized version of their latest model is more powerful or useful than Anthropic's latest proprietary release.

If the thing you need is an SVG illustration of a pelican riding a bicycle though, right now Qwen3.6-35B-A3B running on a laptop is a better bet than Opus 4.7!

Tags: ai, generative-ai, local-llms, llms, anthropic, claude, qwen, pelican-riding-a-bicycle, llm-release, lm-studio

datasette.io news preview

2026-04-16 08:18:03

Tool: datasette.io news preview

The datasette.io website has a news section built from this news.yaml file in the underlying GitHub repository. The YAML format looks like this:

- date: 2026-04-15
  body: |-
    [Datasette 1.0a27](https://docs.datasette.io/en/latest/changelog.html#a27-2026-04-15) changes how CSRF protection works in a way that simplifies form and API integration, and introduces a new `RenameTableEvent` for when a table is renamed by a SQL query.
- date: 2026-03-18
  body: |-
    ...
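
Each entry needs an ISO-format date and a non-empty Markdown body, so validating the shape is straightforward. A stdlib-only sketch, assuming the YAML has already been parsed into a list of dicts (which yaml.safe_load would produce):

```python
from datetime import date

def validate_entries(entries):
    """Return a list of error strings for parsed news.yaml entries."""
    errors = []
    for i, entry in enumerate(entries):
        # Each entry must have an ISO-format date (PyYAML may parse an
        # unquoted date to datetime.date, so normalize through str()).
        try:
            date.fromisoformat(str(entry.get("date", "")))
        except ValueError:
            errors.append(f"entry {i}: invalid date {entry.get('date')!r}")
        # ...and a non-empty Markdown body.
        if not str(entry.get("body", "")).strip():
            errors.append(f"entry {i}: empty body")
    return errors
```

The artifact Claude built does this kind of checking in the browser, plus Markdown rendering; this sketch covers just the validation half.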

This format is a little hard to edit, so I finally had Claude build a custom preview UI to reduce the friction of checking for errors.

I built it using standard claude.ai and Claude Artifacts, taking advantage of Claude's ability to clone GitHub repos and look at their content as part of a regular chat:

Clone https://github.com/simonw/datasette.io and look at the news.yaml file and how it is rendered on the homepage. Build an artifact I can paste that YAML into which previews what it will look like, and highlights any markdown errors or YAML errors

Screenshot showing two side-by-side views of a datasette.io news preview tool. The left panel shows a dark-themed YAML editor with news entries containing date and body fields in Markdown format, with a red validation error at the bottom indicating the date field has an invalid format. The right panel shows the rendered preview output with formatted headings by date (April 2026, 18th March 2026), displaying 115 news entries with linked release names, inline code snippets, and changelog descriptions. A red badge with "1" appears on the left panel header indicating one validation error.

Tags: vibe-coding, claude, tools, datasette