Simon Willison

Creator of Datasette and Lanyrd, co-creator of the Django Web Framework.

Rss preview of Blog of Simon Willison

Gemini 3.5 Flash: more expensive, but Google plan to use it for everything

2026-05-20 06:40:25

Today at Google I/O, Google released Gemini 3.5 Flash. This one skipped the -preview modifier and went straight to general availability, and Google appear to be using it for a whole lot of their key products:

3.5 Flash is available today to billions of people globally:

  • For everyone via the Gemini app and AI Mode in Google Search
  • For developers in our agent-first development platform Google Antigravity and Gemini API in Google AI Studio and Android Studio
  • For enterprises in Gemini Enterprise Agent Platform and Gemini Enterprise.

As usual with Gemini, the most interesting details are tucked away in the What's new in Gemini 3.5 Flash developer documentation. It mostly has the same set of platform features as the previous Gemini 3.x series, albeit with no computer use. The model ID is gemini-3.5-flash. The knowledge cut-off is January 2025, and it supports 1,048,576 input tokens and 65,536 maximum output tokens.

Google are also pushing a new Interactions API, currently in beta, which looks to me like their version of the patterns introduced by OpenAI Responses - in particular server-side history management.

The price has gone up

Gemini 3.5 Flash is accompanied by a notable price bump. The previous models in the "Flash" family were Gemini 3 Flash Preview and Gemini 3.1 Flash-Lite. The new 3.5 Flash is 3x the price of 3 Flash Preview and 6x the price of 3.1 Flash-Lite (see price comparison here).

At $1.50/million input and $9/million output it's getting close in price to Google's Gemini 3.1 Pro, which is $2 and $12.
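To make "getting close in price" concrete, here is a small sketch that compares the two sets of per-million-token prices quoted above. The dictionary keys are just labels for this example (only gemini-3.5-flash is a model ID confirmed by the post), and the calculation ignores anything beyond the headline input and output prices.

```python
# Prices in USD per million tokens, as quoted in the paragraph above.
# The "gemini-3.1-pro" key is an illustrative label, not a confirmed model ID.
PRICES = {
    "gemini-3.5-flash": {"input": 1.50, "output": 9.00},
    "gemini-3.1-pro": {"input": 2.00, "output": 12.00},
}


def price_ratio(model: str, baseline: str) -> dict[str, float]:
    """Return model price divided by baseline price for input and output."""
    return {
        kind: PRICES[model][kind] / PRICES[baseline][kind]
        for kind in ("input", "output")
    }


ratios = price_ratio("gemini-3.5-flash", "gemini-3.1-pro")
print(ratios)  # {'input': 0.75, 'output': 0.75}
```

On these numbers 3.5 Flash costs 75% of 3.1 Pro for both input and output tokens.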

The Gemini team promise that 3.5 Pro will roll out "next month" - presumably at an even higher price.

This fits a trend: OpenAI's GPT-5.5 was 2x the price of GPT-5.4, and Claude Opus 4.7 is around 1.46x the price of 4.6 when you take the new tokenizer into account.

Given the price increase it's interesting to see Google roll it out for so many of their own free-to-consumer products. It feels like all three of the major AI labs are starting to probe the price tolerance of their API customers.

Artificial Analysis publish the cost to run their proprietary benchmark against models, which is a useful way to take things like tokenization and increased volume of reasoning tokens into account. Some numbers worth comparing:

Running the benchmark for 3.5 Flash (high) cost significantly more than 3.1 Pro Preview!

Here are some numbers from other vendors:

A pelican on a bicycle

I ran "Generate an SVG of a pelican riding a bicycle" against the Gemini API and got back this pelican, which is a lot:

Black background, bats in the sky against a stylized moon. Pelican is funky looking. Very good beak. Bicycle frame is a bit twisted, and the bar from pedals to back wheel is missing. Bike lamp illuminates the road in front. Quite stylish.

From the code comments: <!-- Pelican Eye / Sunglasses (Cool Retro Aviators) -->

hedgehog on Hacker News:

That pelican looks like it's in Miami for a crypto conference.

That one cost me 11 input tokens and 14,403 output tokens, for a total cost of just under 13 cents.
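The arithmetic behind that figure can be sketched as follows, using the $1.50/million input and $9/million output prices quoted earlier. The function name is made up for this example, and it assumes the reported output token count is what gets billed at the output rate, with no other charges.

```python
def request_cost_usd(
    input_tokens: int,
    output_tokens: int,
    input_per_million: float,
    output_per_million: float,
) -> float:
    """Cost of one request in USD, given per-million-token prices."""
    return (
        input_tokens * input_per_million + output_tokens * output_per_million
    ) / 1_000_000


# 11 input tokens and 14,403 output tokens at $1.50 / $9 per million.
cost = request_cost_usd(11, 14_403, 1.50, 9.00)
print(f"{cost * 100:.2f} cents")  # 12.96 cents
```

Almost all of the cost comes from the output tokens: the 11 input tokens contribute a tiny fraction of a cent.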

Tags: google, ai, generative-ai, llms, gemini, llm-pricing, pelican-riding-a-bicycle, llm-release

The last six months in LLMs in five minutes

2026-05-19 09:09:44

I put together these annotated slides from my five minute lightning talk at PyCon US 2026, using the latest iteration of my annotated presentation tool.

The last six months in LLMs in
five minutes

Simon Willison - simonwillison.net

PyCon US 2026 Lightning Talk

I presented this lightning talk at PyCon US 2026, attempting to summarize the last six months of developments in LLMs in five minutes.

The November inflection point

Six months is a pretty convenient time period to cover, because it captures what I've been calling the November 2025 inflection point. November was a critical month in LLMs, especially for coding.

The “best” model changed hands 5 times
between Anthropic, OpenAI and Google

For one thing, the supposedly "best" model (depending mostly on vibes) changed hands five times between the three big providers.

Generate an SVG of a
pelican riding a bicycle

As always, I'm using my Generate an SVG of a pelican riding a bicycle test to help illustrate the differences between the models.

Why this test? Because pelicans are hard to draw, bicycles are hard to draw, pelicans can't ride bicycles... and there's zero chance any AI lab would train a model for such a ridiculous task.

Five pelicans, one for each of the following models. Varying qualities!

At the start of November the widely acknowledged "best" model was Claude Sonnet 4.5, released on 29th September. It drew me this pelican.

In November it was overtaken by GPT-5.1, then Gemini 3, then GPT-5.1 Codex Max, and then Anthropic took the crown back again with Claude Opus 4.5.

I think Gemini 3 drew the best pelican out of this lot, but pelicans aren't everything. Most practitioners will agree that Opus 4.5 held the crown for the next couple of months.

The coding agents got good

It took a little while for this to become clear, but the real news from November was that the coding agents got good.

OpenAI and Anthropic had spent most of 2025 running Reinforcement Learning from Verifiable Rewards to increase the quality of code written by their models, especially when paired up with their Codex and Claude Code agent harnesses.

In November the results of this work became apparent. Coding agents went from often-work to mostly-work, crossing a quality barrier where you could use them as a daily-driver to get real work done, without needing to spend most of your time fixing their stupid mistakes.

Screenshot of "Initial commit" on GitHub to steipete/Warelay, commit f6dd362, steipete authored on Nov 24, 2025

It's a copy of the MIT license

Also in November, this happened - the first commit to an obscure (back then) repo called "Warelay" by some guy called Pete.

December/January
(A little bit of LLM psychosis)

Over the holiday period, from December to January, a whole lot of us took advantage of the break to have a poke at these new models and coding agents and see what they could do.

They could do a lot! Some of us got a little bit over-excited. I had my own short-lived bout of a form of LLM psychosis as I started spinning up wildly ambitious projects to see how far I could push them.

micro-javascript playground
Execute JavaScript code in a sandboxed micro-javascript environment powered by Pyodide

var numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10];
var doubled = numbers.map(n => n * 2);
console.log('Doubled: ', doubled);
var evens = numbers.filter(n => n % 2 === 0);
console.log('Evens: ', evens);
var sum = numbers.reduce((a, b) => a + b, 0);
console.log('Sum: ', sum);

Output 27
Doubled: [2, 4, 6, 8, 10, 12, 14, 16, 18, 20]
Evens: [2, 4, 6, 8, 10]
Sum: 55
Execution time: 8.00ms
About: micro-javascript is a pure Python JavaScript interpreter with configurable memory and time limits. This playground runs entirely in your browser using
Pyodide (Python compiled to WebAssembly). View on GitHub

One of my projects was a vibe-coded implementation of JavaScript in Python - a loose port of MicroQuickJS - which I called micro-javascript. You can try it out in your browser in this playground.

JavaScript running in Python running in Pyodide running in WebAssembly running in JavaScript

That playground demo shows JavaScript code run using my micro-javascript library, in Python, running inside Pyodide, running in WebAssembly, running in JavaScript, running in a browser!

It's pretty cool! But did anyone out there need a buggy, slow, insecure half-baked implementation of JavaScript in Python?

They did not. I have quite a few other projects from that holiday period that I have since quietly retired!

February 2026

On to February. Remember that Warelay project that had its first commit at the end of November?

Warelay → CLAWDIS → CLAWDBOT →
Clawdbot → Moltbot →🦞 OpenClaw

In December and January it had gone through quite a few name changes... and by February it was taking the world by storm under its final name, OpenClaw.

The amount of attention it got is pretty astonishing for a project that was less than three months old.

Generic term: Claw

OpenClaw is a "personal AI assistant", and we actually got a generic term for these, based on NanoClaw and ZeroClaw and suchlike... they're called Claws.

An aquarium for your Claw

Mac Minis started to sell out around Silicon Valley, because people were buying them to run their Claws.

Drew Breunig joked to me that this is because they're the new digital pets, and a Mac Mini is the perfect aquarium for your Claw.

Alfred Molina's Doc Ock in Spider-Man 2, tearing apart a New York subway train with his four claws.

My favourite metaphor for Claws is Alfred Molina's Doc Ock in the 2004 movie Spider-Man 2. His claws were powered by AI, and were perfectly safe provided nothing damaged his inhibitor chip... after which they turned evil and took over.

Gemini 3.1 Pro

A really good illustration of a pelican riding a bicycle.

Also in February: Gemini 3.1 Pro came out, and drew me a really good pelican riding a bicycle. Look at this! It's even got a fish in its basket.

Gemini 3 Pro pelican contrasted with Gemini 3.1 Pro, as animated SVGs

And then Google's Jeff Dean tweeted this video of an animated pelican riding a bicycle, plus a frog on a penny-farthing and a giraffe driving a tiny car and an ostrich on roller skates and a turtle kickflipping a skateboard and a dachshund driving a stretch limousine.

So maybe the AI labs have been paying attention after all!

April 2026

A lot of stuff happened just in the past month.

Gemma 4 26B-A4B (17.99GB)

A pretty decent pelican riding a bicycle, though the bike is a bit mis-shapen.

Google released the Gemma 4 series of models, which are the most capable open weight models I've seen from a US company.

GLM-5.1
MIT, 754B parameter, 1.51TB!

Also last month, Chinese AI lab GLM came out with GLM-5.1 - an open weight 1.5TB monster! This is a very effective model... if you can afford the hardware to run it.

GLM-5.1 drew me this very competent pelican on a bicycle.

The bike is wonky, the pelican is floating.

... though when it tried to animate it the pelican bounced off into the top and the bicycle got warped.

Screenshot of Bluesky

Charles
@charles.capps.me
I think you should pester it with another animal using another method of locomotion.

Something tells me it was trained for this. I can't quite put my finger on it. /s

NORTH VIRGINIA OPOSSUM ON AN E-SCOOTER!!

Charles on Bluesky suggested I try it with a North Virginia Opossum on an E-scooter.

NORTH VIRGINIA OPOSSUM
CRUISING THE COMMONWEALTH SINCE DUSK

And a really cool illustration of a possum.

And it did this! I've tried this on other models and they don't even come close. "Cruising the commonwealth since dusk" is perfect. It's animated too.

Qwen3.6-35B-A3B is a 20.9GB file that runs on my laptop

It drew a better pelican on a bicycle than Opus 4.7, which messed up the bicycle frame.

The other neat Chinese open weight models in April came from Qwen. Qwen3.6-35B-A3B on my laptop drew me a better pelican than Claude Opus 4.7. That's a 20.9GB open weights model that runs on my laptop!

(I think this mainly demonstrates that the pelican on the bicycle has firmly exceeded its limits as a useful benchmark.)

Claude Sonnet 4.5 pelican for comparison.

Here's that Claude Sonnet 4.5 pelican from September for comparison.

The themes of the past 6 months:
Coding agents got really good
Local models wildly outperform expectations

So those were the two main themes of the past six months. The coding agents got really good... and the laptop-available models, while a lot weaker than the frontier, have started wildly outperforming expectations.

Tags: lightning-talks, pycon, speaking, ai, generative-ai, local-llms, llms, annotated-talks, pelican-riding-a-bicycle, coding-agents

Glaucous-winged Gull, Brown Pelican, Snowy Egret, Canada Goose

2026-05-18 22:51:00

Glaucous-winged Gull

Brown Pelican

Snowy Egret

Canada Goose

Glaucous-winged Gull, Brown Pelican, Snowy Egret, Canada Goose, in Los Angeles River, CA, US

I'm heading home from PyCon US today so I went on a last morning walk to try and spot a pelican. I saw one! Didn't get a great photo of that, but I did see some goslings down by the swan boat lake.

GDS weighs in on the NHS's decision to retreat from Open Source

2026-05-17 23:59:41

Terence Eden continues his coverage of the NHS's poorly considered decision to close down access to their open source repositories in response to vulnerabilities reported to them as part of Project Glasswing.

Now the Government Digital Service have joined the conversation with AI, open code and vulnerability risk in the public sector, published May 14th. Their key recommendation:

Keep open by default. Making everything private adds additional delivery and policy costs, and can reduce reuse and scrutiny. Openness should remain the default posture, with closure used sparingly and deliberately.

While they don't mention the NHS by name, Terence speaks the language of the civil service and interprets this as a major escalation:

Within the UK's Civil Service you occasionally hear the expression "being invited to a meeting without biscuits". It implies a rather frosty discussion without any of the polite niceties of a normal meeting. In general though, even when people have severe disagreements, it is rare for tempers to fray. It is even rarer for those internal disagreements to spill over into public.

Tags: open-source, security, ai, generative-ai, llms, gov-uk, terence-eden, ai-ethics, ai-security-research

Warelay -> OpenClaw

2026-05-17 04:23:30

In preparation for a lightning talk I'm giving at PyCon US this afternoon I decided to figure out how many names OpenClaw has actually had since that first commit back in November.

Thanks to this first_line_history.py tool (code here) the answer, according to the Git history of the OpenClaw README, is:

Warelay → CLAWDIS → CLAWDBOT → Clawdbot → Moltbot →🦞 OpenClaw

Or in detail (the output from the tool):

2025-11-24T11:23:15+01:00 16dfc1a # Warelay — WhatsApp Relay CLI (Twilio)
2025-11-24T11:41:37+01:00 d4153da # 📡 Warelay — WhatsApp Relay CLI (Twilio)
2025-11-24T17:47:57+01:00 343ef9b # 📡 warelay — WhatsApp Relay CLI (Twilio)
2025-11-25T04:44:10+01:00 14b3c6f # 📡 warelay — WhatsApp Relay CLI
2025-11-25T12:48:40+01:00 4814021 # 📡 warelay — Send, receive, and auto-reply on WhatsApp—Twilio-backed or QR-linked.
2025-11-25T13:50:18+01:00 d51a3e9 # warelay 📡 - Send, receive, and auto-reply on WhatsApp via Twilio or QR-linked WhatsApp Web; webhook setup in one command
2025-11-25T13:51:13+01:00 4d2a8a8 # 📡 warelay — Send, receive, and auto-reply on WhatsApp—Twilio-backed or QR-linked.
2025-11-25T14:52:43+01:00 1ef7f4d # 📡 warelay — Send, receive, and auto-reply on WhatsApp.
2025-12-03T15:45:32+00:00 a27ee23 # 🦞 CLAWDIS — WhatsApp Gateway for AI Agents
2025-12-08T12:43:13+01:00 17fa2f4 # 🦞 CLAWDIS — WhatsApp & Telegram Gateway for AI Agents
2025-12-19T18:41:17+01:00 7710439 # 🦞 CLAWDIS — Personal AI Assistant
2026-01-04T14:32:47+00:00 246adaa # 🦞 CLAWDBOT — Personal AI Assistant
2026-01-10T05:14:09+01:00 cdb915d # 🦞 Clawdbot — Personal AI Assistant
2026-01-27T13:37:47-05:00 3fe4b25 # 🦞 Moltbot — Personal AI Assistant
2026-01-30T03:15:10+01:00 9a71607 # 🦞 OpenClaw — Personal AI Assistant
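The post links to the real first_line_history.py without showing its code, so the following is only a rough sketch of how a tool like this could work, not the author's implementation. It assumes the approach is: list the commits that touched a file with `git log`, read the file at each commit with `git show`, and print a row only when the first line changes. All of the function names here are hypothetical, it uses the author date where the real tool may use something else, and it does not follow renames.

```python
import subprocess
from typing import Iterable, Iterator, NamedTuple


class Revision(NamedTuple):
    date: str        # ISO 8601 author date (an assumption; could be commit date)
    sha: str         # abbreviated commit hash
    first_line: str  # first line of the file at that commit


def only_changes(revisions: Iterable[Revision]) -> Iterator[Revision]:
    """Yield a revision only when its first line differs from the previous one."""
    previous = None
    for rev in revisions:
        if rev.first_line != previous:
            yield rev
            previous = rev.first_line


def run_git(repo: str, *args: str) -> str:
    """Run a git command inside repo and return its standard output."""
    result = subprocess.run(
        ["git", "-C", repo, *args],
        check=True,
        capture_output=True,
        text=True,
        encoding="utf-8",
    )
    return result.stdout


def first_line_revisions(repo: str, path: str = "README.md") -> list[Revision]:
    """Collect the first line of path at every commit that touched it, oldest first."""
    log = run_git(repo, "log", "--format=%aI %h", "--", path)
    revisions = []
    for entry in reversed(log.splitlines()):  # git log lists newest first
        date, sha = entry.split(" ", 1)
        try:
            content = run_git(repo, "show", f"{sha}:{path}")
        except subprocess.CalledProcessError:
            continue  # the file does not exist at this commit, e.g. it was deleted
        lines = content.splitlines()
        revisions.append(Revision(date, sha, lines[0] if lines else ""))
    return revisions


def print_history(repo: str, path: str = "README.md") -> None:
    """Print one 'date sha first-line' row per change to the first line."""
    for rev in only_changes(first_line_revisions(repo, path)):
        print(rev.date, rev.sha, rev.first_line)
```

Calling `print_history(".")` from inside a checkout would print rows in the same general shape as the output above. Keeping the de-duplication in `only_changes` separate from the Git calls means it can be tried on plain data without a repository.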

Tags: openclaw, git, tools

Quoting Julia Evans

2026-05-17 00:45:37

[...] in the last 10 years I’ve learned to really love and respect CSS as a technology.

So I decided years ago that I wanted to react to “CSS is hard” by getting better at CSS and taking it seriously as a technology, instead of devaluing it. Doing that changed everything for me: I learned that so many of my frustrations (“centering is impossible”) had been addressed in CSS a long time ago, and that also what “centering” means is not always straightforward and it makes sense that there are many ways to do it. CSS is hard because it’s solving a hard problem!

Julia Evans, Moving away from Tailwind, and learning to structure my CSS

Tags: css, julia-evans