2026-04-22 04:32:24
OpenAI released ChatGPT Images 2.0 today, their latest image generation model. On the livestream Sam Altman said that the leap from gpt-image-1 to gpt-image-2 was equivalent to jumping from GPT-3 to GPT-5. Here's how I put it to the test.
My prompt:
Do a where's Waldo style image but it's where is the raccoon holding a ham radio
First as a baseline here's what I got from the older gpt-image-1 using ChatGPT directly:
I wasn't able to spot the raccoon - I quickly realized that testing image generation models on Where's Waldo style images (Where's Wally in the UK) can be pretty frustrating!
I tried getting Claude Opus 4.7, with its new higher resolution image inputs, to solve it. The instruction card at the top left of the image convinced it a raccoon had to be in there somewhere, even though it couldn't find one:
Yes — there's at least one raccoon in the picture, but it's very well hidden. In my careful sweep through zoomed-in sections, honestly, I couldn't definitively spot a raccoon holding a ham radio. [...]
Next I tried Google's Nano Banana 2, via Gemini:
That one was pretty obvious: the raccoon is in the "Amateur Radio Club" booth in the center of the image!
Claude said:
Honestly, this one wasn't really hiding — he's the star of the booth. Feels like the illustrator took pity on us after that last impossible scene. The little "W6HAM" callsign pun on the booth sign is a nice touch too.
I also tried Nano Banana Pro in AI Studio and got this, by far the worst result from any model. Not sure what went wrong here!
With the baseline established, let's try out the new model.
I used an updated version of my openai_image.py script, which is a thin wrapper around the OpenAI Python client library. Their client library hasn't yet been updated to include gpt-image-2 but thankfully it doesn't validate the model ID so you can use it anyway.
Here's how I ran that:
OPENAI_API_KEY="$(llm keys get openai)" \
uv run https://tools.simonwillison.net/python/openai_image.py \
-m gpt-image-2 \
"Do a where's Waldo style image but it's where is the raccoon holding a ham radio"
Here's what I got back. I don't think there's a raccoon in there - I couldn't spot one, and neither could Claude.
The OpenAI image generation cookbook has been updated with notes on gpt-image-2, including the outputQuality setting and available sizes.
I tried setting outputQuality to high and the dimensions to 3840x2160 - I believe that's the maximum - and got this - a 17MB PNG which I converted to a 5MB WEBP:
OPENAI_API_KEY="$(llm keys get openai)" \
uv run 'https://raw.githubusercontent.com/simonw/tools/refs/heads/main/python/openai_image.py' \
-m gpt-image-2 "Do a where's Waldo style image but it's where is the raccoon holding a ham radio" \
--quality high --size 3840x2160
That's pretty great! There's a raccoon with a ham radio in there (bottom left, quite easy to spot).
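The PNG to WEBP conversion is a one-liner if you have Pillow installed - a sketch, with the quality setting as my assumption:

```python
# Convert a large PNG to a much smaller lossy WEBP using Pillow.
from PIL import Image

def png_to_webp(src, dst, quality=80):
    # Lossy WEBP at a quality around 80 is what gets a 17MB PNG down to ~5MB
    Image.open(src).save(dst, "WEBP", quality=quality)
```

The cwebp tool from Google's libwebp is another option for the same conversion.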
The image used 13,342 output tokens, which are charged at $30/million so a total cost of around 40 cents.
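The arithmetic is simple enough to check:

```python
# Output tokens for this model are billed at $30 per million
def image_cost_usd(output_tokens, rate_per_million=30.0):
    return output_tokens * rate_per_million / 1_000_000

print(f"${image_cost_usd(13342):.2f}")  # prints $0.40
```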
I think this new ChatGPT image generation model takes the crown from Gemini, at least for the moment.
Where's Waldo style images are an infuriating and somewhat foolish way to test these models, but they do help illustrate how good they are getting at complex illustrations combining both text and details.
rizaco on Hacker News asked ChatGPT to draw a red circle around the raccoon in one of the images in which I had failed to find one. Here's an animated mix of their result and the original image:

Looks like we definitely can't trust these models to usefully solve their own puzzles!
Tags: ai, openai, generative-ai, chatgpt, llms, text-to-image, llm-release, nano-banana
2026-04-22 00:39:33
AI agents are already too human. Not in the romantic sense, not because they love or fear or dream, but in the more banal and frustrating one. The current implementations keep showing their human origin again and again: lack of stringency, lack of patience, lack of focus. Faced with an awkward task, they drift towards the familiar. Faced with hard constraints, they start negotiating with reality.
— Andreas Påhlsson-Notini, Less human AI agents, please.
Tags: ai-agents, coding-agents, ai
2026-04-21 23:54:43
scosman/pelicans_riding_bicycles
I firmly approve of Steve Cosman's efforts to pollute the training set of pelicans riding bicycles.
(To be fair, most of the examples I've published count as poisoning too.)
Tags: ai, generative-ai, llms, training-data, pelican-riding-a-bicycle
2026-04-21 02:00:26
Release: llm-openrouter 0.6
llm openrouter refresh command for refreshing the list of available models without waiting for the cache to expire.
I added this feature so I could try Kimi 2.6 on OpenRouter as soon as it became available there.
Here's its pelican - this time as an HTML page because Kimi chose to include an HTML and JavaScript UI to control the animation. Transcript here.

Tags: openrouter, llm, llm-release, pelican-riding-a-bicycle, kimi, ai-in-china, llms, ai, generative-ai
2026-04-20 10:33:58
TIL: SQL functions in Google Sheets to fetch data from Datasette
I put together some notes on patterns for fetching data from a Datasette instance directly into Google Sheets - using the importdata() function, a "named function" that wraps it or a Google Apps Script if you need to send an API token in an HTTP header (not supported by importdata().)
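The Apps Script variant exists purely to send that header. A Python sketch of the underlying HTTP request - the URL and token here are placeholders - shows the pattern importdata() can't express:

```python
# Fetch CSV from a Datasette instance with an API token in an HTTP
# header - the one thing importdata() can't do. URL and token are
# placeholders, not a real instance.
from urllib.request import Request, urlopen

req = Request(
    "https://example-datasette-instance.com/data/my_table.csv?_size=max",
    headers={"Authorization": "Bearer YOUR_API_TOKEN"},
)
# csv_text = urlopen(req).read().decode("utf-8")
```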
Here's an example sheet demonstrating all three methods.
Tags: spreadsheets, datasette, google
2026-04-20 08:50:45
Claude Token Counter, now with model comparisons
I upgraded my Claude Token Counter tool to add the ability to run the same count against different models in order to compare them.
As far as I can tell Claude Opus 4.7 is the first model to change the tokenizer, so it's only worth running comparisons between 4.7 and 4.6. The Claude token counting API accepts any Claude model ID though, so I've included options for all four of the notable current models (Opus 4.7 and 4.6, Sonnet 4.6, and Haiku 4.5).
In the Opus 4.7 announcement Anthropic said:
Opus 4.7 uses an updated tokenizer that improves how the model processes text. The tradeoff is that the same input can map to more tokens—roughly 1.0–1.35× depending on the content type.
I pasted the Opus 4.7 system prompt into the token counting tool and found that the Opus 4.7 tokenizer used 1.46x the number of tokens as Opus 4.6.

Opus 4.7 uses the same pricing as Opus 4.6 - $5 per million input tokens and $25 per million output tokens - but this token inflation means we can expect the same prompts to cost around 40% more.
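To put a number on that: at an unchanged per-token price, the effective cost scales directly with the tokenizer multiplier.

```python
# Same $/token price, more tokens: effective cost scales with the
# tokenizer multiplier. 1.46x was my measured ratio for the system prompt.
def effective_input_price(price_per_million, multiplier):
    return price_per_million * multiplier

print(round(effective_input_price(5.0, 1.46), 2))  # prints 7.3
```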
The token counter tool also accepts images. Opus 4.7 has improved image support, described like this:
Opus 4.7 has better vision for high-resolution images: it can accept images up to 2,576 pixels on the long edge (~3.75 megapixels), more than three times as many as prior Claude models.
I tried counting tokens for a 3456x2234 pixel 3.7MB PNG and got an even bigger increase in token counts - 3.01x the number of tokens for 4.7 compared to 4.6:

Update: That 3x increase for images is entirely due to Opus 4.7 being able to handle higher resolutions. I tried that again with a 682x318 pixel image and it took 314 tokens with Opus 4.7 and 310 with Opus 4.6, so effectively the same cost.
Update 2: I tried a 15MB, 30 page text-heavy PDF and Opus 4.7 reported 60,934 tokens while 4.6 reported 56,482 - that's a 1.08x multiplier, significantly lower than the multiplier I got for raw text.
Tags: ai, generative-ai, llms, anthropic, claude, llm-pricing, tokenization