2026-04-23 00:45:23
Qwen3.6-27B: Flagship-Level Coding in a 27B Dense Model
Big claims from Qwen about their latest open weight model:Qwen3.6-27B delivers flagship-level agentic coding performance, surpassing the previous-generation open-source flagship Qwen3.5-397B-A17B (397B total / 17B active MoE) across all major coding benchmarks.
On Hugging Face Qwen3.5-397B-A17B is 807GB, this new Qwen3.6-27B is 55.6GB.
I tried it out with the 16.8GB Unsloth Qwen3.6-27B-GGUF:Q4_K_M quantized version and llama-server using this recipe by benob on Hacker News, after first installing llama-server using brew install llama.cpp:
llama-server \
-hf unsloth/Qwen3.6-27B-GGUF:Q4_K_M \
--no-mmproj \
--fit on \
-np 1 \
-c 65536 \
--cache-ram 4096 -ctxcp 2 \
--jinja \
--temp 0.6 \
--top-p 0.95 \
--top-k 20 \
--min-p 0.0 \
--presence-penalty 0.0 \
--repeat-penalty 1.0 \
--reasoning on \
--chat-template-kwargs '{"preserve_thinking": true}'
On first run that saved the ~17GB model to ~/.cache/huggingface/hub/models--unsloth--Qwen3.6-27B-GGUF.
Here's the transcript for "Generate an SVG of a pelican riding a bicycle". This is an outstanding result for a 16.8GB local model:

Performance numbers reported by llama-server:
For good measure, here's Generate an SVG of a NORTH VIRGINIA OPOSSUM ON AN E-SCOOTER (run previously with GLM-5.1):

That one took 6,575 tokens, 4min 25s, 24.74 t/s.
Via Hacker News
Tags: ai, generative-ai, local-llms, llms, qwen, pelican-riding-a-bicycle, llama-cpp, llm-release, ai-in-china
2026-04-22 13:40:56
As part of our continued collaboration with Anthropic, we had the opportunity to apply an early version of Claude Mythos Preview to Firefox. This week’s release of Firefox 150 includes fixes for 271 vulnerabilities identified during this initial evaluation. [...]
Our experience is a hopeful one for teams who shake off the vertigo and get to work. You may need to reprioritize everything else to bring relentless and single-minded focus to the task, but there is light at the end of the tunnel. We are extremely proud of how our team rose to meet this challenge, and others will too. Our work isn’t finished, but we’ve turned the corner and can glimpse a future much better than just keeping up. Defenders finally have a chance to win, decisively.
— Bobby Holley, CTO, Firefox
Tags: anthropic, claude, ai, firefox, llms, mozilla, security, generative-ai, ai-security-research
2026-04-22 11:30:02
Changes to GitHub Copilot Individual plans
On the same day as Claude Code's temporary will-they-won't-they $100/month kerfuffle (for the moment, they won't), here's the latest on GitHub Copilot pricing.Unlike Anthropic, GitHub put up an official announcement about their changes, which include tightening usage limits, pausing signups for individual plans (!), restricting Claude Opus 4.7 to the more expensive $39/month "Pro+" plan, and dropping the previous Opus models entirely.
The key paragraph:
Agentic workflows have fundamentally changed Copilot’s compute demands. Long-running, parallelized sessions now regularly consume far more resources than the original plan structure was built to support. As Copilot’s agentic capabilities have expanded rapidly, agents are doing more work, and more customers are hitting usage limits designed to maintain service reliability.
It's easy to forget that just six months ago heavy LLM users were burning an order of magnitude less tokens. Coding agents consume a lot of compute.
Copilot was also unique (I believe) among agents in charging per-request, not per-token. (Correction: Windsurf also operated a credit system like this which they abandoned last month.) This means that single agentic requests which burn more tokens cut directly into their margins. The most recent pricing scheme addresses that with token-based usage limits on a per-session and weekly basis.
My one problem with this announcement is that it doesn't clearly clarify which product called "GitHub Copilot" is affected by these changes. Last month in How many products does Microsoft have named 'Copilot'? I mapped every one Tey Bannerman identified 75 products that share the Copilot brand, 15 of which have "GitHub Copilot" in the title.
Judging by the linked GitHub Copilot plans page this covers Copilot CLI, Copilot cloud agent and code review (features on GitHub.com itself), and the Copilot IDE features available in VS Code, Zed, JetBrains and more.
Via Hacker News
Tags: github, microsoft, ai, generative-ai, github-copilot, llms, llm-pricing, coding-agents
2026-04-22 10:07:34
Anthropic today quietly (as in silently, no announcement anywhere at all) updated their claude.com/pricing page (but not their Choosing a Claude plan page, which shows up first for me on Google) to add this tiny but significant detail (arrow is mine, and it's already reverted):

The Internet Archive copy from yesterday shows a checkbox there. Claude Code used to be a feature of the $20/month Pro plan, but according to the new pricing page it is now exclusive to the $100/month or $200/month Max plans.
Update: don't miss the update to this post, they've already changed course a few hours after this change went live.
So what the heck is going on? Unsurprisingly, Reddit and Hacker News and Twitter all caught fire.
I didn't believe the screenshots myself when I first saw them - aside from the pricing grid I could find no announcement from Anthropic anywhere. Then Amol Avasare, Anthropic's Head of Growth, tweeted:
For clarity, we're running a small test on ~2% of new prosumer signups. Existing Pro and Max subscribers aren't affected.
And that appears to be the closest we have had to official messaging from Anthropic.
I don't buy the "~2% of new prosumer signups" thing, since everyone I've talked to is seeing the new pricing grid and the Internet Archive has already snapped a copy. Maybe he means that they'll only be running this version of the pricing grid for a limited time which somehow adds up to "2%" of signups?
I'm also amused to see Claude Cowork remain available on the $20/month plan, because Claude Cowork is effectively a rebranded version of Claude Code wearing a less threatening hat!
There are a whole bunch of things that are bad about this.
If we assume this is indeed a test, and that test comes up negative and they decide not to go ahead with it, the damage has still been extensive:
Last month I ran a tutorial for journalists on "Coding agents for data analysis" at the annual NICAR data journalism conference. I'm not going to be teaching that audience a course that depends on a $100/month subscription!
This also doesn't make sense to me as a strategy for Anthropic. Claude Code defined the category of coding agents. It's responsible for billions of dollars in annual revenue for Anthropic already. It has a stellar reputation, but I'm not convinced that reputation is strong enough for it to lose the $20/month trial and jump people directly to a $100/month subscription.
OpenAI have been investing heavily in catching up to Claude Code with their Codex products. Anthropic just handed them this marketing opportunity on a plate - here's Codex engineering lead Thibault Sottiaux:
I don't know what they are doing over there, but Codex will continue to be available both in the FREE and PLUS ($20) plans. We have the compute and efficient models to support it. For important changes, we will engage with the community well ahead of making them.
Transparency and trust are two principles we will not break, even if it means momentarily earning less. A reminder that you vote with your subscription for the values you want to see in this world.
I should note that I pay $200/month for Claude Max and I consider it well worth the money. I've had periods of free access in the past courtesy of Anthropic but I'm currently paying full price, and happy to do so.
But I care about the accessibility of the tools that I work with and teach. If Codex has a free tier while Claude Code starts at $100/month I should obviously switch to Codex, because that way I can use the same tool as the people I want to teach how to use coding agents.
Here's what I think happened. I think Anthropic are trying to optimize revenue growth - obviously - and someone pitched making Claude Code only available for Max and higher. That's clearly a bad idea, but "testing" culture says that it's worth putting even bad ideas out to test just in case they surprise you.
So they started a test, without taking into account the wailing and gnashing of teeth that would result when their test was noticed - or accounting for the longer-term brand damage that would be caused.
Or maybe they did account for that, and decided it was worth the risk.
I don't think that calculation was worthwhile. They're going to have to make a very firm commitment along the lines of "we heard your feedback and we commit to keeping Claude Code available on our $20/month plan going forward" to regain my trust.
As it stands, Codex is looking like a much safer bet for me to invest my time in learning and building educational materials around.
In the time I was typing this blog entry Anthropic appear to have reversed course - the claude.com/pricing page now has a checkbox back in the Pro column for Claude Code. I can't find any official communication about it though.
Let's see if they can come up with an explanation/apology that's convincing enough to offset the trust bonfire from this afternoon!
Amol on Twitter:
was a mistake that the logged-out landing page and docs were updated for this test [embedded self-tweet]
Getting lots of questions on why the landing page / docs were updated if only 2% of new signups were affected.
This was understandably confusing for the 98% of folks not part of the experiment, and we've reverted both the landing page and docs changes.
So the experiment is still running, just not visible to the rest of the world?
Tags: ai, generative-ai, llms, anthropic, llm-pricing, ai-ethics, coding-agents, claude-code, codex-cli
2026-04-22 04:32:24
OpenAI released ChatGPT Images 2.0 today, their latest image generation model. On the livestream Sam Altman said that the leap from gpt-image-1 to gpt-image-2 was equivalent to jumping from GPT-3 to GPT-5. Here's how I put it to the test.
My prompt:
Do a where's Waldo style image but it's where is the raccoon holding a ham radio
First as a baseline here's what I got from the older gpt-image-1 using ChatGPT directly:
I wasn't able to spot the raccoon - I quickly realized that testing image generation models on Where's Waldo style images (Where's Wally in the UK) can be pretty frustrating!
I tried getting Claude Opus 4.7 with its new higher resolution inputs to solve it but it was convinced there was a raccoon it couldn't find thanks to the instruction card at the top left of the image:
Yes — there's at least one raccoon in the picture, but it's very well hidden. In my careful sweep through zoomed-in sections, honestly, I couldn't definitively spot a raccoon holding a ham radio. [...]
Next I tried Google's Nano Banana 2, via Gemini:
That one was pretty obvious, the raccoon is in the "Amateur Radio Club" booth in the center of the image!
Claude said:
Honestly, this one wasn't really hiding — he's the star of the booth. Feels like the illustrator took pity on us after that last impossible scene. The little "W6HAM" callsign pun on the booth sign is a nice touch too.
I also tried Nano Banana Pro in AI Studio and got this, by far the worst result from any model. Not sure what went wrong here!
With the baseline established, let's try out the new model.
I used an updated version of my openai_image.py script, which is a thin wrapper around the OpenAI Python client library. Their client library hasn't yet been updated to include gpt-image-2 but thankfully it doesn't validate the model ID so you can use it anyway.
Here's how I ran that:
OPENAI_API_KEY="$(llm keys get openai)" \
uv run https://tools.simonwillison.net/python/openai_image.py \
-m gpt-image-2 \
"Do a where's Waldo style image but it's where is the raccoon holding a ham radio"Here's what I got back. I don't think there's a raccoon in there - I couldn't spot one, and neither could Claude.
The OpenAI image generation cookbook has been updated with notes on gpt-image-2, including the outputQuality setting and available sizes.
I tried setting outputQuality to high and the dimensions to 3840x2160 - I believe that's the maximum - and got this - a 17MB PNG which I converted to a 5MB WEBP:
OPENAI_API_KEY="$(llm keys get openai)" \
uv run 'https://raw.githubusercontent.com/simonw/tools/refs/heads/main/python/openai_image.py' \
-m gpt-image-2 "Do a where's Waldo style image but it's where is the raccoon holding a ham radio" \
--quality high --size 3840x2160That's pretty great! There's a raccoon with a ham radio in there (bottom left, quite easy to spot).
The image used 13,342 output tokens, which are charged at $30/million so a total cost of around 40 cents.
I think this new ChatGPT image generation model takes the crown from Gemini, at least for the moment.
Where's Waldo style images are an infuriating and somewhat foolish way to test these models, but they do help illustrate how good they are getting at complex illustrations combining both text and details.
rizaco on Hacker News asked ChatGPT to draw a red circle around the raccoon in one of the images in which I had failed to find one. Here's an animated mix of their result and the original image:

Looks like we definitely can't trust these models to usefully solve their own puzzles!
Tags: ai, openai, generative-ai, chatgpt, llms, text-to-image, llm-release, nano-banana
2026-04-22 00:39:33
AI agents are already too human. Not in the romantic sense, not because they love or fear or dream, but in the more banal and frustrating one. The current implementations keep showing their human origin again and again: lack of stringency, lack of patience, lack of focus. Faced with an awkward task, they drift towards the familiar. Faced with hard constraints, they start negotiating with reality.
— Andreas Påhlsson-Notini, Less human AI agents, please.
Tags: ai-agents, coding-agents, ai