2026-03-18 03:39:17
OpenAI today: Introducing GPT‑5.4 mini and nano. These models join GPT-5.4, which was released two weeks ago.
OpenAI's self-reported benchmarks show the new 5.4-nano outperforming their previous GPT-5 mini model when run at maximum reasoning effort. The new mini is also 2x faster than the previous mini.
Here's how the pricing looks, in dollars per million tokens - gpt-5.4-nano is notably even cheaper than Google's Gemini 3.1 Flash-Lite:
| Model | Input | Cached input | Output |
|---|---|---|---|
| gpt-5.4 | $2.50 | $0.25 | $15.00 |
| gpt-5.4-mini | $0.75 | $0.075 | $4.50 |
| gpt-5.4-nano | $0.20 | $0.02 | $1.25 |
| Claude Opus 4.6 | $5.00 | - | $25.00 |
| Claude Sonnet 4.6 | $3.00 | - | $15.00 |
| Gemini 3.1 Pro | $2.00 | - | $12.00 |
| Claude Haiku 4.5 | $1.00 | - | $5.00 |
| Gemini 3.1 Flash-Lite | $0.25 | - | $1.50 |
I used GPT-5.4 nano to generate a description of this photo I took at the John M. Mossman Lock Collection:

llm -m gpt-5.4-nano -a IMG_2324.jpeg 'describe image'
Here's the output:
The image shows the interior of a museum gallery with a long display wall. White-painted brick walls are covered with many framed portraits arranged in neat rows. Below the portraits, there are multiple glass display cases with dark wooden frames and glass tops/fronts, containing various old historical objects and equipment. The room has a polished wooden floor, hanging ceiling light fixtures/cords, and a few visible pipes near the top of the wall. In the foreground, glass cases run along the length of the room, reflecting items from other sections of the gallery.
That took 2,751 input tokens and 112 output tokens, at a cost of 0.069 cents (less than a tenth of a cent). That means describing every single photo in my 76,000 photo collection would cost around $52.44.
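The arithmetic checks out against the table above. Here's a quick Python sketch of that calculation, assuming the table prices are USD per million tokens:

```python
# gpt-5.4-nano prices from the table, assumed to be USD per million tokens
INPUT_PRICE = 0.20
OUTPUT_PRICE = 1.25

input_tokens, output_tokens = 2751, 112
cost_usd = (input_tokens * INPUT_PRICE + output_tokens * OUTPUT_PRICE) / 1_000_000

print(f"{cost_usd * 100:.3f} cents per photo")   # 0.069 cents
print(f"${cost_usd * 76_000:.2f} for 76,000 photos")  # just over $52
```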
I released llm 0.29 with support for the new models.
Then I had OpenAI Codex loop through all five reasoning effort levels and all three models and produce this combined SVG grid of pelicans riding bicycles (generation transcripts here). I do like the gpt-5.4 xhigh one the best: it has a good bicycle (with nice spokes) and the pelican has a fish in its beak!
Tags: ai, openai, generative-ai, llms, llm, vision-llms, llm-pricing, pelican-riding-a-bicycle, llm-release
2026-03-18 00:13:37
If you do not understand the ticket, if you do not understand the solution, or if you do not understand the feedback on your PR, then your use of LLM is hurting Django as a whole. [...]
For a reviewer, it’s demoralizing to communicate with a facade of a human.
This is because contributing to open source, especially Django, is a communal endeavor. Removing your humanity from that experience makes that endeavor more difficult. If you use an LLM to contribute to Django, it needs to be as a complementary tool, not as your vehicle.
— Tim Schilling, Give Django your time and money, not your tokens
Tags: ai-ethics, open-source, generative-ai, ai, django, llms
2026-03-17 20:32:28
Agentic Engineering Patterns
LLMs are restricted by their context limit - how many tokens they can fit in their working memory at any given time. These values have not increased much over the past two years even as the LLMs themselves have seen dramatic improvements in their abilities - they generally top out at around 1,000,000 tokens, and benchmarks frequently report better quality results below 200,000.
Carefully managing the context such that it fits within those limits is critical to getting great results out of a model.
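A crude way to reason about those limits is to estimate token counts before putting material into context. This sketch uses the common rule of thumb of roughly four characters per token of English text - a real agent would use the model's actual tokenizer for exact counts:

```python
def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token for English text.
    # Real tokenizers give exact counts; this is just a budget check.
    return len(text) // 4

# Benchmarks frequently report better quality results below this many tokens
QUALITY_BUDGET = 200_000

def fits_in_budget(documents: list[str], reserved_for_output: int = 8_000) -> bool:
    total = sum(estimate_tokens(doc) for doc in documents) + reserved_for_output
    return total < QUALITY_BUDGET
```

Anything that blows the budget is a candidate for summarization - or for handing off to a subagent.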
Subagents provide a simple but effective way to handle larger tasks without burning through too much of the coding agent’s valuable top-level context.
When a coding agent uses a subagent it effectively dispatches a fresh copy of itself to achieve a specified goal, with a new context window that starts with a fresh prompt.
Claude Code uses subagents extensively as part of its standard way of working. Let's use that as an illustrative example.
Any time you start a new task against an existing repo, Claude Code first needs to explore that repo to figure out its general shape and find relevant information needed to achieve that task.
It does this by constructing a prompt and dispatching a subagent to perform that exploration and return a description of what it finds.
I kicked off a Claude Code session with a screenshot of a page from my guide and this prompt:
Make the chapter diffs also show which characters have changed in this diff view with a darker color of red or green for the individually changed segments of text within the line
Here's the transcript of that session.
Claude Code started by running an “Explore” subagent with the following starter prompt:
Find the code that implements the diff view for "chapters" in this Django blog. I need to find:
- Templates that render diffs (look for diff-related HTML/CSS with red/green backgrounds)
- Python code that generates diffs (look for difflib usage or similar)
- Any JavaScript related to diff rendering
- CSS styles for the diff view (red/green line backgrounds)
Search thoroughly - check templates/, static/, blog/ directories. Look for keywords like "diff", "chapter", "revision", "history", "compare".
Subagents work similarly to any other tool call: the parent agent dispatches them just as it would any other tool and waits for the response. It's interesting to see models prompt themselves in this way - they generally have good taste in prompting strategies.
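In sketch form, that pattern looks something like this - a generic illustration, not Claude Code's actual internals, with `call_model` standing in for a real LLM API call:

```python
def call_model(messages: list[dict]) -> str:
    # Stand-in for a real LLM API call.
    return "Summary: the diff view lives in guides/views.py and its templates."

def run_subagent(task_prompt: str) -> str:
    # Fresh context: just a role prompt plus the task, none of the
    # parent agent's conversation history.
    messages = [
        {"role": "system", "content": "You are an exploration agent. Report findings concisely."},
        {"role": "user", "content": task_prompt},
    ]
    return call_model(messages)

# The parent dispatches this like any other tool call and blocks until the
# (short) summary comes back, keeping its own context lean.
finding = run_subagent("Find the code that implements the diff view for chapters")
```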
The Explore subagent churned away for a few seconds and returned a response that started like this:
Perfect! Now let me create a comprehensive summary with all the findings:
Summary: Chapter Diff View Implementation
I found the complete implementation of the diff view for chapters in this Django blog. Here are the key components:
1. Python Code - Diff Generation (/home/user/simonwillisonblog/guides/views.py, lines 139-202)
The chapter_changes() view function uses Python's built-in difflib.unified_diff() to generate diffs: […]
The full subagent response included all of the details the parent agent needed in order to start editing the code to address my original request.
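The subagent's report points at Python's difflib. As a generic illustration (not the blog's actual code), difflib can produce both the line-level diff and the character-level opcodes needed for the darker within-line highlighting my original prompt asked for:

```python
import difflib

old_line = "The pelican rode a bicycle."
new_line = "The pelican rode a red bicycle."

# Line-level diff - the kind unified_diff() produces for whole chapters
diff = list(difflib.unified_diff([old_line], [new_line], lineterm=""))

# Character-level opcodes within a changed pair of lines - the basis for
# highlighting just the changed segments in a darker red/green
matcher = difflib.SequenceMatcher(None, old_line, new_line)
changed = [
    (tag, old_line[i1:i2], new_line[j1:j2])
    for tag, i1, i2, j1, j2 in matcher.get_opcodes()
    if tag != "equal"
]
print(changed)  # [('insert', '', 'red ')]
```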
This Explore subagent is the simplest example of how subagents can work, with the parent agent pausing while the subagent runs. The principal advantage of this kind of subagent is that it can work with a fresh context in a way that avoids spending tokens from the parent's available limit.
Subagents can also provide a significant performance boost by having the parent agent run multiple subagents at the same time, potentially also using faster and cheaper models such as Claude Haiku to accelerate those tasks.
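That fan-out pattern looks something like this sketch - again generic, with `run_subagent` standing in for dispatching one subagent, possibly against a cheaper model:

```python
from concurrent.futures import ThreadPoolExecutor

def run_subagent(task: str, model: str = "claude-haiku") -> str:
    # Stand-in for dispatching one subagent; a real agent would start a
    # fresh conversation against the chosen model's API here.
    return f"[{model}] done: {task}"

tasks = [
    "explore templates/ for diff-related markup",
    "explore static/ for diff CSS",
    "explore blog/ for difflib usage",
]

# The parent fans out all three subagents at once, then folds their short
# summaries back into its own context as they finish.
with ThreadPoolExecutor(max_workers=len(tasks)) as pool:
    summaries = list(pool.map(run_subagent, tasks))
```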
Coding agents that support subagents can use them based on your instructions. Try prompts like this:
Use subagents to find and update all of the templates that are affected by this change.
Some coding agents allow subagents to run with further customizations, often in the form of a custom system prompt, custom tools, or both, allowing those subagents to take on a different role.
These roles can cover a variety of useful specialties:
While it can be tempting to go overboard breaking up tasks across dozens of different specialist subagents, it's important to remember that the main value of subagents is in preserving that valuable root context and managing token-heavy operations. Your root coding agent is perfectly capable of debugging or reviewing its own output provided it has the tokens to spare.
Several popular coding agents support subagents, each with their own documentation on how to use them:
Tags: parallel-agents, coding-agents, generative-ai, agentic-engineering, ai, llms
2026-03-17 07:41:17
Big new release from Mistral today (despite the name) - a new Apache 2.0 licensed 119B parameter Mixture-of-Experts model (6B active) which they describe like this:
Mistral Small 4 is the first Mistral model to unify the capabilities of our flagship models, Magistral for reasoning, Pixtral for multimodal, and Devstral for agentic coding, into a single, versatile model.
It supports reasoning_effort="none" or reasoning_effort="high", with the latter providing "equivalent verbosity to previous Magistral models".
The new model is 242GB on Hugging Face.
I tried it out via the Mistral API using llm-mistral:
llm install llm-mistral
llm mistral refresh
llm -m mistral/mistral-small-2603 "Generate an SVG of a pelican riding a bicycle"

I couldn't find a way to set the reasoning effort in their API documentation, so hopefully that's a feature which will land soon.
Also from Mistral today, and fitting their -stral naming convention, is Leanstral, an open weight model specifically tuned to output Lean 4, the formally verifiable programming language. I haven't explored Lean at all so I have no way to credibly evaluate this, but it's interesting to see them target one specific language in this way.
Tags: ai, generative-ai, llms, llm, mistral, pelican-riding-a-bicycle, llm-reasoning, llm-release
2026-03-17 07:03:56
Use subagents and custom agents in Codex
Subagents were announced in general availability today for OpenAI Codex, after several weeks of preview behind a feature flag. They're very similar to the Claude Code implementation, with default subagents for "explorer", "worker" and "default". It's unclear to me what the difference between "worker" and "default" is, but based on their CSV example I think "worker" is intended for running large numbers of small tasks in parallel.
Codex also lets you define custom agents as TOML files in ~/.codex/agents/. These can have custom instructions and be assigned to use specific models - including gpt-5.3-codex-spark if you want some raw speed. They can then be referenced by name, as demonstrated by this example prompt from the documentation:
Investigate why the settings modal fails to save. Have browser_debugger reproduce it, code_mapper trace the responsible code path, and ui_fixer implement the smallest fix once the failure mode is clear.
The subagents pattern is widely supported in coding agents now. Here's documentation across a number of different platforms:
Update: I added a chapter on Subagents to my Agentic Engineering Patterns guide.
Via @OpenAIDevs
Tags: ai, openai, generative-ai, llms, coding-agents, codex-cli, parallel-agents, agentic-engineering
2026-03-17 05:38:55
The point of the blackmail exercise was to have something to describe to policymakers—results that are visceral enough to land with people, and make misalignment risk actually salient in practice for people who had never thought about it before.
— A member of Anthropic’s alignment-science team, as told to Gideon Lewis-Kraus