
Quoting Karel D'Oosterlinck

2026-02-06 08:42:22

When I want to quickly implement a one-off experiment in a part of the codebase I am unfamiliar with, I get codex to do extensive due diligence. Codex explores relevant slack channels, reads related discussions, fetches experimental branches from those discussions, and cherry picks useful changes for my experiment. All of this gets summarized in an extensive set of notes, with links back to where each piece of information was found. Using these notes, codex wires the experiment and makes a bunch of hyperparameter decisions I couldn’t possibly make without much more effort.

Karel D'Oosterlinck, I spent $10,000 to automate my research at OpenAI with Codex

Tags: codex-cli, coding-agents, ai-assisted-programming, generative-ai, openai, ai, llms

Mitchell Hashimoto: My AI Adoption Journey

2026-02-06 07:39:07

Mitchell Hashimoto: My AI Adoption Journey

Some really good and unconventional tips in here for getting to a place with coding agents where they demonstrably improve your workflow and productivity. I particularly liked:
  • Reproduce your own work - when learning to use coding agents Mitchell went through a period of doing the work manually, then recreating the same solution using agents as an exercise:

    I literally did the work twice. I'd do the work manually, and then I'd fight an agent to produce identical results in terms of quality and function (without it being able to see my manual solution, of course).

  • End-of-day agents - letting agents step in when your energy runs out:

    To try to find some efficiency, I next started up a new pattern: block out the last 30 minutes of every day to kick off one or more agents. My hypothesis was that perhaps I could gain some efficiency if the agent can make some positive progress in the times I can't work anyways.

  • Outsource the Slam Dunks - once you know an agent can likely handle a task, have it do that task while you work on something more interesting yourself.

Via Hacker News

Tags: ai, generative-ai, llms, ai-assisted-programming, mitchell-hashimoto, coding-agents

Opus 4.6 and Codex 5.3

2026-02-06 04:29:20

Two major new model releases today, within about 15 minutes of each other.

Anthropic released Opus 4.6. Here's its pelican:

Slightly wonky bicycle frame but an excellent pelican, very clear beak and pouch, nice feathers.

OpenAI released GPT-5.3-Codex, albeit only via their Codex app, not yet in their API. Here's its pelican:

Not nearly as good - the bicycle is a bit mangled, the pelican not nearly as well rendered - it's more of a line drawing.

I've had a bit of preview access to both of these models and to be honest I'm finding it hard to find a good angle to write about them - they're both really good, but so were their predecessors Codex 5.2 and Opus 4.5. I've been having trouble finding tasks that those previous models couldn't handle but the new ones are able to ace.

The most convincing story about capabilities of the new model so far is Nicholas Carlini from Anthropic talking about Opus 4.6 and Building a C compiler with a team of parallel Claudes - Anthropic's version of Cursor's FastRender project.

Tags: llm-release, anthropic, generative-ai, openai, pelican-riding-a-bicycle, ai, llms, parallel-agents, c, nicholas-carlini

Spotlighting The World Factbook as We Bid a Fond Farewell

2026-02-05 08:23:38

Spotlighting The World Factbook as We Bid a Fond Farewell

Somewhat devastating news today from the CIA:

One of CIA’s oldest and most recognizable intelligence publications, The World Factbook, has sunset.

There's not even a hint as to why they decided to stop maintaining this publication, which has been their most useful public-facing initiative since 1971 and a cornerstone of the public internet since 1997.

In a bizarre act of cultural vandalism they've not just removed the entire site (including the archives of previous versions) but they've also set every single page to be a 302 redirect to their closure announcement.

The Factbook has been released into the public domain since the start. There's no reason not to continue to serve archived versions - a banner at the top of the page saying it's no longer maintained would be much better than removing all of that valuable content entirely.

Up until 2020 the CIA published annual zip file archives of the entire site. Those are available (along with the rest of the Factbook) on the Internet Archive.

I downloaded the 384MB .zip file for the year 2020 and extracted it into a new GitHub repository, simonw/cia-world-factbook-2020. I've enabled GitHub Pages for that repository so you can browse the archived copy at simonw.github.io/cia-world-factbook-2020/.

Screenshot of the CIA World Factbook website homepage. Header reads "THE WORLD FACTBOOK" with a dropdown labeled "Please select a country to view." Navigation tabs: ABOUT, REFERENCES, APPENDICES, FAQs. Section heading "WELCOME TO THE WORLD FACTBOOK" followed by descriptive text: "The World Factbook provides information on the history, people and society, government, economy, energy, geography, communications, transportation, military, and transnational issues for 267 world entities. The Reference tab includes: a variety of world, regional, country, ocean, and time zone maps; Flags of the World; and a Country Comparison function that ranks the country information and data in more than 75 Factbook fields." A satellite image of Earth is displayed on the right. Below it: "WHAT'S NEW :: Today is: Wednesday, February 4." Left sidebar links with icons: WORLD TRAVEL FACTS, ONE-PAGE COUNTRY SUMMARIES, REGIONAL AND WORLD MAPS, FLAGS OF THE WORLD, GUIDE TO COUNTRY COMPARISONS. Right side shows news updates dated December 17, 2020 about Electricity access and new Economy fields, and December 10, 2020 about Nepal and China agreeing on the height of Mount Everest at 8,848.86 meters. A "VIEW ALL UPDATES" button appears at the bottom.

Here's a neat example of the editorial voice of the Factbook from the What's New page, dated December 10th 2020:

Years of wrangling were brought to a close this week when officials from Nepal and China announced that they have agreed on the height of Mount Everest. The mountain sits on the border between Nepal and Tibet (in western China), and its height changed slightly following an earthquake in 2015. The new height of 8,848.86 meters is just under a meter higher than the old figure of 8,848 meters. The World Factbook rounds the new measurement to 8,849 meters and this new height has been entered throughout the Factbook database.

Via Hacker News

Tags: cia, github, internet-archive

Voxtral transcribes at the speed of sound

2026-02-05 06:42:34

Voxtral transcribes at the speed of sound

Mistral just released Voxtral Transcribe 2 - a family of two new models, one open weights, for transcribing audio to text. This is the latest in their Whisper-like model family, and a sequel to the original Voxtral which they released in July 2025.

Voxtral Realtime - official name Voxtral-Mini-4B-Realtime-2602 - is the open weights (Apache-2.0) model, available as an 8.87GB download from Hugging Face.

You can try it out in this live demo - don't be put off by the "No microphone found" message, clicking "Record" should have your browser request permission and then start the demo working. I was very impressed by the demo - I talked quickly and used jargon like Django and WebAssembly and it correctly transcribed my text within moments of me uttering each sound.

The closed weight model is called voxtral-mini-latest and can be accessed via the Mistral API, using calls that look something like this:

curl -X POST "https://api.mistral.ai/v1/audio/transcriptions" \
  -H "Authorization: Bearer $MISTRAL_API_KEY" \
  -F model="voxtral-mini-latest" \
  -F file=@"Pelican talk at the library.m4a" \
  -F diarize=true \
  -F context_bias="Datasette" \
  -F timestamp_granularities="segment"

It's priced at $0.003/minute, which is $0.18/hour.

The Mistral API console now has a speech-to-text playground for exercising the new model and it is excellent. You can upload an audio file and promptly get a diarized transcript in a pleasant interface, with options to download the result in text, SRT or JSON format.

Screenshot of a speech-to-text transcription interface for a file named "Pelican talk at the library.m4a". The toolbar shows "Speech to text" with Code, Transcribe, and Download buttons. The transcript shows timestamped segments from 5:53 to 6:53 with a speaker icon, reading: "5:53 – 6:01 So pelicans love to, they're very good at getting the most they can out of the topography when they're flying. 6:01 – 6:06 And our winds come in from the northwest and they hit those bluffs and they're deflected up. 6:07 – 6:18 And they will sit right, they'll fly north into a wind like five feet off those bluffs, but just five or ten feet off the surface because the winds dissipate. 6:19 – 6:22 And they will surf that bluff all the way north. 6:23 – 6:30 So you'll see a wind from the north at 15 miles an hour, and the pelicans are flying north into that wind and not flapping their wings. 6:31 – 6:33 And it's one of the coolest things. 6:33 – 6:35 You can only find it on San Francisco Coast. 6:36 – 6:39 Where right where the bluffs are steep. 6:41 – 6:43 Pacifica, you can find them there. 6:43 – 6:51 They like their, what we call pier bums, which are typically pelicans that have, are in some sort of trouble. 6:51 – 6:53 They're unable to catch food." The segment at 6:41–6:43 is highlighted in yellow. An audio waveform is shown at the bottom with a playhead near 6:40. Stats in the lower right show 53.90s, 7946.00s, and #45833.

Via Hacker News

Tags: ai, generative-ai, llms, hugging-face, mistral, speech-to-text

Distributing Go binaries like sqlite-scanner through PyPI using go-to-wheel

2026-02-04 22:59:47

I've been exploring Go for building small, fast and self-contained binary applications recently. I'm enjoying how there's generally one obvious way to do things and the resulting code is boring and readable - and something that LLMs are very competent at writing. The one catch is distribution, but it turns out publishing Go binaries to PyPI means any Go binary can be just a uvx package-name call away.

sqlite-scanner

sqlite-scanner is my new Go CLI tool for scanning a filesystem for SQLite database files.

It works by checking if the first 16 bytes of the file exactly match the SQLite magic number sequence SQLite format 3\x00. It can search one or more folders recursively, spinning up concurrent goroutines to accelerate the scan. It streams out results as it finds them in plain text, JSON or newline-delimited JSON. It can optionally display the file sizes as well.
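That detection logic is simple enough to sketch in a few lines of Python - an illustrative equivalent, not the Go tool's actual code:

import os

SQLITE_MAGIC = b"SQLite format 3\x00"  # the 16-byte SQLite header prefix

def is_sqlite(path):
    # A file is only a candidate if its first 16 bytes match exactly
    try:
        with open(path, "rb") as f:
            return f.read(16) == SQLITE_MAGIC
    except OSError:
        return False

def scan(root):
    # Walk the tree and yield every path that passes the header check
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if is_sqlite(path):
                yield path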

To try it out you can download a release from the GitHub releases - and then jump through macOS hoops to execute an "unsafe" binary. Or you can clone the repo and compile it with Go. Or... you can run the binary like this:

uvx sqlite-scanner

By default this will search your current directory for SQLite databases. You can pass one or more directories as arguments:

uvx sqlite-scanner ~ /tmp

Add --json for JSON output, --size to include file sizes or --jsonl for newline-delimited JSON. Here's a demo:

uvx sqlite-scanner ~ --jsonl --size

Running that command produces a sequence of JSON objects, each with a path and a size key.
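Something like this - these paths and sizes are made up, shown only to illustrate the shape of the output:

{"path": "/Users/simon/Downloads/example.db", "size": 286720}
{"path": "/Users/simon/dev/fixtures.db", "size": 1048576}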

If you haven't been uv-pilled yet you can instead install sqlite-scanner using pip install sqlite-scanner and then run sqlite-scanner.

To get a permanent copy with uv use uv tool install sqlite-scanner.

How the Python package works

The reason this is worth doing is that pip, uv and PyPI will work together to identify the correct compiled binary for your operating system and architecture.

This is driven by file names. If you visit the PyPI downloads for sqlite-scanner you'll see the following files:

  • sqlite_scanner-0.1.1-py3-none-win_arm64.whl
  • sqlite_scanner-0.1.1-py3-none-win_amd64.whl
  • sqlite_scanner-0.1.1-py3-none-musllinux_1_2_x86_64.whl
  • sqlite_scanner-0.1.1-py3-none-musllinux_1_2_aarch64.whl
  • sqlite_scanner-0.1.1-py3-none-manylinux_2_17_x86_64.whl
  • sqlite_scanner-0.1.1-py3-none-manylinux_2_17_aarch64.whl
  • sqlite_scanner-0.1.1-py3-none-macosx_11_0_arm64.whl
  • sqlite_scanner-0.1.1-py3-none-macosx_10_9_x86_64.whl

When I run pip install sqlite-scanner or uvx sqlite-scanner on my Apple Silicon Mac laptop Python's packaging magic ensures I get that macosx_11_0_arm64.whl variant.
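If you're curious which tags your own interpreter will accept, the packaging library (a separate install, not part of the standard library) can list them in priority order:

from packaging.tags import sys_tags

# Installers pick the first available wheel whose tag matches one of
# these - on an Apple Silicon Mac the macosx_*_arm64 entries rank highly
for tag in list(sys_tags())[:10]:
    print(tag)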

Here's what's in the wheel, which is a zip file with a .whl extension.
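The layout looks roughly like this - reconstructed from the details below, so treat the exact file list as approximate:

sqlite_scanner/__init__.py
sqlite_scanner/__main__.py
sqlite_scanner/bin/sqlite-scanner
sqlite_scanner-0.1.1.dist-info/METADATA
sqlite_scanner-0.1.1.dist-info/WHEEL
sqlite_scanner-0.1.1.dist-info/entry_points.txt
sqlite_scanner-0.1.1.dist-info/RECORD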

In addition to the bin/sqlite-scanner binary, the most important file is sqlite_scanner/__init__.py, which includes the following:

import os
import stat
import subprocess
import sys


def get_binary_path():
    """Return the path to the bundled binary."""
    binary = os.path.join(os.path.dirname(__file__), "bin", "sqlite-scanner")

    # Ensure binary is executable on Unix
    if sys.platform != "win32":
        current_mode = os.stat(binary).st_mode
        if not (current_mode & stat.S_IXUSR):
            os.chmod(binary, current_mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)

    return binary


def main():
    """Execute the bundled binary."""
    binary = get_binary_path()

    if sys.platform == "win32":
        # On Windows, use subprocess to properly handle signals
        sys.exit(subprocess.call([binary] + sys.argv[1:]))
    else:
        # On Unix, exec replaces the process
        os.execvp(binary, [binary] + sys.argv[1:])

That main() function - also called from sqlite_scanner/__main__.py - locates the binary and executes it. The wheel exposes it as the sqlite-scanner command via the sqlite-scanner = sqlite_scanner:main entry point.
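That entry point lives in the wheel's entry_points.txt metadata file, using the standard console_scripts section:

[console_scripts]
sqlite-scanner = sqlite_scanner:main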

Which means we can use it as a dependency

Using PyPI as a distribution platform for Go binaries feels a tiny bit abusive, though there is plenty of precedent.

I’ll justify it by pointing out that this means we can use Go binaries as dependencies for other Python packages now.

That's genuinely useful! It means that any functionality which is available in a cross-platform Go binary can now be subsumed into a Python package. Python is really good at running subprocesses so this opens up a whole world of useful tricks that we can bake into our Python tools.

To demonstrate this, I built datasette-scan - a new Datasette plugin which depends on sqlite-scanner and then uses that Go binary to scan a folder for SQLite databases and attach them to a Datasette instance.

Here's how to use that (without even installing anything first, thanks uv) to explore any SQLite databases in your Downloads folder:

uv run --with datasette-scan datasette scan ~/Downloads

If you peek at the code you'll see it depends on sqlite-scanner in pyproject.toml and calls it using subprocess.run() against sqlite_scanner.get_binary_path() in its own scan_directories() function.
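Here's a rough sketch of that pattern - a simplified illustration, not the plugin's exact code, with the --jsonl flag borrowed from earlier:

import json
import subprocess

from sqlite_scanner import get_binary_path


def scan_directories(directories):
    # Run the bundled Go binary and ask for newline-delimited JSON
    result = subprocess.run(
        [get_binary_path(), *directories, "--jsonl"],
        capture_output=True,
        text=True,
        check=True,
    )
    # Each non-empty line is a JSON object with a "path" key
    return [json.loads(line) for line in result.stdout.splitlines() if line.strip()]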

I've been exploring this pattern for other, non-Go binaries recently - here's a recent script that depends on static-ffmpeg to ensure that ffmpeg is available for the script to use.
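The shape of that trick, sketched with uv's inline script metadata - the add_paths() call is my reading of static-ffmpeg's API, so check its documentation before relying on it:

# /// script
# dependencies = ["static-ffmpeg"]
# ///
import subprocess

import static_ffmpeg

# Assumed API: add_paths() fetches a platform-appropriate ffmpeg
# on first use and puts it on PATH for this process
static_ffmpeg.add_paths()

subprocess.run(["ffmpeg", "-version"], check=True)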

Building Python wheels from Go packages with go-to-wheel

After trying this pattern myself a couple of times I realized it would be useful to have a tool to automate the process.

I first brainstormed with Claude to check that there was no existing tool to do this. It pointed me to maturin bin which helps distribute Rust projects using Python wheels, and pip-binary-factory which bundles all sorts of other projects, but did not identify anything that addressed the exact problem I was looking to solve.

So I had Claude Code for web build the first version, then refined the code locally on my laptop with the help of more Claude Code and a little bit of OpenAI Codex too, just to mix things up.

The full documentation is in the simonw/go-to-wheel repository. I've published that tool to PyPI so now you can run it using:

uvx go-to-wheel --help

The sqlite-scanner package you can see on PyPI was built using go-to-wheel like this:

uvx go-to-wheel ~/dev/sqlite-scanner \
  --set-version-var main.version \
  --version 0.1.1 \
  --readme README.md \
  --author 'Simon Willison' \
  --url https://github.com/simonw/sqlite-scanner \
  --description 'Scan directories for SQLite databases'

This created a set of wheels in the dist/ folder. I tested one of them like this:

uv run --with dist/sqlite_scanner-0.1.1-py3-none-macosx_11_0_arm64.whl \
  sqlite-scanner --version

When that spat out the correct version number I was confident everything had worked as planned, so I pushed the whole set of wheels to PyPI using twine upload like this:

uvx twine upload dist/*

I had to paste in a PyPI API token I had saved previously and that was all it took.

I expect to use this pattern a lot

sqlite-scanner is very clearly meant as a proof-of-concept for this wider pattern - Python is very much capable of recursively crawling a directory structure looking for files that start with a specific byte prefix on its own!

That said, I think there's a lot to be said for this pattern. Go is a great complement to Python - it's fast, compiles to small self-contained binaries, has excellent concurrency support and a rich ecosystem of libraries.

Go is similar to Python in that it has a strong standard library. Go is particularly good for HTTP tooling - I've built several HTTP proxies in the past using Go's excellent net/http/httputil.ReverseProxy handler.

I've also been experimenting with wazero, a robust and mature zero-dependency WebAssembly runtime for Go, as part of my ongoing quest for the ideal sandbox for running untrusted code. Here's my latest experiment with that library.

Being able to seamlessly integrate Go binaries into Python projects without the end user having to think about Go at all - they pip install and everything Just Works - feels like a valuable addition to my toolbox.

Tags: go, packaging, projects, pypi, python, sqlite, datasette, ai-assisted-programming, uv