2025-11-07 05:44:33
At the start of the year, most people loosely following AI probably knew of 0 [Chinese] AI labs. Now, and towards wrapping up 2025, I’d say all of DeepSeek, Qwen, and Kimi are becoming household names. They all have seasons of their best releases and different strengths. The important thing is this’ll be a growing list. A growing share of cutting edge mindshare is shifting to China. I expect some of the likes of Z.ai, Meituan, or Ant Ling to potentially join this list next year. For some of these labs releasing top tier benchmark models, they literally started their foundation model effort after DeepSeek. It took many Chinese companies only 6 months to catch up to the open frontier in ballpark of performance, now the question is if they can offer something in a niche of the frontier that has real demand for users.
— Nathan Lambert, 5 Thoughts on Kimi K2 Thinking
Tags: llms, ai, ai-in-china, generative-ai, nathan-lambert
2025-11-07 02:26:05
I'm upgrading various plugins for compatibility with the new Datasette 1.0a20 alpha release and I decided to record a video of the process. This post accompanies that video with detailed additional notes.
I picked a very simple plugin to illustrate the upgrade process (possibly too simple). datasette-checkbox adds just one feature to Datasette: if you are viewing a table with boolean columns (detected as integer columns with names like is_active or has_attachments or should_notify) and your current user has permission to update rows in that table it adds an inline checkbox UI that looks like this:

I built the first version with the help of Claude back in August 2024 - details in this issue comment.
Most of the implementation is JavaScript that makes calls to Datasette 1.0's JSON write API. The Python code just checks that the user has the necessary permissions before including the extra JavaScript.
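The boolean column detection is just a naming heuristic. Here's a rough sketch of the idea - my own illustration, not the plugin's actual code:
# My guess at the detection heuristic, not the plugin's implementation:
# treat integer columns whose names start with is_/has_/should_ as booleans.
BOOLEAN_PREFIXES = ("is_", "has_", "should_")

def looks_boolean(column_name: str, column_type: str) -> bool:
    return column_type.upper() == "INTEGER" and column_name.lower().startswith(
        BOOLEAN_PREFIXES
    )

assert looks_boolean("is_active", "INTEGER")
assert looks_boolean("has_attachments", "integer")
assert not looks_boolean("score", "INTEGER")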
The first step in upgrading any plugin is to run its tests against the latest Datasette version.
Thankfully uv makes it easy to run code in scratch virtual environments that include the different code versions you want to test against.
I have a test utility called tadd (for "test against development Datasette") which I use for that purpose. I can run it in any plugin directory like this:
tadd
And it will run the existing plugin tests against whatever version of Datasette I have checked out in my ~/dev/datasette directory.
You can see the full implementation of tadd (and its friend radd described below) in this TIL - the basic version looks like this:
#!/bin/sh
uv run --no-project --isolated \
--with-editable '.[test]' --with-editable ~/dev/datasette \
python -m pytest "$@"
I started by running tadd in the datasette-checkbox directory, and got my first failure... but it wasn't due to permissions, it was because the pyproject.toml for the plugin was pinned to a specific mismatched version of Datasette:
dependencies = [
"datasette==1.0a19"
]
I fixed this problem by swapping == to >= (so the dependency became datasette>=1.0a19) and ran the tests again... and they passed! Which was a problem, because I was expecting permission-related failures.
It turns out when I first wrote the plugin I was lazy with the tests - they weren't actually confirming that the table page loaded without errors.
I needed to actually run the code myself to see the expected bug.
First I created myself a demo database using sqlite-utils create-table:
sqlite-utils create-table demo.db \
demo id integer is_checked integer --pk id
Then I ran it with Datasette against the plugin's code like so:
radd demo.db
Sure enough, visiting /demo/demo produced a 500 error about the missing Datasette.permission_allowed() method.
The next step was to update the test to also trigger this error:
@pytest.mark.asyncio
async def test_plugin_adds_javascript():
    datasette = Datasette()
    db = datasette.add_memory_database("demo")
    await db.execute_write(
        "CREATE TABLE IF NOT EXISTS test (id INTEGER PRIMARY KEY, is_active INTEGER)"
    )
    await datasette.invoke_startup()
    response = await datasette.client.get("/demo/test")
    assert response.status_code == 200
And now tadd fails as expected.
At this point I could have manually fixed the plugin itself - which would likely have been faster given the small size of the fix - but instead I demonstrated a bash one-liner I've been using to apply these kinds of changes automatically:
codex exec --dangerously-bypass-approvals-and-sandbox \
"Run the command tadd and look at the errors and then
read ~/dev/datasette/docs/upgrade-1.0a20.md and apply
fixes and run the tests again and get them to pass"
codex exec runs OpenAI Codex in non-interactive mode - it will loop until it has finished the prompt you give it.
I tell it to consult the subset of the Datasette upgrade documentation that talks about Datasette permissions and then get the tadd command to pass its tests.
This is an example of what I call designing agentic loops - I gave Codex the tools it needed (tadd) and a clear goal and let it get to work on my behalf.
The remainder of the video covers finishing up the work - testing the fix manually, committing my work using:
git commit -a -m "$(basename "$PWD") for datasette>=1.0a20" \
-m "Refs https://github.com/simonw/datasette/issues/2577"
Then shipping a 0.1a4 release to PyPI using the pattern described in this TIL.
Finally, I demonstrated that the shipped plugin worked in a fresh environment using uvx like this:
uvx --prerelease=allow --with datasette-checkbox \
datasette --root ~/dev/ecosystem/datasette-checkbox/demo.db
Executing this command installs and runs a fresh Datasette instance with a fresh copy of the new alpha plugin (--prerelease=allow). It's a neat way of confirming that freshly released software works as expected.
This video was shot in a single take using Descript, with no rehearsal and perilously little preparation in advance. I recorded through my AirPods and applied the "Studio Sound" filter to clean up the audio. I pasted in a simonwillison.net closing slide from my previous video and exported it locally at 1080p, then uploaded it to YouTube.
Something I learned from the Software Carpentry instructor training course is that making mistakes in front of an audience is actively helpful - it helps them see a realistic version of how software development works and they can learn from watching you recover. I see this as a great excuse for not editing out all of my mistakes!
I'm trying to build new habits around video content that let me produce useful videos while minimizing the amount of time I spend on production.
I plan to iterate more on the format as I get more comfortable with the process. I'm hoping I can find the right balance between production time and value to viewers.
Tags: plugins, python, youtube, ai, datasette, generative-ai, llms, ai-assisted-programming, uv, coding-agents, codex-cli
2025-11-06 23:53:23
I've been experimenting with a pattern for LLM usage recently that's working out really well: asynchronous code research tasks. Pick a research question, spin up an asynchronous coding agent and let it go and run some experiments and report back when it's done.
Software development benefits enormously from something I call code research. The great thing about questions about code is that they can often be definitively answered by writing and executing code.
I often see questions on forums which hint at a lack of understanding of this skill.
"Could Redis work for powering the notifications feed for my app?" is a great example. The answer is always "it depends", but a better answer is that a good programmer already has everything they need to answer that question for themselves. Build a proof-of-concept, simulate the patterns you expect to see in production, then run experiments to see if it's going to work.
I've been a keen practitioner of code research for a long time. Many of my most interesting projects started out as a few dozen lines of experimental code to prove to myself that something was possible.
It turns out coding agents like Claude Code and Codex are a fantastic fit for this kind of work as well. Give them the right goal and a useful environment and they'll churn through a basic research project without any further supervision.
LLMs hallucinate and make mistakes. This is far less important for code research tasks because the code itself doesn't lie: if they write code and execute it and it does the right things then they've demonstrated to both themselves and to you that something really does work.
They can't prove something is impossible - just because the coding agent couldn't find a way to do something doesn't mean it can't be done - but they can often demonstrate that something is possible in just a few minutes of crunching.
I've used interactive coding agents like Claude Code and Codex CLI for a bunch of these, but today I'm increasingly turning to their asynchronous coding agent family members instead.
An asynchronous coding agent is a coding agent that operates on a fire-and-forget basis. You pose it a task, it churns away on a server somewhere and when it's done it files a pull request against your chosen GitHub repository.
OpenAI's Codex Cloud, Anthropic's Claude Code for web, Google Gemini's Jules, and GitHub's Copilot coding agent are four prominent examples of this pattern.
These are fantastic tools for code research projects. Come up with a clear goal, turn it into a few paragraphs of prompt, set them loose and check back ten minutes later to see what they've come up with.
I'm firing off 2-3 code research projects a day right now. My own time commitment is minimal and they frequently come back with useful or interesting results.
You can run a code research task against an existing GitHub repository, but I find it's much more liberating to have a separate, dedicated repository for your coding agents to run their projects in.
This frees you from being limited to research against just code you've already written, and also means you can be much less cautious about what you let the agents do.
I have two repositories that I use for this - one public, one private. I use the public one for research tasks that have no need to be private, and the private one for anything that I'm not yet ready to share with the world.
The biggest benefit of a dedicated repository is that you don't need to be cautious about what the agents operating in that repository can do.
Both Codex Cloud and Claude Code for web default to running agents in a locked-down environment, with strict restrictions on how they can access the network. This makes total sense if they are running against sensitive repositories - a prompt injection attack of the lethal trifecta variety could easily be used to steal sensitive code or environment variables.
If you're running in a fresh, non-sensitive repository you don't need to worry about this at all! I've configured my research repositories for full network access, which means my coding agents can install any dependencies they need, fetch data from the web and generally do anything I'd be able to do on my own computer.
Let's dive into some examples. My public research repository is at simonw/research on GitHub. It currently contains 13 folders, each of which is a separate research project. I only created it two weeks ago so I'm already averaging nearly one a day!
It also includes a GitHub Workflow which uses GitHub Models to automatically update the README file with a summary of every new project, using Cog, LLM, llm-github-models and this snippet of Python.
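The linked snippet isn't reproduced here, but the general shape is easy to sketch: a Cog block in the README runs some Python that asks an LLM for a summary of each project folder and writes the result back into the file. Something like this - the model ID, prompt and file layout are my assumptions, not the actual code:
# Rough sketch only: llm-github-models registers GitHub Models under IDs
# like "github/gpt-4o-mini" - the exact ID and prompt here are assumptions.
import pathlib

import llm

model = llm.get_model("github/gpt-4o-mini")

def summarize(folder: pathlib.Path) -> str:
    readme = (folder / "README.md").read_text()
    response = model.prompt(
        "Write a one-sentence summary of this research project:\n\n" + readme
    )
    return f"- **{folder.name}**: {response.text()}"

summaries = "\n".join(
    summarize(folder)
    for folder in sorted(pathlib.Path(".").iterdir())
    if (folder / "README.md").exists()
)
# Inside a Cog block this would be cog.out(summaries) rather than print()
print(summaries)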
Here are some example research projects from the repo.
node-pyodide shows an example of a Node.js script that runs the Pyodide WebAssembly distribution of Python inside it - yet another of my ongoing attempts to find a great way of running Python in a WebAssembly sandbox on a server.
python-markdown-comparison (transcript) provides a detailed performance benchmark of seven different Python Markdown libraries. I fired this one off because I stumbled across cmarkgfm, a Python binding around GitHub's Markdown implementation in C, and wanted to see how it compared to the other options. This one produced some charts! cmarkgfm came out on top by a significant margin:

Here's the entire prompt I used for that project:
Create a performance benchmark and feature comparison report on PyPI cmarkgfm compared to other popular Python markdown libraries - check all of them out from github and read the source to get an idea for features, then design and run a benchmark including generating some charts, then create a report in a new python-markdown-comparison folder (do not create a _summary.md file or edit anywhere outside of that folder). Make sure the performance chart images are directly displayed in the README.md in the folder.
Note that I didn't specify any Markdown libraries other than cmarkgfm - Claude Code ran a search and found the other six by itself.
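To give a feel for the kind of harness involved, here's a tiny timing sketch of my own - not the benchmark the agent wrote - assuming the cmarkgfm and markdown packages are installed:
import time

import cmarkgfm
import markdown

document = (
    "# Heading\n\nSome *emphasis*, a [link](https://example.com) and `code`.\n\n"
) * 200

def bench(name, render, iterations=100):
    start = time.perf_counter()
    for _ in range(iterations):
        render(document)
    print(f"{name}: {time.perf_counter() - start:.3f}s for {iterations} renders")

bench("cmarkgfm (C implementation)", cmarkgfm.github_flavored_markdown_to_html)
bench("markdown (pure Python)", markdown.markdown)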
cmarkgfm-in-pyodide is a lot more fun. A neat thing about having all of my research projects in the same repository is that new projects can build on previous ones. Here I decided to see how hard it would be to get cmarkgfm - which has a C extension - working inside Pyodide inside Node.js. Claude successfully compiled an 88.4KB cmarkgfm_pyodide-2025.10.22-cp312-cp312-emscripten_3_1_46_wasm32.whl file with the necessary C extension and proved it could be loaded into Pyodide in WebAssembly inside Node.js.
I ran this one using Claude Code on my laptop after an initial attempt failed. The starting prompt was:
Figure out how to get the cmarkgfm markdown lover [typo in prompt, this should have been "library" but it figured it out anyway] for Python working in pyodide. This will be hard because it uses C so you will need to compile it to pyodide compatible webassembly somehow. Write a report on your results plus code to a new cmarkgfm-in-pyodide directory. Test it using pytest to exercise a node.js test script that calls pyodide as seen in the existing node.js and pyodide directory
There is an existing branch that was an initial attempt at this research, but which failed because it did not have Internet access. You do have Internet access. Use that existing branch to accelerate your work, but do not commit any code unless you are certain that you have successfully executed tests that prove that the pyodide module you created works correctly.
This one gave up halfway through, complaining that emscripten would take too long. I told it:
Complete this project, actually run emscripten, I do not care how long it takes, update the report if it works
It churned away for a bit longer and complained that the existing Python library used CFFI which isn't available in Pyodide. I asked it:
Can you figure out how to rewrite cmarkgfm to not use FFI and to use a pyodide-friendly way of integrating that C code instead?
... and it did. You can see the full transcript here.
blog-tags-scikit-learn. Taking a short break from WebAssembly, I thought it would be fun to put scikit-learn through its paces on a text classification task against my blog:
Work in a new folder called blog-tags-scikit-learn
Download
https://datasette.simonwillison.net/simonwillisonblog.db - a SQLite database. Take a look at the blog_entry table and the associated tags - a lot of the earlier entries do not have tags associated with them, where the later entries do. Design, implement and execute models to suggests tags for those earlier entries based on textual analysis against later ones
Use Python scikit learn and try several different strategies
Produce JSON of the results for each one, plus scripts for running them and a detailed markdown description
Also include an HTML page with a nice visualization of the results that works by loading those JSON files.
This resulted in seven .py files, four .json results files and a detailed report. (It ignored the bit about an HTML page with a nice visualization for some reason.) Not bad for a few moments of idle curiosity typed into my phone!
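For a flavour of one plausible strategy - TF-IDF features feeding a one-vs-rest logistic regression - here's my own rough sketch, guessing at the table and column names in simonwillisonblog.db rather than copying any of the agent's scripts:
import sqlite3

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import MultiLabelBinarizer

# The schema here (blog_entry, blog_entry_tags, blog_tag) is an assumption
conn = sqlite3.connect("simonwillisonblog.db")
rows = conn.execute(
    """
    SELECT blog_entry.id,
           coalesce(blog_entry.title, '') || ' ' || coalesce(blog_entry.body, ''),
           group_concat(blog_tag.tag)
    FROM blog_entry
    LEFT JOIN blog_entry_tags ON blog_entry_tags.entry_id = blog_entry.id
    LEFT JOIN blog_tag ON blog_tag.id = blog_entry_tags.tag_id
    GROUP BY blog_entry.id
    """
).fetchall()

tagged = [(text, tags.split(",")) for _, text, tags in rows if tags]
untagged = [(entry_id, text) for entry_id, text, tags in rows if not tags]

vectorizer = TfidfVectorizer(max_features=20_000, stop_words="english")
X_train = vectorizer.fit_transform([text for text, _ in tagged])
binarizer = MultiLabelBinarizer()
y_train = binarizer.fit_transform([tags for _, tags in tagged])

# One binary logistic regression per tag, trained on the tagged entries
classifier = OneVsRestClassifier(LogisticRegression(max_iter=1000))
classifier.fit(X_train, y_train)

# Suggest the highest-probability tags for each untagged entry
X_untagged = vectorizer.transform([text for _, text in untagged])
probabilities = classifier.predict_proba(X_untagged)
for (entry_id, _), scores in zip(untagged, probabilities):
    top = sorted(zip(binarizer.classes_, scores), key=lambda pair: -pair[1])[:5]
    print(entry_id, [tag for tag, score in top if score > 0.2])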
That's just three of the thirteen projects in the repository so far. The commit history for each one usually links to the prompt and sometimes the transcript if you want to see how they unfolded.
More recently I added a short AGENTS.md file to the repo with a few extra tips for my research agents. You can read that here.
My preferred definition of AI slop is AI-generated content that is published without human review. I've not been reviewing these reports in great detail myself, and I wouldn't usually publish them online without some serious editing and verification.
I want to share the pattern I'm using though, so I decided to keep them quarantined in this one public simonw/research repository.
A tiny feature request for GitHub: I'd love to be able to mark a repository as "exclude from search indexes" such that it gets labelled with <meta name="robots" content="noindex"> tags. I still like to keep AI-generated content out of search, to avoid contributing more to the dead internet.
It's pretty easy to get started trying out this coding agent research pattern. Create a free GitHub repository (public or private) and let some agents loose on it and see what happens.
You can run agents locally but I find the asynchronous agents to be more convenient - especially as I can run them (or trigger them from my phone) without any fear of them damaging my own machine or leaking any of my private data.
Claude Code for web is offering their $20/month users $250 of free credits for a limited time (until November 18, 2025). Gemini Jules has a free tier. There are plenty of other coding agents you can try out as well.
Let me know if your research agents come back with anything interesting!
Tags: ai, webassembly, generative-ai, llms, ai-assisted-programming, slop, ai-agents, coding-agents, claude-code, jules, codex-cli
2025-11-06 07:11:17
Open redirect endpoint in Datasette prior to 0.65.2 and 1.0a21
This GitHub security advisory covers two new releases of Datasette that I shipped today, both addressing the same open redirect issue with a fix by James Jefferies.
Datasette 0.65.2 fixes the bug and also adds Python 3.14 support and a datasette publish cloudrun fix.
Datasette 1.0a21 also has that Cloud Run fix and two other small new features.
I decided to include the Cloud Run deployment fix so anyone with Datasette instances deployed to Cloud Run can update them with the new patched versions.
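The usual defence against an open redirect is to only follow redirect targets that are same-site paths. Here's a minimal sketch of that kind of check - my illustration of the general pattern, not the actual Datasette fix:
# Minimal illustration of the usual defence, not the actual Datasette fix:
# only follow redirect targets that are plain same-site paths.
from urllib.parse import urlparse

def is_safe_redirect(target: str) -> bool:
    parsed = urlparse(target)
    # Reject absolute URLs ("https://evil.example") and scheme-relative
    # URLs ("//evil.example"); allow plain paths like "/db/table"
    return not parsed.scheme and not parsed.netloc and target.startswith("/")

assert is_safe_redirect("/fixtures/facetable")
assert not is_safe_redirect("https://example.com/")
assert not is_safe_redirect("//example.com/")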
Tags: security, datasette, cloudrun, annotated-release-notes
2025-11-06 06:24:57
Removing XSLT for a more secure browser
Previously discussed back in August, it looks like it's now official:
Chrome intends to deprecate and remove XSLT from the browser. [...] We intend to remove support from version 155 (November 17, 2026). The Firefox and WebKit projects have also indicated plans to remove XSLT from their browser engines. [...]
The continued inclusion of XSLT 1.0 in web browsers presents a significant and unnecessary security risk. The underlying libraries that process these transformations, such as libxslt (used by Chromium browsers), are complex, aging C/C++ codebases. This type of code is notoriously susceptible to memory safety vulnerabilities like buffer overflows, which can lead to arbitrary code execution.
I mostly encounter XSLT on people's Atom/RSS feeds, converting those to a more readable format in case someone should navigate directly to that link. Jake Archibald shared an alternative solution to that back in September.
Via Hacker News
Tags: browsers, chrome, security, web-standards, xml, xslt, jake-archibald
2025-11-05 11:50:31
I'm worried that they put co-pilot in Excel because Excel is the beast that drives our entire economy and do you know who has tamed that beast?
Brenda.
Who is Brenda?
She is a mid-level employee in every finance department, in every business across this stupid nation and the Excel goddess herself descended from the heavens, kissed Brenda on her forehead and the sweat from Brenda's brow is what allows us to do capitalism. [...]
She's gonna birth that formula for a financial report and then she's gonna send that financial report to a higher up and he's gonna need to make a change to the report and normally he would have sent it back to Brenda but he's like oh I have AI and AI is probably like smarter than Brenda and then the AI is gonna fuck it up real bad and he won't be able to recognize it because he doesn't understand Excel because AI hallucinates.
You know who's not hallucinating?
Brenda.
— Ada James, @belligerentbarbies on TikTok
Tags: generative-ai, ai, excel, hallucinations, llms, tiktok, ai-ethics