MoreRSS

site iconSimon WillisonModify

Creator of Datasette and Lanyrd, co-creator of the Django Web Framework.
Please copy the RSS to your reader, or quickly subscribe to:

Inoreader Feedly Follow Feedbin Local Reader

Rss preview of Blog of Simon Willison

Hoard things you know how to do

2026-02-27 04:33:27

Agentic Engineering Patterns >

Many of my tips for working productively with coding agents are extensions of advice I've found useful in my career without them. Here's a great example of that: hoard things you know how to do.

A big part of the skill in building software is understanding what's possible and what isn't, and having at least a rough idea of how those things can be accomplished.

These questions can be broad or quite obscure. Can a web page run OCR operations in JavaScript alone? Can an iPhone app pair with a Bluetooth device even when the app isn't running? Can we process a 100GB JSON file in Python without loading the entire thing into memory first?

The more answers to questions like this you have under your belt, the more likely you'll be able to spot opportunities to deploy technology to solve problems in ways other people may not have thought of yet.

Knowing that something is theoretically possible is not the same as having seen it done for yourself. A key asset to develop as a software professional is a deep collection of answers to questions like this, ideally illustrated by running code.

I hoard solutions like this in a number of different ways. My blog and TIL blog are crammed with notes on things I've figured out how to do. I have over a thousand GitHub repos collecting code I've written for different projects, many of them small proof-of-concepts that demonstrate a key idea.

More recently I've used LLMs to help expand my collection of code solutions to interesting problems.

tools.simonwillison.net is my largest collection of LLM-assisted tools and prototypes. I use this to collect what I call HTML tools - single HTML pages that embed JavaScript and CSS and solve a specific problem.

My simonw/research repository has larger, more complex examples where I’ve challenged a coding agent to research a problem and come back with working code and a written report detailing what it found out.

Recombining things from your hoard

Why collect all of this stuff? Aside from helping you build and extend your own abilities, the assets you generate along the way become incredibly powerful inputs for your coding agents.

One of my favorite prompting patterns is to tell an agent to build something new by combining two or more existing working examples.

A project that helped crystallize how effective this can be was the first thing I added to my tools collection - a browser-based OCR tool, described in more detail here.

I wanted an easy, browser-based tool for OCRing pages from PDF files - in particular PDFs that consist entirely of scanned images with no text version provided at all.

I had previously experimented with running the Tesseract.js OCR library in my browser, and found it to be very capable. That library provides a WebAssembly build of the mature Tesseract OCR engine and lets you call it from JavaScript to extract text from an image.

I didn’t want to work with images though, I wanted to work with PDFs. Then I remembered that I had also worked with Mozilla’s PDF.js library, which among other things can turn individual pages of a PDF into rendered images.

I had snippets of JavaScript for both of those libraries in my notes.

Here’s the full prompt I fed into a model (at the time it was Claude 3 Opus), combining my two examples and describing the solution I was looking for:

This code shows how to open a PDF and turn it into an image per page:

<!DOCTYPE html>
<html>
<head>
  <title>PDF to Images</title>
  <script src="https://cdnjs.cloudflare.com/ajax/libs/pdf.js/2.9.359/pdf.min.js"></script>
  <style>
    .image-container img {
      margin-bottom: 10px;
    }
    .image-container p {
      margin: 0;
      font-size: 14px;
      color: #888;
    }
  </style>
</head>
<body>
  <input type="file" id="fileInput" accept=".pdf" />
  <div class="image-container"></div>

  <script>
  const desiredWidth = 800;
    const fileInput = document.getElementById('fileInput');
    const imageContainer = document.querySelector('.image-container');

    fileInput.addEventListener('change', handleFileUpload);

    pdfjsLib.GlobalWorkerOptions.workerSrc = 'https://cdnjs.cloudflare.com/ajax/libs/pdf.js/2.9.359/pdf.worker.min.js';

    async function handleFileUpload(event) {
      const file = event.target.files[0];
      const imageIterator = convertPDFToImages(file);

      for await (const { imageURL, size } of imageIterator) {
        const imgElement = document.createElement('img');
        imgElement.src = imageURL;
        imageContainer.appendChild(imgElement);

        const sizeElement = document.createElement('p');
        sizeElement.textContent = `Size: ${formatSize(size)}`;
        imageContainer.appendChild(sizeElement);
      }
    }

    async function* convertPDFToImages(file) {
      try {
        const pdf = await pdfjsLib.getDocument(URL.createObjectURL(file)).promise;
        const numPages = pdf.numPages;

        for (let i = 1; i <= numPages; i++) {
          const page = await pdf.getPage(i);
          const viewport = page.getViewport({ scale: 1 });
          const canvas = document.createElement('canvas');
          const context = canvas.getContext('2d');
          canvas.width = desiredWidth;
          canvas.height = (desiredWidth / viewport.width) * viewport.height;
          const renderContext = {
            canvasContext: context,
            viewport: page.getViewport({ scale: desiredWidth / viewport.width }),
          };
          await page.render(renderContext).promise;
          const imageURL = canvas.toDataURL('image/jpeg', 0.8);
          const size = calculateSize(imageURL);
          yield { imageURL, size };
        }
      } catch (error) {
        console.error('Error:', error);
      }
    }

    function calculateSize(imageURL) {
      const base64Length = imageURL.length - 'data:image/jpeg;base64,'.length;
      const sizeInBytes = Math.ceil(base64Length * 0.75);
      return sizeInBytes;
    }

    function formatSize(size) {
      const sizeInKB = (size / 1024).toFixed(2);
      return `${sizeInKB} KB`;
    }
  </script>
</body>
</html>
This code shows how to OCR an image:
async function ocrMissingAltText() {
    // Load Tesseract
    var s = document.createElement("script");
    s.src = "https://unpkg.com/[email protected]/dist/tesseract.min.js";
    document.head.appendChild(s);

    s.onload = async () => {
      const images = document.getElementsByTagName("img");
      const worker = Tesseract.createWorker();
      await worker.load();
      await worker.loadLanguage("eng");
      await worker.initialize("eng");
      ocrButton.innerText = "Running OCR...";

      // Iterate through all the images in the output div
      for (const img of images) {
        const altTextarea = img.parentNode.querySelector(".textarea-alt");
        // Check if the alt textarea is empty
        if (altTextarea.value === "") {
          const imageUrl = img.src;
          var {
            data: { text },
          } = await worker.recognize(imageUrl);
          altTextarea.value = text; // Set the OCR result to the alt textarea
          progressBar.value += 1;
        }
      }

      await worker.terminate();
      ocrButton.innerText = "OCR complete";
    };
  }
Use these examples to put together a single HTML page with embedded HTML and CSS and JavaScript that provides a big square which users can drag and drop a PDF file onto and when they do that the PDF has every page converted to a JPEG and shown below on the page, then OCR is run with tesseract and the results are shown in textarea blocks below each image.

This worked flawlessly! The model kicked out a proof-of-concept page that did exactly what I needed.

I ended up iterating with it a few times to get to my final result, but it took just a few minutes to build a genuinely useful tool that I’ve benefited from ever since.

Coding agents make this even more powerful

I built that OCR example back in March 2024, nearly a year before the first release of Claude Code. Coding agents have made hoarding working examples even more valuable.

If your coding agent has internet access you can tell it to do things like:

Use curl to fetch the source of https://tools.simonwillison.net/ocr and https://tools.simonwillison.net/gemini-bbox and build a new tool that lets you select a page from a PDF and pass it to Gemini to return bounding boxes for illustrations on that page.

(I specified curl there because Claude Code defaults to using a WebFetch tool which summarizes the page content rather than returning the raw HTML.)

Coding agents are excellent at search, which means you can run them on your own machine and tell them where to find the examples of things you want them to do:

Add mocked HTTP tests to the ~/dev/ecosystem/datasette-oauth project inspired by how ~/dev/ecosystem/llm-mistral is doing it.

Often that's enough - the agent will fire up a search sub-agent to investigate and pull back just the details it needs to achieve the task.

Since so much of my research code is public I'll often tell coding agents to clone my repositories to /tmp and use them as input:

Clone simonw/research from GitHub to /tmp and find examples of compiling Rust to WebAssembly, then use that to build a demo HTML page for this project.

The key idea here is that coding agents mean we only ever need to figure out a useful trick once. If that trick is then documented somewhere with a working code example our agents can consult that example and use it to solve any similar shaped project in the future.

Tags: llms, ai, generative-ai, ai-assisted-programming, coding-agents, agentic-engineering

Quoting Andrej Karpathy

2026-02-27 03:03:27

It is hard to communicate how much programming has changed due to AI in the last 2 months: not gradually and over time in the "progress as usual" way, but specifically this last December. There are a number of asterisks but imo coding agents basically didn’t work before December and basically work since - the models have significantly higher quality, long-term coherence and tenacity and they can power through large and long tasks, well past enough that it is extremely disruptive to the default programming workflow. [...]

Andrej Karpathy

Tags: andrej-karpathy, coding-agents, ai-assisted-programming, generative-ai, agentic-engineering, ai, llms

Google API Keys Weren't Secrets. But then Gemini Changed the Rules.

2026-02-26 12:28:55

Google API Keys Weren't Secrets. But then Gemini Changed the Rules.

Yikes! It turns out Gemini and Google Maps (and other services) share the same API keys... but Google Maps API keys are designed to be public, since they are embedded directly in web pages. Gemini API keys can be used to access private files and make billable API requests, so they absolutely should not be shared.

If you don't understand this it's very easy to accidentally enable Gemini billing on a previously public API key that exists in the wild already.

What makes this a privilege escalation rather than a misconfiguration is the sequence of events. 

  1. A developer creates an API key and embeds it in a website for Maps. (At that point, the key is harmless.) 
  2. The Gemini API gets enabled on the same project. (Now that same key can access sensitive Gemini endpoints.) 
  3. The developer is never warned that the keys' privileges changed underneath it. (The key went from public identifier to secret credential).

Truffle Security found 2,863 API keys in the November 2025 Common Crawl that could access Gemini, verified by hitting the /models listing endpoint. This included several keys belonging to Google themselves, one of which had been deployed since February 2023 (according to the Internet Archive) hence predating the Gemini API that it could now access.

Google are working to revoke affected keys but it's still a good idea to check that none of yours are affected by this.

Via Hacker News

Tags: api-keys, google, security, gemini

Quoting Benedict Evans

2026-02-26 11:44:56

If people are only using this a couple of times a week at most, and can’t think of anything to do with it on the average day, it hasn’t changed their life. OpenAI itself admits the problem, talking about a ‘capability gap’ between what the models can do and what people do with them, which seems to me like a way to avoid saying that you don’t have clear product-market fit.

Hence, OpenAI’s ad project is partly just about covering the cost of serving the 90% or more of users who don’t pay (and capturing an early lead with advertisers and early learning in how this might work), but more strategically, it’s also about making it possible to give those users the latest and most powerful (i.e. expensive) models, in the hope that this will deepen their engagement.

Benedict Evans, How will OpenAI compete?

Tags: openai, chatgpt, benedict-evans, ai

tldraw issue: Move tests to closed source repo

2026-02-26 05:06:53

tldraw issue: Move tests to closed source repo

It's become very apparent over the past few months that a comprehensive test suite is enough to build a completely fresh implementation of any open source library from scratch, potentially in a different language.

This has worrying implications for open source projects with commercial business models. Here's an example of a response: tldraw, the outstanding collaborative drawing library (see previous coverage), are moving their test suite to a private repository - apparently in response to Cloudflare's project to port Next.js to use Vite in a week using AI.

They also filed a joke issue, now closed to Translate source code to Traditional Chinese:

The current tldraw codebase is in English, making it easy for external AI coding agents to replicate. It is imperative that we defend our intellectual property.

Worth noting that tldraw aren't technically open source - their custom license requires a commercial license if you want to use it in "production environments".

Update: Well this is embarrassing, it turns out the issue I linked to about removing the tests was a joke as well:

Sorry folks, this issue was more of a joke (am I allowed to do that?) but I'll keep the issue open since there's some discussion here. Writing from mobile

  • moving our tests into another repo would complicate and slow down our development, and speed for us is more important than ever
  • more canvas better, I know for sure that our decisions have inspired other products and that's fine and good
  • tldraw itself may eventually be a vibe coded alternative to tldraw
  • the value is in the ability to produce new and good product decisions for users / customers, however you choose to create the code

Via @steveruizok

Tags: open-source, cloudflare, ai-ethics

Claude Code Remote Control

2026-02-26 01:33:24

Claude Code Remote Control

New Claude Code feature dropped yesterday: you can now run a "remote control" session on your computer and then use the Claude Code for web interfaces (on web, iOS and native desktop app) to send prompts to that session.

It's a little bit janky right now. Initially when I tried it I got the error "Remote Control is not enabled for your account. Contact your administrator." (but I am my administrator?) - then I logged out and back into the Claude Code terminal app and it started working:

claude remote-control

You can only run one session on your machine at a time. If you upgrade the Claude iOS app it then shows up as "Remote Control Session (Mac)" in the Code tab.

It appears not to support the --dangerously-skip-permissions flag (I passed that to claude remote-control and it didn't reject the option, but it also appeared to have no effect) - which means you have to approve every new action it takes.

I also managed to get it to a state where every prompt I tried was met by an API 500 error.

Screenshot of a "Remote Control session" (Mac:dev:817b) chat interface. User message: "Play vampire by Olivia Rodrigo in music app". Response shows an API Error: 500 {"type":"error","error":{"type":"api_error","message":"Internal server error"},"request_id":"req_011CYVBLH9yt2ze2qehrX8nk"} with a "Try again" button. Below, the assistant responds: "I'll play "Vampire" by Olivia Rodrigo in the Music app using AppleScript." A Bash command panel is open showing an osascript command: osascript -e 'tell application "Music" activate set searchResults to search playlist "Library" for "vampire Olivia Rodrigo" if (count of searchResults) > 0 then play item 1 of searchResults else return "Song not found in library" end if end tell'

Restarting the program on the machine also causes existing sessions to start returning mysterious API errors rather than neatly explaining that the session has terminated.

I expect they'll iron out all of these issues relatively quickly. It's interesting to then contrast this to solutions like OpenClaw, where one of the big selling points is the ability to control your personal device from your phone.

Claude Code still doesn't have a documented mechanism for running things on a schedule, which is the other killer feature of the Claw category of software.

Update: I spoke too soon: also today Anthropic announced Schedule recurring tasks in Cowork, Claude Code's general agent sibling. These do include an important limitation:

Scheduled tasks only run while your computer is awake and the Claude Desktop app is open. If your computer is asleep or the app is closed when a task is scheduled to run, Cowork will skip the task, then run it automatically once your computer wakes up or you open the desktop app again.

I really hope they're working on a Cowork Cloud product.

Via @claudeai

Tags: ai, generative-ai, applescript, llms, anthropic, claude, coding-agents, claude-code, openclaw