2025-05-12 03:15:46
Cursor's security documentation page includes a surprising amount of detail about how the Cursor text editor's backend systems work.
I've recently learned that checking an organization's list of documented subprocessors is a great way to get a feel for how everything works under the hood - it's a loose "view source" for their infrastructure! That was how I confirmed that Anthropic's search features used Brave search back in March.
Cursor's list includes AWS, Azure and GCP (AWS for primary infrastructure, Azure and GCP for "some secondary infrastructure"). They host their own custom models on Fireworks and make API calls out to OpenAI, Anthropic, Gemini and xAI depending on user preferences. They're using turbopuffer as a hosted vector store.
The most interesting section is about codebase indexing:
Cursor allows you to semantically index your codebase, which allows it to answer questions with the context of all of your code as well as write better code by referencing existing implementations. […]
At our server, we chunk and embed the files, and store the embeddings in Turbopuffer. To allow filtering vector search results by file path, we store with every vector an obfuscated relative file path, as well as the line range the chunk corresponds to. We also store the embedding in a cache in AWS, indexed by the hash of the chunk, to ensure that indexing the same codebase a second time is much faster (which is particularly useful for teams).
At inference time, we compute an embedding, let Turbopuffer do the nearest neighbor search, send back the obfuscated file path and line range to the client, and read those file chunks on the client locally. We then send those chunks back up to the server to answer the user’s question.
When operating in privacy mode - which they say is enabled by 50% of their users - they are careful not to store any raw code on their servers for longer than the duration of a single request. This is why they store the embeddings and obfuscated file paths but not the code itself.
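Out of curiosity, here's a rough sketch of what that flow might look like. This is not Cursor's code - the obfuscation scheme, the embedding cache and the vector store interface are all stand-ins I invented to make the description concrete:

import hashlib

def obfuscate_path(path: str, secret: str) -> str:
    # Keep paths filterable without revealing them: hash each segment
    # with a secret held on the client.
    return "/".join(
        hashlib.sha256((secret + part).encode()).hexdigest()[:16]
        for part in path.split("/")
    )

def index_chunk(chunk, secret, embed, embedding_cache, vector_store):
    # Cache embeddings by chunk hash so re-indexing identical code is cheap
    chunk_hash = hashlib.sha256(chunk["text"].encode()).hexdigest()
    if chunk_hash not in embedding_cache:
        embedding_cache[chunk_hash] = embed(chunk["text"])
    # Only the vector plus obfuscated metadata is persisted - no raw code
    vector_store.upsert(
        vector=embedding_cache[chunk_hash],
        metadata={
            "path": obfuscate_path(chunk["path"], secret),
            "start_line": chunk["start_line"],
            "end_line": chunk["end_line"],
        },
    )

def answer(question, embed, vector_store, resolve_path, read_lines, ask_llm):
    # Server side: embed the question and run nearest-neighbour search
    hits = vector_store.query(vector=embed(question), top_k=10)
    # Client side: map obfuscated paths back to real files (via a local
    # lookup table), read the chunks locally, send them up with the question
    context = [
        read_lines(resolve_path(h["path"]), h["start_line"], h["end_line"])
        for h in hits
    ]
    return ask_llm(question, context)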
Reading this made me instantly think of the paper Text Embeddings Reveal (Almost) As Much As Text about how vector embeddings can be reversed. The security documentation touches on that in the notes:
Embedding reversal: academic work has shown that reversing embeddings is possible in some cases. Current attacks rely on having access to the model and embedding short strings into big vectors, which makes us believe that the attack would be somewhat difficult to do here. That said, it is definitely possible for an adversary who breaks into our vector database to learn things about the indexed codebases.
Via lobste.rs
Tags: ai-assisted-programming, security, generative-ai, ai, embeddings, llms
2025-05-11 12:17:38
Achievement unlocked: tap danced in the local community college dance recital.
Tags: dance
2025-05-11 06:34:06
Poker Face season two just started on Peacock (the US streaming service). It's my favorite thing on TV right now. I've started threads on MetaFilter FanFare for episodes one, two and three.
Tags: tv, metafilter
2025-05-10 14:29:10
This llama.cpp server vision support via libmtmd pull request - via Hacker News - was merged earlier today. The PR finally adds full support for vision models to the excellent llama.cpp project. It's documented on this page, with the deeper technical background covered here. Here are my notes on getting it working on a Mac.
llama.cpp models are usually distributed as .gguf files. This project introduces a new variant of those called mmproj, for multimodal projector. libmtmd is the new library for handling these.
You can try it out by compiling llama.cpp from source, but I found another option that works: you can download pre-compiled binaries from the GitHub releases. On macOS there's an extra step to jump through to get these working, which I'll describe below.
Update: it turns out the Homebrew package for llama.cpp turns things around extremely quickly. You can run brew install llama.cpp or brew upgrade llama.cpp and start running the below tools without any extra steps.
I downloaded the llama-b5332-bin-macos-arm64.zip file from this GitHub release and unzipped it, which created a build/bin directory.
That directory contains a bunch of binary executables and a whole lot of .dylib files. macOS wouldn't let me execute these files because they were quarantined. Running this command fixed that for the llama-mtmd-cli and llama-server executables and the .dylib files they needed:
sudo xattr -rd com.apple.quarantine llama-server llama-mtmd-cli *.dylib
Now I can run an interactive terminal LLM session using this command:
./llama-mtmd-cli -hf unsloth/gemma-3-4b-it-GGUF:Q4_K_XL
The first time this runs it fetches four files from unsloth/gemma-3-4b-it-GGUF - a quantized copy of Google DeepMind's Gemma 3 4B - and stores them in ~/Library/Caches/llama.cpp:
2.4G unsloth_gemma-3-4b-it-GGUF_gemma-3-4b-it-UD-Q4_K_XL.gguf
217B unsloth_gemma-3-4b-it-GGUF_gemma-3-4b-it-UD-Q4_K_XL.gguf.json
812M unsloth_gemma-3-4b-it-GGUF_mmproj-F16.gguf
203B unsloth_gemma-3-4b-it-GGUF_mmproj-F16.gguf.json
Once they've downloaded, an interactive session starts. You can chat directly with the model or use the /image path-to-image.jpg command to load an image into the context.
I loaded in a pelican photo and prompted "describe it" and got back this:
It’s a striking photograph of a brown pelican standing in shallow water on a sandy beach. Here's a breakdown of what I see:
Subject: The main focus is a large brown pelican. It’s standing with its wings fully extended, creating a dramatic pose.
Pose: The pelican is in a slightly awkward, almost comical stance, with its legs planted firmly in the wet sand and its wings spread wide. It looks like it might be preparing to take off or is simply enjoying the water.
Color & Texture: The pelican is a rich, dark brown with lighter markings on its wings and head. The sand is a muted beige color, and the water reflects the sunlight, creating a shimmering effect. You can see the texture of the sand and the ripples in the water.
Lighting: The lighting appears to be bright and sunny, casting shadows and highlighting the details of the pelican's feathers.
Background: The background is blurred, suggesting a shallow depth of field, which keeps the focus on the pelican. It appears to be a beach scene.
Overall Impression: The image has a dynamic feel due to the pelican’s pose and the movement implied by the water. It’s a well-composed shot that captures a moment of wildlife in its natural habitat.
Do you want me to zoom in on a particular detail, or perhaps tell you more about pelicans?
Not bad for a 3.2GB model running on my laptop!
Even more fun is the llama-server command. This starts a localhost web server running on port 8080 to serve the model, with both a web UI and an OpenAI-compatible API endpoint.
The command to run it is the same:
./llama-server -hf unsloth/gemma-3-4b-it-GGUF:Q4_K_XL
Now visit http://localhost:8080 in your browser to start interacting with the model:
It miscounted the pelicans in the group photo, but again, this is a tiny 3.2GB model.
With the server running on port 8080 you can also access the OpenAI-compatible API endpoint. Here's how to do that using curl:
curl -X POST http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"messages": [
{"role": "user", "content": "Describe a pelicans ideal corporate retreat"}
]
}' | jq
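Since the endpoint is OpenAI-compatible you can point the official openai Python client at it too. Here's a minimal sketch - the placeholder API key and model name are assumptions on my part, since llama-server hosts a single model and doesn't require a key by default:

# pip install openai
from openai import OpenAI

# llama-server doesn't need an API key by default, but the client wants a value
client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="gemma-3-4b-it",  # the server serves one model, so the name is mostly cosmetic
    messages=[
        {"role": "user", "content": "Describe a pelican's ideal corporate retreat"}
    ],
)
print(response.choices[0].message.content)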
I built a new plugin for LLM just now called llm-llama-server to make interacting with this API more convenient. You can use that like this:
llm install llm-llama-server
llm -m llama-server 'invent a theme park ride for a pelican'
Or for vision models use llama-server-vision:
llm -m llama-server-vision 'describe this image' -a https://static.simonwillison.net/static/2025/pelican-group.jpg
The LLM plugin uses the streaming API, so responses will stream back to you as they are being generated.
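Since LLM plugins register their models for the Python API as well, something like this rough sketch should also work:

import llm

# Assumes llm-llama-server is installed and llama-server is running on port 8080
model = llm.get_model("llama-server")
response = model.prompt("invent a theme park ride for a pelican")
# Iterating the response streams chunks as they are generated
for chunk in response:
    print(chunk, end="")
print()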
Tags: vision-llms, llm, llama-cpp, ai, local-llms, llms, gemma, generative-ai, projects, homebrew
2025-05-10 13:20:45
I've been doing some work with SQLite triggers recently while working on sqlite-chronicle, and I decided I needed a single reference to exactly which triggers are executed for which SQLite actions and what data is available within those triggers.
I wrote this triggers.py script to output as much information about triggers as possible, then wired it into a TIL article using Cog. The Cog-powered source code for the TIL article can be seen here.
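Here's a much smaller illustration - not the triggers.py script itself - of the kind of thing it explores: which values OLD and NEW expose inside INSERT, UPDATE and DELETE triggers:

import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE audit (action TEXT, old_name TEXT, new_name TEXT);

CREATE TRIGGER items_insert AFTER INSERT ON items BEGIN
  INSERT INTO audit VALUES ('insert', NULL, NEW.name);
END;
CREATE TRIGGER items_update AFTER UPDATE ON items BEGIN
  INSERT INTO audit VALUES ('update', OLD.name, NEW.name);
END;
CREATE TRIGGER items_delete AFTER DELETE ON items BEGIN
  INSERT INTO audit VALUES ('delete', OLD.name, NULL);
END;
""")
db.execute("INSERT INTO items (name) VALUES ('pelican')")
db.execute("UPDATE items SET name = 'heron' WHERE id = 1")
db.execute("DELETE FROM items WHERE id = 1")
print(db.execute("SELECT * FROM audit").fetchall())
# [('insert', None, 'pelican'), ('update', 'pelican', 'heron'), ('delete', 'heron', None)]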
2025-05-10 02:55:24
I had some notes in a GitHub issue thread in a private repository that I wanted to export as Markdown. I realized that I could get them using a combination of several recent projects.
Here's what I ran:
export GITHUB_TOKEN="$(llm keys get github)"
llm -f issue:https://github.com/simonw/todos/issues/170 \
-m echo --no-log | jq .prompt -r > notes.md
I have a GitHub personal access token stored in my LLM keys, for use with Anthony Shaw's llm-github-models plugin.
My own llm-fragments-github plugin expects an optional GITHUB_TOKEN environment variable, so I set that first - here's an issue to have it use the github key instead.
With that set, the issue: fragment loader can take a URL to a private GitHub issue thread and load it via the API using the token, then concatenate the comments together as Markdown. Here's the code for that.
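Conceptually the loader has to do something like the following - a rough sketch against the GitHub REST API, not the plugin's actual code:

import json
import os
import urllib.request

def fetch(url, token):
    # Authenticated GitHub API request, returning parsed JSON
    req = urllib.request.Request(url, headers={
        "Authorization": f"Bearer {token}",
        "Accept": "application/vnd.github+json",
    })
    with urllib.request.urlopen(req) as response:
        return json.loads(response.read())

token = os.environ["GITHUB_TOKEN"]
owner, repo, number = "simonw", "todos", 170
api = f"https://api.github.com/repos/{owner}/{repo}/issues/{number}"

issue = fetch(api, token)
comments = fetch(api + "/comments", token)

# Concatenate the issue body and comments into a single Markdown document
markdown = f"# {issue['title']}\n\n{issue['body'] or ''}\n\n"
markdown += "\n\n---\n\n".join(c["body"] or "" for c in comments)
print(markdown)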
Fragments are meant to be used as input to LLMs. I built an llm-echo plugin recently which adds a fake LLM called "echo" that simply echoes its input back out again.
Adding --no-log prevents that junk data from being stored in my LLM log database.
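A minimal model plugin along those lines looks roughly like this - a sketch based on LLM's plugin tutorial, not the actual llm-echo code, which does a bit more:

import json
import llm

@llm.hookimpl
def register_models(register):
    register(Echo())

class Echo(llm.Model):
    model_id = "echo"

    def execute(self, prompt, stream, response, conversation):
        # Echo the prompt straight back as JSON with a "prompt" key
        yield json.dumps({"prompt": prompt.prompt}, indent=2)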
The output is JSON with a "prompt" key for the original prompt. I use jq .prompt to extract that out, then -r to get it as raw text (not a "JSON string").
... and I write the result to notes.md.