Armin Ronacher

I'm currently located in Austria and working as a Director of Engineering for Sentry. Aside from that I do open source development.

Rss preview of Blog of Armin Ronacher

How I Use AI: Meet My Promptly Hired Model Intern

2025-01-30 08:00:00

After Musk's acquisition of Twitter, many people I respect and follow moved to Bluesky. I created an account there and made an honest attempt at making it my primary platform. Sadly, I found Bluesky to be surprisingly hostile towards AI content. There is an almost religious resistance to AI on there, at least in whatever corner of the platform I ended up in.

Despite these challenges, some individuals on both Twitter and Bluesky have shown genuine curiosity about my AI usage. In this post, I want to share how I use Large Language Models and why I find them so helpful.

Before moving on, I want to include an important disclaimer: I am by no means an expert in AI; I'm mostly an enthusiastic user. Absolutely nothing here is novel! What I do here is pretty boring, which to some degree is the point. I won't be covering underlying technology or my programmatic experience. This is strictly about how I use AI as a “techy consumer”.

In addition, as you read through this article you will probably figure out rather quickly that the way I use AI — despite being used in part for content creation — does not really impact intellectual property much. That said, I'm curious and open to discussions about how we should be dealing with this problem. Particularly on Bluesky a lot of the negativity towards AI is related to watering down of copyrights and human creation. I don't know the answers to these things, but I think we need to have a productive dialog here rather than wishing for the technology to go away.

Why AI?

In short: AI makes me significantly more productive. I recently read Aaron Francis's Tweet about how he gets a week's worth of productivity out of a day now thanks to AI. I'm not anywhere close to that, but I use AI a lot now. It has become indispensable for me for both content writing as well as programming.

Moreover, a common misconception is that AI is still at the level it was when ChatGPT first launched. Many people tried it early, saw its limitations, and never returned. However, AI is evolving rapidly. If you haven't kept up, you risk drawing inaccurate conclusions based on outdated impressions. In some sense I believe people who have never tried AI are in a better position to judge than the folks who used it two years ago and never returned.

My AI Toolkit

I work with a variety of AI tools, mostly because of professional curiosity and to a smaller degree because each tool excels at something slightly different. Here are the ones I use most often:

  • Open WebUI. In short this is a Python web app that offers a chat interface similar to ChatGPT. Unlike ChatGPT, however, it lets you talk to different models. First and foremost, I use this to talk to local models hosted by Ollama, but secondarily I also use it to interface with other remote services like OpenAI, Anthropic and DeepSeek.
  • Simon's llm. This is a command line tool with plenty of plugins that lets you prompt different models. Think of it as a command-line version of Open WebUI. It's particularly useful for quick scripting and basic automation.
  • Ollama. This allows me to run models locally on my MacBook Pro M1 Max. With the 64GB of RAM it has, it's a pretty potent machine for basic inference despite it being three years old. I'll explain later why I use local models.
  • Cursor. It is a fork of Visual Studio Code with AI-based auto completion and code generation built-in. It's my go-to for programming with AI assistance at the moment.
  • ChatGPT. Like probably most AI users, I use ChatGPT, particularly on my phone. I pay for the Plus subscription because I use it enough to get a lot of value out of it. One significant use of this for me is in fact the voice mode (more on that later).

It's probably worth mentioning that you can get most of the benefits from paying for just a single AI tool. I think as one expands their use, and especially as one gets better at writing prompts, the desire naturally grows to use more tooling. As for which models (and services) to use day to day, I don't have a particularly strong strategy, and preferences change quickly. For instance, after DeepSeek's R1 release I started exploring it quite a bit for programming (and it's doing a phenomenal job at it), and as of writing that's just a few days old.

If you want to run models locally, Apple Silicon machines currently offer some of the best “bang for your buck” in terms of performance, power usage, and money. With tailscale, I can even access my MacBook's Open WebUI interface from my phone, as long as it is powered on.

Guide for installing llm and Open WebUI

This is a bit of a plug for uv quite honestly. If you have it, it's crazy quick to get Open WebUI and llm running locally:

uv tool install -p python3.11 open-webui
uv tool install --with llm-ollama --with llm-claude-3 --with llm-deepseek llm

Afterwards you can launch open-webui and use the llm tool:

open-webui serve
llm keys set deepseek
llm -m deepseek-coder '# write timsort in python'
  • By default, Open WebUI only connects to OpenAI or Ollama. If you want to add the hosted Anthropic or DeepSeek models, you need to create a custom function in the “Functions” section of Open WebUI's admin panel.
  • Import the functions and configure your API keys, and you’re ready to go.

Functions you can import to the WebUI:

AI Affordances

One frequent concern I hear is “you cannot trust LLMs” as they tend to hallucinate. I get this in particular when I explain that I frequently use this as a replacement for Google! However, I approach the risk of hallucination the same way I would when seeking advice from another human: people can be, and routinely are, wrong, and you learn to cross-check selectively.

I treat AI as I would a collaborator or a pretty good intern but I remain responsible for the final outcome. In this case the intern also happens to get better month by month as models improve. And a bit like a human, that digital intern has morals and wants to be argued with. Except, of course, that some of those AI interns don't want to talk about China, while others get a stroke if you talk about certain people. But regardless of how good they get, in the end, it's my fault and my fault alone if I do the wrong thing. I won't blame the AI and I need to spot check.

However, the logical conclusion of this is not that it's wrong all the time and you need to check everything, or that you cannot trust it at all. It's similar to how you engage in a technical discussion with others about a problem. I have seen more than one situation where the conventional wisdom in the room is just wrong for a few minutes, until someone points out that we had it wrong.

Another major advantage is that AI tools are relatively open. You can run models locally and integrate them with scripts. Even the famous OpenAI, which is not at all open, is much more open than a Google search is.

In other words, you can integrate it locally and nobody stops you. By contrast, many, many years ago I had a tool on my computer that allowed me to issue web searches and extract text from results. That stopped working so long ago that I almost forgot about it. It stopped working because there is basically no competition in search, and Google does not want me to use it like that.

For instance, you can create a simple script for grammar checking right from your command line:

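#!/bin/sh
# Default to the local model served by Ollama.
MODEL=phi4:latest
# If the network is reachable, prefer the remote model.
# (">/dev/null 2>&1" instead of "&>", which plain sh does not understand.)
if ping -q -c1 google.com >/dev/null 2>&1; then
  MODEL=claude-3-5-sonnet-latest
fi
# Read the text to fix from stdin.
OLD_TEXT="$(cat)"
llm -m "$MODEL" "$OLD_TEXT" -s "fix spelling and grammar in the given text,
    and reply with the improved text and no extra commentary.
    Use double spacing."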

This script can automatically switch between a local model (phi4 via Ollama) and a remote one (claude-3-5-sonnet-latest) based on internet connectivity. With a command like !llm-spell in Vim, I can fix up sentences with a single step.

Or you can manipulate the contents of the clipboard like this:

pbpaste | llm-spell | pbcopy && say "AI is done"

Writing with AI

I don't let AI write my articles. As a non-native speaker, mistakes and linguistic quirks are part of me. That said, I do rely on AI tools for helping me write.

Often, after I have made my first pass through a page, I ask an LLM to read through it and give me comments. I have an Apple Note with various prompts I can use, or I just come up with what I need in the moment. The process is not particularly scripted out. I basically talk to the LLM to figure out what it thinks of the text, etc.

Here are some of the things I use AI for when writing:

  • Grammar checking: I compare the AI’s suggested revisions side by side with my original text and pick the changes I prefer.
  • Restructuring: AI often helps me see when my writing is too wordy. In the days before AI, I often ended up with super long articles that did not read well and that I did not publish. Models like o1 are very helpful in identifying things that don't need to be said.
  • Writing Notes and finding key points: Here, I ask the AI to read through a draft “like a Computer Science 101 student” and take notes. This helps me see if what it absorbed matches what I intended to convey.
  • Roast my Article: I have a few prompts that ask the AI to “roast” or criticize my article, as if commenting on Reddit, Twitter, or Hacker News. Even though these critiques seem shallow, they can sting, and they often highlight weaknesses in my argument or lack of clarity. Even if they don't necessarily impact the writing, they prime me for some of the feedback I inevitably receive.
  • Identifying jargon: If I worry there's too much jargon, I use AI to resolve acronyms and point out technical terms I've used without explanation, helping me make the text more accessible.

Programming with AI

This is probably the bulk of what I do and it falls into three major categories.

  • Debugging and summarizing: Throwing stack traces and errors at an AI is usually a good first step in finding out where I should look, what might be wrong. Sometimes it fixes a problem entirely by itself. I also sometimes just throw a long bug report or issue at the thing to try to reason through it.
  • Fuzzy parsing: Sometimes you have an output, but you don't have the input. As an example I have a little side project to write a command line tool help generator, and AI is really good at taking the output, “parsing” it and creating a structure as input. That is a good first step to then generate the code that should produce that output again.
  • Code generation: Lastly, it just programs really well. It can do single code snippets, it can generate complex stuff. Particularly for prototyping it's amazing.

Now it's not great at all languages. I find it performs exceptionally well at Python, somewhat okay at JavaScript and horrible at Rust. However, particularly if I wanna solve an algorithmic problem, I can often iterate in Python until I have what I want, then ask it to translate it to Rust. That's from my experience at least a decent way to get going quickly on some harder problems.

Talking to Her

Pair programming is fantastic, but not always possible if you're programming alone at home. An LLM can serve as a stand-in discussion partner — a poor man's interlocutor. Thankfully it's actually not all that poor, because it's really good at telling you what stuff is called. This is most fun if you do it with voice mode, but it also works quite well in text.

Here is a motivating example from voice mode:

A: So, do you remember if you have, like, strcat, it always copies from the start, but if you keep concatenating to the same buffer, it's really inefficient. So, remember what the Big-O of that is if you do it in a loop, and wasn't there, like, a humorous name for this kind of algorithm?

AI: You’re thinking of the “Schlemiel the Painter’s algorithm.” It’s a humorous way of describing the inefficiency of repeatedly appending to a string using operations like strcat, which copies from the start each time. The Big O complexity for this operation is O(n^2) if done repeatedly in a loop, since each strcat operation takes time proportional to the current length of the string.
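To make that quadratic cost tangible, here is a small Rust sketch that mimics strcat on a NUL-terminated buffer and counts how many bytes get scanned just to find the end of the string. The helper names (strcat_like, scanned_bytes) are made up for illustration, and this is deliberately not how you would build strings in Rust.

```rust
/// Append `src` to a NUL-terminated buffer the way C's `strcat` does:
/// scan from the start to find the terminator, then copy the new bytes.
/// Returns how many bytes had to be scanned to find the end.
fn strcat_like(buf: &mut Vec<u8>, src: &[u8]) -> usize {
    // Walk from the beginning until we hit the NUL terminator.
    let end = buf
        .iter()
        .position(|&b| b == 0)
        .expect("buffer must be NUL-terminated");
    buf.truncate(end);
    buf.extend_from_slice(src);
    buf.push(0);
    end
}

/// Total number of bytes scanned when appending `piece` to the same
/// buffer `n` times in a loop.
fn scanned_bytes(piece: &[u8], n: usize) -> usize {
    let mut buf = vec![0u8]; // the empty C string
    let mut total = 0;
    for _ in 0..n {
        total += strcat_like(&mut buf, piece);
    }
    total
}

fn main() {
    for n in [10, 100, 1000] {
        println!("{n:>5} appends -> {} bytes scanned", scanned_bytes(b"abcd", n));
    }
}
```

With a 4-byte piece this scans 180 bytes for 10 appends, 19800 for 100 and 1998000 for 1000: ten times the appends costs roughly a hundred times the scanning. Remembering where the string ends, as Rust's String does, removes the rescanning entirely.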

It's also fun and educational for my kids. They ask a lot of questions, and ChatGPT can tailor its explanations to their age level. If the explanation is still too advanced, I can immediately ask it to simplify further. If it was too dumbed down, I can make it go the other way. Now again, there is always a risk that it gets it wrong, but that is okay. Part of all of this is to convey to the children the idea that we need to validate it and that being critical is important.

What makes ChatGPT so enjoyable here is that it's able to keep the conversation going — it has state. Answered in large metric numbers? Can always ask it to scale to elephants or soccer fields.

ChatGPT is also incredibly helpful when having to work with multiple languages. For a recent example, my kids have Greek friends and we tried to understand the difference between some Greek words that came up. I have no idea how to write them, and Google Translate does not understand my attempts at pronouncing them either. However, ChatGPT does. If I ask it in voice mode what “pa-me-spee-tee” means in Greek, it knows what I tried to mumble and replies in a helpful manner.

Lastly the use on the go. Sometimes I'm just not in a position where I can actually write it down on a keyboard, but if I basically talk to ChatGPT, it can transcribe it down and it will also remove some of my “um's” and it will just be there and I can copy-paste it later and edit it. (To prove a point, I transcribed the previous sentence with the mobile app). This is a very different experience than transcribing on iOS natively. It understands enough context of what I'm babbling that it will start and stop transcribing. Now this is not perfect, but pretty damn good and incredibly helpful.

The multilingual aspect is particularly helpful because our family is multilingual. Being able to fluidly switch between German, Russian and English is such a refreshing experience.

No Slop

I mentioned earlier that I don't let LLMs write my texts. I also don't use AI to make illustrations, though I have in the past. The reason is that there is a certain style that goes along with these illustrations, which is just incredibly off-putting. They are noticeably AI-generated, and typically quite low-quality. People have taken to calling these images “AI slop” and I personally respond really badly to it. When someone throws me a pull request, an email or a text message that is obviously AI-generated without disclosing this, I immediately have a very low opinion of them.

Slop, like hallucinations, is a problem, but only if you don't use your brain. Even the worst slop can be the foundation of really good content. I'm a horrible artist, but I can use Illustrator. Even an AI slop image can help me trace the person in the pose I wanted. Likewise you can throw your notes into a document and let the AI imagine a story around it. You probably can't use that story right away, but you can use it as potential inspiration.

Final Thoughts

AI tools, at their best, feel less like disruptive and dark magic and more like a natural extension of the creative process as long as you see them as curious collaborators. My approach isn't about outsourcing thinking, but augmenting it: using LLMs to accelerate grunt work, untangle mental knots, and prototype ideas faster. Skepticism is healthy, but dismissing AI outright risks missing its potential as a multiplier for those willing to engage critically.

Build It Yourself

2025-01-24 08:00:00

Another day, another rant about dependencies from me. This time I will ask that we start and support a vibe shift when it comes to dependencies.

You're probably familiar with the concept of “dependency churn.” It's that never-ending treadmill of updates, patches, audits, and transitive dependencies that we as developers love to casually install in the name of productivity. Who doesn't enjoy waiting for yet another cargo upgrade just so you can get that fix for a bug you don't even have?

It's a plague in most ecosystems with good packaging solutions. JavaScript and Rust are particularly badly affected by that. A brand new Tokio project drags in 28 crates, a new Rocket project balloons that to 172, and a little template engine like MiniJinja can exist with just a single dependency — while its CLI variant slurps up 142.

If that doesn't sound like a big deal, let's consider terminal_size. It is a crate that does exactly what its name suggests: it figures out your terminal dimensions. The underlying APIs it uses have effectively been stable since the earliest days of computing terminals—what, 50 years or so? And yet, for one function, terminal-size manages to introduce three or four additional crates, depending on your operating system. That triggers a whole chain reaction, so you end up compiling thousands of other functions just to figure out if your terminal is 80x25 or 120x40. That crate had 26 releases. My own version of that that I have stuck away in a project from 10 years ago still works without a single update. Because shocker: nothing about figuring out terminal sizes has changed. [1]
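As a rough sketch of how small such a function can be on Unix: it asks the kernel for the window size of stdout via ioctl(TIOCGWINSZ). This is not what the terminal_size crate does internally. The names terminal_size and WinSize are illustrative, and the request numbers are hard-coded for Linux (the common x86_64/aarch64 value) and macOS only to keep the sketch free of dependencies. Taking them from the libc crate is the more robust choice.

```rust
use std::os::raw::{c_int, c_ulong, c_ushort};

/// Mirrors C's `struct winsize` from <sys/ioctl.h>.
#[repr(C)]
#[derive(Default)]
#[allow(dead_code)]
struct WinSize {
    ws_row: c_ushort,
    ws_col: c_ushort,
    ws_xpixel: c_ushort,
    ws_ypixel: c_ushort,
}

// Assumption: hard-coded request numbers. These are platform specific
// and other targets will need their own value (or the libc crate).
#[cfg(target_os = "linux")]
const TIOCGWINSZ: c_ulong = 0x5413;
#[cfg(target_os = "macos")]
const TIOCGWINSZ: c_ulong = 0x4008_7468;

// `unsafe extern` needs Rust 1.82 or newer; on older toolchains
// drop the `unsafe` keyword in front of `extern`.
unsafe extern "C" {
    fn ioctl(fd: c_int, request: c_ulong, ...) -> c_int;
}

/// Returns (columns, rows) if stdout is attached to a terminal.
fn terminal_size() -> Option<(u16, u16)> {
    let mut ws = WinSize::default();
    // File descriptor 1 is stdout. The kernel fills in `ws` on success.
    let rv = unsafe { ioctl(1, TIOCGWINSZ, &mut ws as *mut WinSize) };
    if rv == 0 && ws.ws_col > 0 && ws.ws_row > 0 {
        Some((ws.ws_col, ws.ws_row))
    } else {
        None
    }
}

fn main() {
    match terminal_size() {
        Some((cols, rows)) => println!("{cols}x{rows}"),
        None => println!("stdout is not a terminal"),
    }
}
```

When stdout is piped or redirected the ioctl fails and the function returns None, so callers still need a fallback such as 80x25.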

So why does terminal-size have so many updates if it's so stable? Because it's built on top of platform abstraction libraries that constantly churn, so it needs to update to avoid code duplication and blowing up compile times even more.

But “big supply chain” will tell you that you must do it this way. Don't you dare copy paste that function into your library. Or don't you dare use “unsafe” yourself. You're not qualified enough to write unsafe code, let the platform abstraction architects do that. Otherwise someone will slap you. There are entire companies who are making a living off supplying you with the tools needed to deal with your dependency mess. In the name of security, we're pushed to having dependencies and keeping them up to date, despite most of those dependencies being the primary source of security problems.

The goal of code in many ways should be to be written in a way that it does not need updates. It should eventually achieve some level of stability. In the Rust ecosystem stable code is punished. If you have a perfectly working dependency but you have a somewhat inactive bug tracker, RUSTSEC will come by and give you a chunk rating.

But there is a simpler path. You write code yourself. Sure, it's more work up front, but once it's written, it's done. No new crates, no waiting for upstream authors to fix that edge case. If it's broken for you, you fix it yourself. Code that works doesn't necessarily need the maintenance treadmill. Your code has a corner case? Who cares. This is that vibe shift we need in the Rust world: celebrating fewer dependencies rather than more.

We're at a point in most ecosystems where pulling in libraries is not just the default action, it's seen positively: “Look how modular and composable my code is!” Actually, it might just be a symptom of never wanting to type out more than a few lines.

Now one will make the argument that it takes so much time to write all of this. It's 2025 and it's faster for me to have ChatGPT or Cursor whip up a dependency-free implementation of these common functions than it is for me to start figuring out a dependency. And it makes sense, as for many such small functions the maintenance overhead is tiny and much lower than actually dealing with constant upgrading of dependencies. The code is just a few lines and you also get the benefit of no longer needing to compile thousands of lines of other people's code for a single function.

But let's face it: corporate code review culture has also infected Open Source software. Companies are more likely to reward engineers than scold them for pulling in that new “shiny library” that solves the problem they never actually had. That creates problems, so dependabot and friends were born. Today I just dread getting dependabot pull requests on projects, but I have to accept it. I'm part of an ecosystem with my stuff and that ecosystem is all about churn, churn, churn. In companies you can also keep entire internal engineering teams busy with vendoring dependencies, internal audits and upgrading things throughout the company.

Fighting this fight is incredibly hard! Every new hire has been trained on the idea that dependencies are great, that code reuse is great. That having old code sitting around is a sign of bad engineering culture.

It's also hard to fight this in Open Source. Years ago I wrote sha1-smol which originally was just called sha1. It became the standard crate to calculate SHA1 hashes. Eventually I was pressured to donate that package name to rust-crypto and to depend on the rest of the crypto ecosystem as it was so established. If you want to use the new sha1 crate, you get to enjoy 10 dependencies. But there was just no way around it, because that name in the registry is precious and people also wanted to have trait compatibility. It feels tiring to be the only person in a conversation pushing to keep the churn down and dependencies low.

It's time to have a new perspective: we should give kudos to engineers who write a small function themselves instead of hooking in a transitive web of crates. We should be suspicious of big crate graphs. Celebrated are the minimal dependencies, the humble function that just quietly does the job, the code that doesn't need to be touched for years because it was done right once.

And sure, it's not black and white. There are the important libraries that solve hard problems. Graphics libraries that abstract over complex drivers, implementations of protocols like HTTP and QUIC. I won't be able to get rid of tokio and I have no desire to. But when you end up using one function, but you compile hundreds, some alarm bell should go off.

We need that vibe shift. To celebrate building it yourself when it's appropriate to do so. To give credit to library authors who build low to no-dependency Open Source libraries.

For instance minijinja celebrates it in the readme:

$ cargo tree
minimal v0.1.0 (examples/minimal)
└── minijinja v2.6.0 (minijinja)
    └── serde v1.0.144

And it has a PR to eventually get rid of the last dependency. And sometime this year I will make it my goal to go ahead proudly and trim down all that fat in my projects.

[1] Disclaimer: you will need one dependency for UNIX: libc. That's because Rust does not expose the platform's libc constants to you, and they are not standardized. That however is such a common and lightweight dependency that you won't be able to avoid it anyways.

Automatic Server Reloading in Rust on Change: What is listenfd/systemfd?

2025-01-19 08:00:00

When I developed Werkzeug (and later Flask), the most important part of the developer experience for me was enabling fast, automatic reloading. In Werkzeug (and with it Flask), this is achieved by using two processes at all times. The parent process holds on to the file descriptor of the socket on which the server listens, and a subprocess picks up that file descriptor. That subprocess restarts when it detects changes. This ensures that no matter what happens, there is no window where the browser reports a connection error. At worst, the browser will hang until the process finishes reloading, after which the page loads successfully. In case the inner process fails to come up during restarts, you get an error message.

A few years ago, I wanted to accomplish the same experience for working with Rust code which is why I wrote systemfd and listenfd. I however realized that I never really wrote here about how they work and disappointingly I think those crates, and a good auto-reloading experience in Rust are largely unknown.

Watching for Changes

Firstly one needs to monitor the file system for changes. While in theory I could have done this myself, there was already a tool that could do that.

At the time there was cargo watch. Today one might instead use it together with the more generic watchexec. Either one monitors your workspace for changes and then executes a command, so you can for instance tell it to restart your program. One of these will work:

watchexec -r -- cargo run
cargo watch -x run

You will need a tool like that to do the watching part. At this point I recommend the more generic watchexec which you can find on homebrew and elsewhere.

Passing Sockets

But what about the socket? The solution to this problem I picked comes from systemd. Systemd has a “protocol” that standardizes passing file descriptors from one process to another through environment variables. In systemd parlance this is called “socket activation,” as it allows systemd to only launch a program if someone started making a request to the socket. This concept was originally introduced by Apple as part of launchd.

To make this work with Rust, I created two crates:

  • systemfd is the command line tool that opens sockets and passes them on to other programs.
  • listenfd is a Rust crate that accepts file descriptors from systemd or systemfd.

It's worth noting that systemfd is not exclusively useful to Rust. The systemd protocol can be implemented in other languages as well, meaning that if you have a socket server written in Go or Python, you can also use systemfd.
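To illustrate how little the receiving side of that protocol involves on Unix, here is a std-only sketch. It is not listenfd's actual implementation: the function names are made up, and it skips the checks a real implementation would do (for example verifying that the descriptor really is a listening TCP socket). It assumes the systemd convention that passed descriptors start at fd 3 and are announced through the LISTEN_FDS and LISTEN_PID environment variables.

```rust
use std::net::TcpListener;
use std::os::unix::io::{FromRawFd, RawFd};

/// By systemd convention the first passed descriptor is fd 3
/// (0, 1 and 2 are stdin, stdout and stderr).
const LISTEN_FDS_START: RawFd = 3;

/// Decide how many descriptors were passed to us, given the raw values
/// of LISTEN_PID and LISTEN_FDS and our own process ID.
fn passed_fd_count(listen_pid: Option<&str>, listen_fds: Option<&str>, my_pid: u32) -> usize {
    // If LISTEN_PID is set, the sockets are only meant for that exact
    // process. If it is unset, accept them regardless.
    if let Some(pid) = listen_pid {
        if pid.trim().parse::<u32>().ok() != Some(my_pid) {
            return 0;
        }
    }
    listen_fds.and_then(|v| v.trim().parse().ok()).unwrap_or(0)
}

/// Take the passed socket at `index` as a TCP listener, if there is one.
/// Sketch only: call this at most once per index, since the returned
/// listener takes ownership of the descriptor.
fn take_tcp_listener(index: usize) -> Option<TcpListener> {
    let pid = std::env::var("LISTEN_PID").ok();
    let fds = std::env::var("LISTEN_FDS").ok();
    let count = passed_fd_count(pid.as_deref(), fds.as_deref(), std::process::id());
    if index >= count {
        return None;
    }
    let fd = LISTEN_FDS_START + index as RawFd;
    // SAFETY: we trust the parent process to have left an open,
    // listening TCP socket at this descriptor, as the protocol promises.
    Some(unsafe { TcpListener::from_raw_fd(fd) })
}

fn main() -> std::io::Result<()> {
    let listener = match take_tcp_listener(0) {
        Some(listener) => listener,
        // Nothing was passed in: open a socket ourselves.
        None => TcpListener::bind("127.0.0.1:3000")?,
    };
    println!("listening on {}", listener.local_addr()?);
    Ok(())
}
```

The LISTEN_PID check is the part that matters once a file watcher sits between the program opening the socket and the server: a PID recorded for the intermediate process will not match the server's own PID.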

So here is how you use it.

First you need to add listenfd to your project:

cargo add listenfd

Then, modify your server code to accept sockets via listenfd before falling back to listening itself on ports provided through command-line arguments or configuration files. Here is an example using listenfd in axum:

use axum::{routing::get, Router};
use tokio::net::TcpListener;

async fn index() -> &'static str {
    "Hello, World!"
}

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let app = Router::new().route("/", get(index));

    // Socket 0 is the first socket passed in by systemd/systemfd, if any.
    let mut listenfd = listenfd::ListenFd::from_env();
    let listener = match listenfd.take_tcp_listener(0)? {
        Some(listener) => {
            // tokio expects the std listener to already be non-blocking.
            listener.set_nonblocking(true)?;
            TcpListener::from_std(listener)
        }
        // No socket was passed: fall back to binding a fixed port.
        None => TcpListener::bind("0.0.0.0:3000").await,
    }?;

    axum::serve(listener, app).await?;
    Ok(())
}

The key point here is to accept socket 0 from the environment as a TCP listener and use it if available. If the socket is not provided (e.g. when launched without systemd/systemfd), the code falls back to opening a fixed port.

Putting it Together

Finally you can use cargo watch / watchexec together with systemfd:

systemfd --no-pid -s http::8888 -- watchexec -r -- cargo run
systemfd --no-pid -s http::8888 -- cargo watch -x run

This is what the parameters mean:

  • systemfd needs to be first; it's the program that opens the sockets.
  • --no-pid is a flag that prevents the PID from being passed. This is necessary for listenfd to accept the socket. This is a departure from the systemd socket passing protocol, which otherwise does not allow sockets to be passed through another program (like watchexec). In short: when the PID information is not passed, then listenfd will accept the socket regardless. Otherwise it would only accept it from the direct parent process.
  • -s http::8888 tells systemfd to open one TCP socket on port 8888. Using http instead of tcp is a small improvement that will cause systemfd to print out a URL on startup.
  • -- watchexec -r makes watchexec restart the process when something changes in the current working directory.
  • -- cargo run is the program that watchexec will start and restart on changes. In Rust this will first compile the changes and then run the application. Because we put listenfd in, it will try to first accept the socket from systemfd.

The end result is that you can edit your code, and it will recompile automatically and restart the server without dropping any requests. When you run it, and perform changes, it will look a bit like this:

$ systemfd --no-pid -s http::5555 -- watchexec -r -- cargo run
~> socket http://127.0.0.1:5555/ -> fd #3
[Running: cargo run]
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.02s
     Running `target/debug/axum-test`
[Running: cargo run]
   Compiling axum-test v0.1.0 (/private/tmp/axum-test)
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.52s
     Running `target/debug/axum-test`

For easier access, I recommend putting this into a Makefile or similar so you can just run make devserver and it runs the server in watch mode.

To install systemfd you can use curl to bash:

curl -sSfL https://github.com/mitsuhiko/systemfd/releases/latest/download/systemfd-installer.sh | sh

What About Windows?

Now how does this work on Windows? The answer is that systemfd and listenfd have a custom, proprietary protocol that also makes socket passing work on Windows. That's a more complex system which involves a local RPC server. However the system does also support Windows and the details about how it works are largely irrelevant for you as a user — unless you want to implement that protocol for another programming language.

Potential Improvements

I really enjoy using this combination, but it can be quite frustrating to require so many commands, and the command line workflow isn't optimal. Ideally, this functionality would be better integrated into specific Rust frameworks like axum and provided through a dedicated cargo plugin. In a perfect world, one could simply run cargo devserver, and everything would work seamlessly.

However, maintaining such an integrated experience is a much more involved effort than what I have. Hopefully, someone will be inspired to further enhance the developer experience and achieve deeper integration with Rust frameworks, making it more accessible and convenient for everyone.

Reflecting on Life

2024-12-26 08:00:00

Last year I decided that I want to share my most important learnings about engineering, teams and quite frankly personal mental health. My hope is that those who want to learn from me find it useful. This is a continuation to this.

Over the years, I've been asked countless times: “What advice would you give to young programmers or engineers?” For the longest time, I struggled to answer. I wasn't sure I had anything definitive or profound to offer. And truthfully, even now, I'm not convinced I have enough answers. But as I've reflected on my journey to here, I've formulated some ideas that I believe are worth sharing — if only to provide a bit of guidance to those just starting out. For better or worse, I think those things are applicable regardless of profession.

My core belief is that fulfillment and happiness come from deliberate commitment to meaningful work, relationships, and personal growth and purpose. I don't claim that these things can be replicated, but they worked for me and some others, so maybe they can be of use for you.

Put Time In

Putting time into work and skills — and by that truly investing oneself — is always worth it.

Whether it's working on a project, solving a difficult problem, or even refining soft skills like communication, the act of showing up and putting in the hours is essential. Practice makes perfect, but more so it's all about progress rather than perfection. Each hour you spend iterating, refining, failing and retrying brings you closer to excellence. It doesn't always feel that way in the moment but when you look back at what you did before, you will see your progress. And that act of looking back, and seeing how you improved, is immensely rewarding and in turn makes you enjoy your work.

I did not start out enjoying programming, not at all. I had a friend in school who was just better than me at everything. It felt demotivating. Programming turned out to be a necessary tool that I had to build things and to share with others, and through that, I eventually ended up enjoying it.

There is a narrative that working hard is inherently bad for your health or that long hours lead to burnout. I disagree. It's not about how many hours you put in, but about the enjoyment and quality of the work you're doing. Some of my favorite memories are still of all-nighters I pulled when I was younger, working on something. It wasn't even necessarily on projects that ended up meaningful or successful, but it was the act in itself. When you find joy in what you're building in the moment, work does not feel like a burden. Instead it feels exciting and exhilarating. These memories, which some might describe as unhealthy, are some of my most pleasant ones.

Work And The Man

The key isn't avoiding hard work but finding meaning in it. Practice and effort, when coupled with a sense of purpose, not only make you better at what you do but also make the journey itself fulfilling. There is one catch however, and that is that your payout should not just be your happiness in the moment, but it should be long lasting.

The surest way to destroy your long-term satisfaction is to put effort into something where it is not reciprocated, or where the nature of the work feels meaningless. It's an obvious privilege to recommend that one shall not work for exploitative employers, but you owe it to yourself to get this right. With time you build trust in yourself, and the best way to put this trust to use is to break out of exploitative relationships.

If you end up doing things you do not believe in, it will get to you. It will not just demotivate you and make you unhappy at work, it will make every hour you spend miserable and eventually get to your health.

Other than sleeping, work is what you spend the most time on for a significant portion of your life. If that is not fulfilling, a core pillar of what can provide happiness is not supporting you. I have seen people advocate for just not caring to fix the work aspect, and instead working less and enjoying more free time. I have not found that to work for me. Work needs to be fulfilling, even if work is just a few hours a day.

Dare To Commit

Life isn't about sampling everything; it’s about making deliberate choices and committing to the ones that matter. You don't need to date twenty people to find the right partner, nor do you need a network of a hundred acquaintances to succeed. Similarly, you don't need to work at ten different companies to build a meaningful career. Those things can be hugely beneficial, don't get me wrong, but you can do more with less too. When you focus on taking one step at a time, choosing the best option available to you in that moment, you can accomplish great things. Feel free to look to others for inspiration, but do not compare what they have with what you don't. Nothing good will come from that. Everyone's journey is unique, shaped by the opportunities they encounter and the paths they decide to follow. Value grows not with the breadth of options explored but with the depth of commitment to the path you've chosen.

Just as mastering a skill pays dividends, so does committing to your personal or professional journey. Even if the world around you shifts, as with the rise of AI in software engineering, your experience and expertise aren't wasted. Your gained experience makes it much easier for you to adjust course and it will give you the necessary trust in yourself. It allows you to leverage what you've learned in new ways. While it's true that choosing from limited options might not always lead to the “best” possible outcome, the time and effort you invest in your chosen path can often outweigh the hypothetical gains of a different choice. In many cases, mastery and fulfillment come not from chasing endless possibilities but from fully embracing the one path you're on and making it your own.

Date to Marry

To me this happened through a lucky accident but it's something I strongly believe in. I'm an agnostic, I don't hold strong religious beliefs but I do believe in the purpose of and benefits of a lasting marriage. When my wife and I met I did not think I was in a position in my life where I had interest, desire or necessity in a deep relationship, let alone to marry. We did not live in the same country when we met and we had a long distance relationship for almost a year. That kind of relationship (particularly when visa issues are involved) has one incredible benefit: you really have to commit to your relationship. It's expensive and you spend a lot of time talking and sharing intimate thoughts. It also forces you to make a conscious decision about whether the two of you believe it's worth continuing. You don't have the option to just “test drive” it. It forces you to figure out all the hard things upfront. Career, values, ambitions, children, the whole thing. That's a very different experience to swiping right and seeing what comes of it.

That one year of intensive dating changed me. I started to recognize the benefits of committing to something on a much deeper level. It taught me that vulnerability and opening yourself up can be a beautiful thing. It showed me that there was a whole part to myself I did not look into. It showed me that really committing to something opens up a whole new world of opportunity and it allowed us to really invest into our relationship.

When you commit to your partner fully you get a lot in the process. Yes, there are risks and while you're dating, you need to figure these things out. You need to know on a fundamental level that the person you're dating is going to be the one you want to be with for a lifetime. That's not easy, because no human is perfect. Yet if that is the goal, you can poke at the parts where dragons can be. Only in situations of stress and challenge will you truly find out how the other person works and if that works for you.

I have heard people talk about “going to IKEA” for a date. I think that's a brilliant idea. Imagining a life together and struggling a bit through conflict and resolution is exactly the right way to go about it.

Having Children

Very few things have so profoundly changed me as our first child.

Seeing children grow up is such a moving experience. I enjoy being with them in moments of achievements or sadness alike and I love when they surprise me in the morning with their newfound wisdom or after school with their proud achievements. It's fun to play with them, to help them learn new things and you can do things together you haven't done since your own childhood.

I'm lucky to have kids. I grew up in a society that has largely painted a pretty dark picture about having children but I do not share those views. We knew we wanted children and I'm glad we didn't wait. You can't cheat nature on this thing and at the present state of scientific development, things still are much harder if you try to have children late.

Nothing will ever be perfect. There were sleepless nights, there are the sicknesses that come in autumn with daycare and school. You need to arrange things in different ways than you were used to. You will hear a lot from parents and educators about what it is like to have children, but the reality is that I don't think it's possible to know how it is to have kids until you do. In a way you have to jump into the cold water and there is no going back.

There are some important prerequisites though, but I think differently about them now than I did before. I don't think that you need a lot of money or a stable career, but you need to have your marriage and house in order. The most important thing I learned about having children is that you first and foremost need to take care of yourself. Any stress you experience, you will pass on to your children and it will harm them in the process. This is really important. There are lots of dysfunctional households and bad parents and you should not have children if you can't take care of yourself.

Those are the important parts, but then there are superficial issues. I almost entirely opted out of reading parental advice books because I could feel how they stressed me out. I found it easier to take on challenges as they arrive naturally. If you have a loving relationship with your spouse you can extend that to your children and learn how to deal with challenges calmly (or as calmly as you can). You need to be there for each other. Your children will not become more successful because you mastered breast feeding on day one or if you taught them sign language before they can talk. They will however be significantly better off if you can jump on a moment's notice to take care of your spouse or child when something goes wrong.

Our experience is unlikely to be your experience, but there are some things that are shared among parents. You grow beyond yourself when you all of a sudden become fully responsible for another human being and you can't opt out of it. It also invites you to reflect on yourself more and how you came to be the person that you are. I also don't think it makes you any less ambitious, but it changes how you define success for yourself. Your horizon opens up and it really makes you think more about the totality of your family rather than yourself.


My life isn't about perfection or constantly chasing what's next; it's about being present and committing to the things that matter. This is also what I'm passing on to my children. Whatever your journey may look like, I hope you find joy, purpose, and the courage to commit fully to it and that you found something useful in my writings.

Constraints are Good: Python's Metadata Dilemma

2024-11-26 08:00:00

There is currently an effort underway to build a new universal lockfile standard for Python, most of which is taking place on the Python discussion forum. This initiative has highlighted the difficulty of creating a standard that satisfies everyone. It has become clear that different Python packaging tools have slightly different ideas of what a lockfile is supposed to look like or even be used for.

In those discussions, however, another smaller aspect re-emerged: Python has a metadata problem. Python's metadata system is too complex and suffers from what I would call a “lack of constraints”.

JavaScript: Example of Useful Constraints

JavaScript provides an excellent example of how constraints can simplify and improve a system. In JavaScript, metadata is straightforward. Whether you develop against a package locally or if you are using a package from npm, metadata represents itself the same way. There is a single package.json file that contains the most important metadata of a package such as name, version or dependencies. This simplicity imposes significant but beneficial constraints:

  • There is a 1:1 relationship between an npm package and its metadata. Every npm package has a single package.json file that is the source of truth of metadata. Metadata is trivially accessible, even programmatically, via require('packageName/package.json').
  • Dependencies (and all other metadata) are consistent across platforms and architectures. Platform-specific binaries are handled via a filter mechanism (os and cpu) paired with optionalDependencies. [1]
  • All metadata is static, and updates require explicit changes to package.json prior to distribution or installation. Tools are provided to manipulate that metadata such as npm version patch which will edit the file in-place.
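
To make the filter mechanism from the second point a bit more concrete, here is a minimal Python sketch of how an installer could decide which optional dependencies apply to the current platform. The package names and the is_compatible and select_optional helpers are made up for illustration, and the real npm logic handles more cases than this, so treat it as an approximation of the idea rather than npm's implementation.

```python
# Hypothetical manifests, shaped like the "os" and "cpu" fields of package.json.
OPTIONAL_DEPENDENCIES = {
    "example-cli-linux-arm64": {"os": ["linux"], "cpu": ["arm64"]},
    "example-cli-linux-x64": {"os": ["linux"], "cpu": ["x64"]},
    "example-cli-darwin": {"os": ["darwin"]},  # no cpu filter: any CPU matches
}


def is_compatible(manifest: dict, os_name: str, cpu: str) -> bool:
    """A missing field means "no restriction"; otherwise the value must be listed."""
    os_ok = "os" not in manifest or os_name in manifest["os"]
    cpu_ok = "cpu" not in manifest or cpu in manifest["cpu"]
    return os_ok and cpu_ok


def select_optional(os_name: str, cpu: str) -> list[str]:
    """Return the optional dependencies an installer would keep for this platform."""
    return [
        name
        for name, manifest in OPTIONAL_DEPENDENCIES.items()
        if is_compatible(manifest, os_name, cpu)
    ]


print(select_optional("linux", "arm64"))
print(select_optional("darwin", "x64"))
```

The property that matters here is that the metadata itself stays static: everyone sees the same declared dependencies, and only the installer's selection differs per machine.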

These constraints offer several benefits:

  • Uniform behavior regardless of whether a dependency is installed locally or from a remote source. There is not even a filesystem layout difference between what comes from git or npm. This enables things like replacing an installed dependency with a local development copy, without change in functionality.
  • There is one singular source of truth for all metadata. You can edit package.json and any consumer of that metadata can just monitor that file for changes. No complex external information needs to be consulted.
  • Resolvers can rely on a single API call to fetch dependency metadata for a version, improving efficiency. Practically this also means that the resolver only needs to hit a single URL to retrieve all possible dependencies of a dependency. [2]
  • It makes auditing much easier because there are fewer moving parts and just one canonical location for metadata.

Python: The Cost of Too Few Constraints

In contrast, Python has historically placed very few constraints on metadata. For example, the old setup.py based build system essentially allowed arbitrary code execution during the build process. At one point it was at least strongly suggested that the version produced by that build step better match what is uploaded to PyPI. However, in practice, if you lie about the version that is okay too. You could upload a source distribution to PyPI that claims it's 2.0 but will in fact install 2.0+somethinghere or a completely different version entirely.

What happens is that both before a package is published to PyPI and when a package is installed locally after downloading, the metadata is generated from scratch. Not only does that mean the metadata does not have to match, it also means that it's allowed to be completely different. It's absolutely okay for a package to claim it depends on cool-dependency on your machine, but on uncool-dependency on my machine. Or to depend on different packages depending on the time of day or the phase of the moon.
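
As a rough illustration of what "generated from scratch" permits, the following Python sketch stands in for the kind of logic a setup.py could run at build time. The dynamic_requires function and the night-owl-dependency name are invented for this example; the point is only that the resulting dependency list is the output of code and not a fixed declaration.

```python
def dynamic_requires(platform_name: str, hour: int) -> list[str]:
    """Stand-in for arbitrary build-time code that computes dependencies."""
    if platform_name == "linux":
        requires = ["cool-dependency"]
    else:
        requires = ["uncool-dependency"]
    # Nothing stops the build code from consulting the clock either.
    if hour >= 18:
        requires.append("night-owl-dependency")
    return requires


on_your_machine = dynamic_requires("linux", hour=9)
on_my_machine = dynamic_requires("darwin", hour=21)
print(on_your_machine)
print(on_my_machine)
```

Both results are equally "valid" metadata for the same release, which is exactly what makes life hard for anything that wants to cache or compare them.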

Editable installs and caching are particularly problematic since metadata could become invalid almost immediately after being written. [3]

Some of this has been somewhat improved because the new pyproject.toml standard encourages static metadata. However build systems are entirely allowed to override that by falling back to what is called “dynamic metadata” and this is something that is commonly done.

In practice this system imposes a tremendous tax on everybody, and one that can be easily missed.

  • Disjointed and complex metadata access: there is no clear relationship of PyPI package name and the installed Python modules. If you know what the PyPI package name is, you can access metadata via importlib.metadata. Metadata is not read from pyproject.toml, even if it's static, instead it takes the package name and it accesses the metadata from the .dist-info folder (most specifically the METADATA file therein) installed into site-packages.

  • Mandatory metadata re-generation: As a consequence if you edit pyproject.toml to edit a piece of metadata, you need to re-install the package for that metadata to be updated in the .dist-info. People commonly forget to do that, so desynchronized metadata is very common. This is true even for static metadata today!

  • Unclear cache invalidation: Because metadata can be dynamic, it's not clear when you should automatically re-install a package. It's not enough to just track pyproject.toml for changes when dynamic metadata is used. uv for instance has a really complex, explicit cache management system so one can help uv detect outdated metadata. This obviously is non-standardized, requires uv to understand version control systems and is also not shared with other tools. For instance if you know that the version information incorporates the git hash, you can tell uv to pay attention to git commits.

  • Fragmented metadata storage: even where generated metadata is stored is complex. Different systems have slightly different behavior for storing that metadata.

    • When working locally (eg: editable installs) what happens depends on the build system:
      • If setuptools is used, metadata is written into two locations: the legacy <PACKAGE_NAME>.egg-info/PKG-INFO file and, additionally, the new location for metadata inside site-packages in a <PACKAGE_NAME>.dist-info/METADATA file.
      • If hatch and most other modern build systems are used, metadata is only written into site-packages. (into <PACKAGE_NAME>.dist-info/METADATA)
      • If no build system is configured it depends a bit on the installer. pip will even for an editable install build a wheel with setuptools, uv will only build a wheel and make the metadata available if one runs uv build. Otherwise the metadata is not available (in theory it could be found in pyproject.toml for as long as it's not dynamic).
    • For source distributions (sdist) first the build step happens as in the section before. Afterwards the metadata is thrown into a PKG-INFO file. It's currently placed in two locations in the sdist: PKG-INFO in the root and <PACKAGE_NAME>.egg-info/PKG-INFO. That metadata, however, I believe is only used for PyPI. When installing the sdist locally the metadata is regenerated from pyproject.toml (or, if setuptools is used, setup.py). That's also why metadata can change from what's in the sdist to what's there after installation.
    • For wheels the metadata is placed in <PACKAGE_NAME>.dist-info/METADATA exclusively. Wheels have static metadata, so no build step is taking place. What is in the wheel is always used.
  • Dynamic metadata makes resolvers slow: Dynamic metadata makes the job of resolvers and installers very hard and slows them down. Today for instance advanced resolvers like poetry or uv sometimes are not able to install the right packages, because they assume that dependency metadata is consistent across sdists and wheels. However there are a lot of sdists available on PyPI that publish incomplete dependency metadata (just whatever the build step for the sdist created on the developer's machine is what is cached on PyPI).

    Not getting this right can be the difference of hitting one static URL with all the metadata, and downloading a zip file, creating a virtualenv, installing build dependencies, generating an entire sdist and then reading the final generated metadata. Many orders of magnitude difference in time it takes to execute.

    This also extends to caching. If the metadata can constantly change, how would a resolver cache it? Is it required to build all possible source distributions to determine the metadata as part of resolving?

  • Cognitive complexity: The system introduces an enormous cognitive overhead which makes it very hard to understand for users, particularly when things go wrong. Incorrectly cached metadata can be almost impossible to debug for a user because they do not understand what is going on. Their pyproject.toml shows the right information, yet for some reason it behaves incorrectly. Most people don't know what "egg info" or "dist info" is. Or why an sdist has metadata in a different location than a wheel or a local checkout.

    Having support for dynamic metadata also means that developers continue to maintain elaborate and confusing systems. For instance there is a plugin for hatch that dynamically creates a readme [4], requiring even arbitrary Python code to run to display documentation. There are plugins to automatically change versions to incorporate git version hashes. As a result to figure out what version you actually have installed it's not just enough to look into a single file, you might have to rely on a tool to tell you what's going on.

Moving The Cheese

The challenge with dynamic metadata in Python is vast, but unless you are writing a resolver or packaging tool, you're not going to experience the pain as much. You might in fact quite enjoy the power of dynamic metadata. Unsurprisingly bringing up the idea to remove it is very badly received. There are so many workflows seemingly relying on it.

At this point fixing this problem might be really hard because it's a social problem more than a technical one. If the constraint had been placed there in the first place, these weird use cases would never have emerged. But because the constraints were not there, people were free to go to town with it, with all the consequences that causes.

I think at this point it's worth moving the cheese, but it's unclear if this can be done through a standard. Maybe the solution will be for tools like uv or poetry to warn if dynamic metadata is used and strongly discourage it. Then over time the users of packages that use dynamic metadata will start to urge the package authors to stop using it.

The cost of dynamic metadata is real, but it's felt only in small ways all the time. You notice it a bit when your resolver is slower than it has to be, you notice it if your packaging tool installs the wrong dependency, you notice it if you need to read the manual for the first time when you need to reconfigure your cache-key or force a package to constantly reinstall, you notice it if you need to re-install your local dependencies over and over for them not to break. There are many ways you notice it. You don't notice it as a roadblock, just as a tiny, tiny tax. Except that is a tax we all pay and it makes the user experience significantly worse compared to what it could be.

The deeper lesson here is that if you give developers too much flexibility, they will inevitably push the boundaries and that can have significant downsides as we can see. Because Python's packaging ecosystem lacked constraints from the start, imposing them now has become a daunting challenge. Meanwhile, other ecosystems, like JavaScript's, took a more structured approach early on, avoiding many of these pitfalls entirely.

[1] You can see how this works in action for sentry-cli for instance. The @sentry/cli package declares all its platform specific dependencies as optionalDependencies (relevant package.json). Each platform build has a filter in its package.json for os and cpu. For instance this is what the arm64 linux binary dependency looks like: package.json. npm will attempt to install all optional dependencies, but it will skip over the ones that are not compatible with the current platform.
[2] For @sentry/cli at version 2.39.0 for instance this means that this singular URL will return all the information that a resolver needs: registry.npmjs.org/@sentry/cli/2.39.0
[3] A common error in the past was to receive a pkg_resources.DistributionNotFound exception when trying to run a script in local development
[4] I got some flak on Bluesky for throwing readme generators under the bus. While they do not present the same problem when it comes to metadata like dependencies and versions do, they do still increase the complexity. In an ideal world what you find in site-packages represents what you have in your version control and there is a README.md file right there. That's what you have in JavaScript, Rust and plenty of other ecosystems. What we have however is a build step (either dynamic or copying) taking that readme file, and placing it in a RFC 5322 header encoded file in a dist info. So instead of "command clicking" on a dependency and finding the readme, we need special tools or arcane knowledge if we want to read the readme files locally.

Playground Wisdom: Threads Beat Async/Await

2024-11-18 08:00:00

It's been a few years since I wrote about my challenges with async/await-based systems and how they just seem to not support back pressure well. A few years later, I do not think that this problem has subsided much, but my thinking and understanding have perhaps evolved a bit. I'm now convinced that async/await is, in fact, a bad abstraction for most languages, and we should be aiming for something better instead, and that I believe to be threads.

In this post, I'm also going to rehash many arguments from very clever people that came before me. Nothing here is new, I just hope to bring it to a new group of readers. In particular, you should really consider these two highly influential pieces:

Your Child Loves Actor Frameworks

As programmers, we are so used to how things work that we make some implicit assumptions that really cloud our ability to think freely. Let me present you with a piece of code that demonstrates this:

def move_mouse():
    while mouse.x < 200:
        mouse.x += 5
        sleep(10)

def move_cat():
    while cat.x < 200:
        cat.x += 10
        sleep(10)

move_mouse()
move_cat()

Read that code and then answer this question: do the mouse and cat move at the same time, or one after another? I guarantee you that 10 out of 10 programmers will correctly state that they move one after another. It makes sense because we know Python and the concept of threads, scheduling and whatnot. But if you speak to a group of children familiar with Scratch, they are likely to conclude that mouse and cat move simultaneously.

The reason is that if you are exposed to programming via Scratch you are exposed to a primitive form of actor programming. The cat and the mouse are both actors. In fact, the UI makes this pretty damn clear, just that the actors are called “sprites”. You attach logic to a sprite on the screen and all these pieces of logic run at the same time. Mind-blowing. You can even send messages from sprite to sprite.
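
For comparison, here is a minimal sketch of how the cat and mouse example could behave the way a Scratch user expects, using plain Python threads. The Sprite class, the move helper and the trace list are assumptions added for this illustration, and the sleep is expressed in seconds because that is what Python's time.sleep takes.

```python
import threading
import time
from dataclasses import dataclass


@dataclass
class Sprite:
    x: int = 0


mouse = Sprite()
cat = Sprite()
trace = []  # records which sprite took each step, in order
trace_lock = threading.Lock()


def move(name: str, sprite: Sprite, step: int) -> None:
    """Move one sprite to x = 200, pausing briefly between steps."""
    while sprite.x < 200:
        sprite.x += step
        with trace_lock:
            trace.append(name)
        time.sleep(0.01)


# One thread per sprite: each piece of logic runs on its own.
threads = [
    threading.Thread(target=move, args=("mouse", mouse, 5)),
    threading.Thread(target=move, args=("cat", cat, 10)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(mouse.x, cat.x)
print(trace[:6])
```

The bodies of the two loops are unchanged from the sequential version. The only difference is who calls them, and the trace shows the steps of both sprites interleaved.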

The reason I want you to think about this for a moment is that I think this is rather profound. Scratch is a very, very simple system and it's intended to teach programming to young kids. Yet the model it promotes is an actor system! If you were to foray into programming via a traditional book on Python, C# or some other language, it's quite likely that you will only learn about threads at the very end. Not just that, it will likely make it sound really complex and scary. Worse, you will probably only learn about actor patterns in some advanced book that will bombard you with all the complexities of large scale applications.

There is something else though you should keep in mind: Scratch will not talk about threads, it will not talk about monads, it will not talk about async/await, it will not talk about schedulers. As far as you are concerned as a programmer, it's an imperative (though colorful and visual) language with some basic “syntax” support for message passing. Concurrency comes naturally. A child can program it. It's not something to be afraid of.

Imperative Programming Is Not Inferior

The second thing I want you to take away is that imperative languages are not inferior to functional ones.

While probably most of us are using imperative programming languages to solve problems, I think we all have been exposed to the notion that it's inferior and not particularly pure. There is this world of functional programming, with monads and other things. This world has these nice things involving composition, logic and maths and fancy looking theorems. If you program in that, you're almost transcending to a higher plane and looking down on the folks who are stitching together if statements and for loops, making side effects everywhere, and doing highly inappropriate things with IO.

Okay, maybe it's not quite as bad, but I don't think I'm completely wrong with those vibes. And look, I get it. I feel happy chaining together lambdas in Rust and JavaScript. But we should also be aware that these constructs are, in many languages, bolted on. Go, for instance, gets away without most of this, and that does not make it an inferior language!

So what you should keep in mind here is that there are different paradigms, and mentally you should try to stop thinking for a moment that functional programming has all its stuff figured out, and imperative programming does not.

Instead, I want to talk about how functional languages and imperative languages are dealing with “waiting”.

The first thing I want to go back to is the example from above. Both of the functions (for the cat and the mouse) can be seen as separate threads of execution. When the code calls sleep(10) there's clearly an expectation by the programmer that the computer will temporarily pause the execution and continue later. I don't want to bore you with monads, so as my “functional” programming language, I will use JavaScript and promises. I think that's an abstraction that most readers will be sufficiently familiar with:

function moveMouseBlocking() {
  while (mouse.x < 200) {
    mouse.x += 5;
    sleep(10);  // a blocking sleep
  }
}

function moveMouseAsync() {
  return new Promise((resolve) => {
    function iterate() {
      if (mouse.x < 200) {
        mouse.x += 5;
        sleep(10).then(iterate);  // non blocking sleep
      } else {
        resolve();
      }
    }
    iterate();
  });
}

You can immediately see a challenge here: it's very hard to translate the blocking example into a non blocking example because all of a sudden we need to find a way to express our loop (or really any control flow). We need to manually decompose it into a form of recursive function calling and we need the help of a scheduler and executor here to do the waiting.

This style obviously eventually became annoying enough to deal with that async/await was introduced to mostly restore the sanity of the old code. So it now can look more like this:

async function moveMouseAsync() {
  while (mouse.x < 200) {
    mouse.x += 5;
    await sleep(10);
  }
}

Behind the scenes though, nothing has really changed, and in particular, when you call that function, you just get an object that encompasses the “composition of the computation”. That object is a promise which will eventually hold the resulting value. In fact, in some languages like C#, the compiler will really just transpile this into chained function calls. With the promise in hand, you can await the result, or register a callback with then which gets invoked if this thing ever runs to completion.

For a programmer, I think async/await is clearly understood as some sort of neat abstraction — an abstraction over promises and callbacks. However strictly speaking, it's just worse than where we started out, because in terms of expressiveness, we have lost an important affordance: we cannot freely suspend.

In the original blocking code, when we invoked sleep we suspended for 10 milliseconds implicitly; we cannot do the same with the async call. Here we have to “await” the sleep operation. This is the crucial aspect of why we're having these “colored functions”. Only an async function can call another async function, as you cannot await in a sync function.
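
The same coloring can be observed in Python's asyncio, shown here as a small sketch with a made-up Sprite class and a shortened sleep. One caveat: Python coroutine objects are lazier than JavaScript promises and do not start running until something drives them, so this is an analogous illustration and not a one-to-one translation.

```python
import asyncio
import inspect


class Sprite:
    x = 0


async def move_mouse(mouse: Sprite) -> None:
    while mouse.x < 200:
        mouse.x += 5
        await asyncio.sleep(0.001)  # suspending requires an explicit await


mouse = Sprite()

# Calling the async function from sync code only builds an object that
# describes the computation; the loop body has not run at this point.
pending = move_mouse(mouse)
is_coroutine = inspect.iscoroutine(pending)
x_before = mouse.x

# A sync caller cannot await, so it has to hand the object to an event loop.
asyncio.run(pending)
x_after = mouse.x

print(is_coroutine, x_before, x_after)
```

The sync caller never gets to suspend in the middle of move_mouse. It can only start an event loop and block until the whole thing has finished.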

Halting Problems

The above example shows another problem that async/await causes: what if we never resolve? A normal function call eventually returns, the stack unwinds, and we're ready to receive the result. In an async world, someone has to call resolve at the very end. What if that is never called? Now in theory, that does not seem all that different from someone calling sleep() with a large number to suspend for a very long time, or waiting on a pipe that never gets data sent into it. But it is different! In one case, we keep the call stack and everything that relates to it alive; in another case, we just have a promise and are waiting for independent garbage collection with everything already unwound.

Contract wise, there is absolutely nothing that says one has to call resolve. As we know from theory the halting problem is undecidable so it's going to be actually impossible to know if someone will call resolve or not.

That sounds pedantic, but it's very important because promises/futures and async/await are making something strictly worse than not having them. Let's consider a JavaScript promise to be the most canonical example of what this looks like. A promise is created by an anonymous function, that is invoked to eventually call resolve. Take this example:

let neverSettle = new Promise((resolve) => {
  // this function ends, but we never called resolve
});

Let me clarify first that this is not a JavaScript specific problem, but it's nice to show it this way. This is a completely legal thing! It's a promise, that never resolves. That is not a bug! The anonymous function in the promise itself will return, the stack will unwind, and we are left with a “pending” promise that will eventually get garbage collected. That is a bit of a problem because since it will never resolve, you can also never await it.

Think of the following example, which demonstrates this problem a bit. In practice you might want to reduce how many things can work at once, so let's imagine a system that can handle up to 10 things that run concurrently. So we might want to use a semaphore to give out 10 tokens so up to 10 things can run at once; otherwise, it applies back pressure. So the code looks like this:

const semaphore = new Semaphore(10);

async function execute(f) {
  let token = await semaphore.acquire();
  try {
    await f();
  } finally {
    await semaphore.release(token);
  }
}

But now we have a problem. What if the function passed to the execute function returns neverSettle? Well, clearly we will never release the semaphore token. This is strictly worse compared to blocking functions! The closest equivalent would be a stupid function that calls a very long running sleep. But it's different! In one case, we keep the call stack and everything that relates to it alive; in the other case we just have a promise that will eventually get garbage collected, and we will never see it again. In the promise case, we have effectively decided that the stack is not useful.

There are ways to fix this, like making promise finalization available so we can get informed if a promise gets garbage collected etc. However I want to point out that as per contract, what this promise is doing is completely acceptable and we have just caused a new problem, one that we did not have before.

And if you think Python does not have that problem, it does too. Just await Future() and you will be waiting until the heat death of the universe (or really when you shut down your interpreter).
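To make the Python case concrete, here is a minimal sketch using asyncio. The names `execute`, `never_settle` and `main` are made up for illustration, and the timeout and the final `cancel()` exist only so the demonstration terminates; nothing in the future's contract requires either.

```python
import asyncio


async def execute(semaphore, f):
    # Same shape as the execute() function above: acquire, run, release.
    async with semaphore:
        await f()


async def never_settle():
    # A future that nobody will ever resolve. This is perfectly legal.
    await asyncio.get_running_loop().create_future()


async def quick():
    return None


async def main():
    semaphore = asyncio.Semaphore(1)
    # Start a job that never finishes; it now holds the only token.
    stuck = asyncio.create_task(execute(semaphore, never_settle))
    await asyncio.sleep(0)  # let the task run up to its first suspension

    try:
        # A second job cannot get a token; we give up after 50 ms.
        await asyncio.wait_for(execute(semaphore, quick), timeout=0.05)
        starved = False
    except asyncio.TimeoutError:
        starved = True

    stuck.cancel()  # only here so the event loop can shut down cleanly
    return starved


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Without the explicit cancellation, the token would simply stay taken for as long as the process lives.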

The promise that sits there unresolved has no call stack. But that problem also comes back in other ways, even if you use it correctly. The decomposed functions calling functions via the scheduler flow means that now you need extra affordances to stitch these async calls together into full call stacks. This all creates extra problems that did not exist before. Call stacks are really, really important. They help with debugging and are also crucial for profiling.

Blocking is an Abstraction

Okay, so we know there is at least some challenge with the promise model. What other abstractions are there? I will make the argument that a function being able to “suspend” a thread of execution is a bloody great capability and abstraction. Think of it for a moment: no matter where I am, I can say I need to wait for something and continue later where I left off. This is particularly crucial to apply back-pressure if you decide to need it later. The biggest footgun in Python asyncio remains that write is non blocking. That function will stay problematic forever and you need to follow up with await s.drain() to avoid buffer bloat.
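A small sketch of the write/drain pairing with asyncio streams. The helper `send_with_backpressure` and the throwaway loopback server are assumptions made for the example; only `StreamWriter.write` and `StreamWriter.drain` are the calls under discussion.

```python
import asyncio


async def send_with_backpressure(writer, chunks):
    for chunk in chunks:
        writer.write(chunk)   # never suspends: only appends to a buffer
        await writer.drain()  # suspends while the buffer is too full


async def main():
    received = 0
    done = asyncio.Event()

    async def handle(reader, writer):
        # Count the incoming bytes until the client closes the connection.
        nonlocal received
        while data := await reader.read(65536):
            received += len(data)
        writer.close()
        done.set()

    # Port 0 asks the OS for any free port on the loopback interface.
    server = await asyncio.start_server(handle, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]

    _, writer = await asyncio.open_connection("127.0.0.1", port)
    await send_with_backpressure(writer, [b"x" * 1024] * 100)
    writer.close()
    await writer.wait_closed()

    await asyncio.wait_for(done.wait(), timeout=5)
    server.close()
    return received


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Dropping the `await writer.drain()` line still produces working code on a fast connection, which is exactly why the missing back pressure tends to go unnoticed.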

In particular it's an important abstraction because in the real world we are constantly faced with things in fact not being async all the time, and some of the things we think might not block, will in fact block. Just like Python did not think that write should be able to block when it was designed. I want to give you a colorful example of this. Why is the following code blocking, and which part of it blocks?

def decode_object(idx):
    header = indexes[idx]
    object_buf = buffer[header.start:header.start + header.size]
    return brotli.decompress(object_buf)

It's a bit of a trick question, but not really. The reason it's blocking is because memory access can be blocking! You might not think of it this way, but there are many reasons why just touching a memory region can take time. The most obvious one is memory-mapped files. If you're touching a page that hasn't been loaded yet, the operating system will have to shovel it into memory before returning back to you. There is no “await touching this memory” expression, because if there were, we would have to await everywhere. That might sound petty but blocking memory reads were at the source of a series of incidents at Sentry [1].
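The memory-mapped case can be sketched with Python's standard `mmap` module. The function names and the temporary file are made up for the demonstration; the point is only that the slice looks like an ordinary memory read and offers no place to put an await.

```python
import mmap
import os
import tempfile


def read_slice(path, start, size):
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as buf:
            # Looks like a plain in-memory slice, but if the page is not
            # resident yet the OS has to load it first (a page fault).
            return buf[start:start + size]


def demo():
    # Write a small file: one 4096 byte block of padding, then the payload.
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(b"a" * 4096 + b"hello")
        path = f.name
    try:
        return read_slice(path, 4096, 5)
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(demo())
```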

The trade-off that async/await makes today rests on the idea that not everything needs to block or suspend. The reality, however, has shown me that many more things really want to suspend, and if a random memory access is a case for suspending, then is the abstraction worth anything?

So maybe allowing any function call to block and suspend really was the right abstraction to begin with.

But then we need to talk about spawning threads next, because a single thread is not worth much. The one affordance that an async/await system gives you that you don't have otherwise is actually telling two things to run concurrently. You get that by starting the async operation and deferring the awaiting to later. This is where I will have to concede that async/await has something going for it. It moves the reality of concurrent execution right into the language. The reason concurrency comes so naturally to a Scratch programmer is that it's right there, so async/await serves a very similar purpose here.
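In Python's asyncio, "start now, await later" can be sketched like this. `fetch` is a stand-in for any async operation, and the delays are arbitrary.

```python
import asyncio


async def fetch(name, delay):
    # Stand-in for some I/O bound operation.
    await asyncio.sleep(delay)
    return name


async def main():
    loop = asyncio.get_running_loop()
    started = loop.time()
    # Start both operations now ...
    first = asyncio.create_task(fetch("a", 0.2))
    second = asyncio.create_task(fetch("b", 0.2))
    # ... and defer the awaiting to later.
    results = [await first, await second]
    elapsed = loop.time() - started
    return results, elapsed


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Because both tasks sleep at the same time, the total is roughly one delay rather than the sum of the two.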

In a traditional imperative language based on threads, the act of spawning a thread is usually hidden behind an (often convoluted) standard library function. More annoyingly, threads very much feel bolted on and completely inadequate for even the most basic of operations. Because not only do we want to spawn threads, we want to join on them, we want to send values across thread boundaries (including errors!). We want to wait for either a task to be done, or a keyboard input, messages being passed etc.

Classic Threading

So let's focus on threads for a second. As said before, what we are looking for is the ability for any function to yield / suspend. That's what threads allow us to do!

When I am talking about “threads” here, I'm not necessarily referring to a specific kind of implementation of threads. Think of the example of promises from above for a moment: we had the concept of “sleeping”, but we did not really say how that is implemented. There is clearly some underlying scheduler that can enable that, but how that takes place is outside the scope of the language. Threads can be like that. They could be real OS threads, they could be virtual and be implemented with fibers or coroutines. At the end of the day, we don't necessarily have to care about it as developers if the language gets it right.

The reason this matters is that when I talk about “suspending” or “continuing somewhere else,” immediately the thought of coroutines and fibers comes to mind. That's because many languages that support them give you those capabilities. But it's good to step back for a second and just think about general affordances that we want, and not how they are implemented.

We need a way to say: run this concurrently, but don't wait for it to return, we want to wait later (or never!). Basically, the equivalent in some languages of calling an async function without awaiting it. In other words: to schedule a function call. And that is, in essence, just what spawning a thread is. If we think about Scratch: one of the reasons concurrency comes naturally there is because it's really well integrated, and a core affordance of the language. There is a real programming language that works very much the same: Go with its goroutines. There is syntax for it!
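Python has no syntax for this, so a sketch has to go through the standard library. The helper name `spawn` is made up; it is nothing more than `threading.Thread` with the start call folded in.

```python
import threading
import time


def spawn(fn, *args):
    """Schedule fn(*args) on a new thread and return a handle to wait on."""
    thread = threading.Thread(target=fn, args=args)
    thread.start()
    return thread


def demo():
    log = []

    def worker(name):
        time.sleep(0.05)  # any function may block; no special syntax needed
        log.append(name)

    handle = spawn(worker, "background")
    log.append("main continues")  # runs while the worker is still sleeping
    handle.join()                 # wait later (or never)
    return log


if __name__ == "__main__":
    print(demo())
```

Note what is missing: the handle offers no way to get at a return value or an error, which is part of why threads feel bolted on.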

So now we can spawn, and that thing runs. But now we have more problems to solve: synchronization, waiting, message passing and all that jazz are not solved. Even Scratch has answers to that! So clearly there is something else missing to make this work. And what even does that spawn call return?

A Detour: What is Async Even

There is an irony in async/await and that irony is that it exists in multiple languages, it looks completely the same on the surface, but works completely different under the hood. Not only that, the origin stories of async/await in different languages are not even the same.

I mentioned earlier that code that can arbitrarily block is an abstraction of sorts. That abstraction for many applications really only makes sense if the CPU time while you're blocking can be used in other useful ways. On the one hand, because the computer would be pretty bored if it was only doing things in sequence, on the other hand, because we might need things to run in parallel. At times as programmers we need two things to make progress simultaneously before we can continue. Enter creating more threads. But if threads are so great, why all that talk about coroutines and promises that underpins so much of async/await in different languages?

I think this is the point where the story actually becomes confusing quickly. For instance JavaScript has entirely different challenges than Python, C# or Rust. Yet somehow all those languages ended up with a form of async/await.

Let's start with JavaScript. JavaScript is a single threaded language where a function scope cannot yield. There is no affordance in the language to do that and threads do not exist. So before async/await, the best you could do is different forms of callback hell. The first iteration of improving that experience was adding promises. async/await only became sugar for that afterward. The reason that JavaScript did not have much choice here is that promises were the only thing that could be accomplished without language changes, and async/await is something that can be implemented as a transpilation step. So really, there are no threads in JavaScript. But here is an interesting thing that happens: JavaScript on the language level has the concept of concurrency. If you call setTimeout, you tell the runtime to schedule a function to be called later. This is crucial! In particular it also means that a promise, once created, will be scheduled automatically. Even if you forget about it, it will run!

Python on the other hand had a completely different origin story. In the days before async/await, Python already had threads: real, operating system level threads. What it did not have, however, was the ability for multiple of those threads to run in parallel. The reason for this is obviously the GIL (Global Interpreter Lock). However that “just” makes things not scale to more than one core, so let's ignore that for a second. Because it had threads, it also rather early had people experiment with implementing virtual threads in Python. Back in the day (and to some extent today) the cost of an OS level thread was pretty high, so virtual threads were seen as a fast way to spawn more of these concurrent things. There were two ways in which Python got virtual threads. One was the Stackless Python project, which was an alternative implementation of Python (many patches for CPython rather) that implemented what's called a “stackless VM” (basically a VM that does not maintain a C stack). In short, what that enabled is implementing something that Stackless called “tasklets” which were functions that could be suspended and resumed. Stackless did not have a bright future because the stackless nature meant that you could not have interleaving Python -> C -> Python calls and suspend with them on the stack.

There was a second attempt in Python called “greenlet”. The way greenlet worked was implementing coroutines in a custom extension module. It is pretty gnarly in its implementation, but it does allow for cooperative multi tasking. However, like stackless, that did not win out. Instead, what actually happened is that the generator system that Python had for years was gradually upgraded into a coroutine system with syntax support, and the async system was built on top of that.

One of the consequences of this is that it requires syntax support to suspend from a coroutine. This meant that you cannot implement a function like sleep that, when called, yields to a scheduler. You need to await it (or in earlier times you could use yield from). So we ended up with async/await because of how coroutines work in Python under the hood. The motivation for this was that it was seen as a positive thing that you know when something suspends.

One interesting consequence of the Python coroutine model is that, at least at the coroutine level, it can transcend OS level threads. I could make a coroutine on one thread, ship it off to another, and continue it there. In practice, that does not work because once hooked up with the IO system, it cannot travel to another event loop on another thread any more. But you can already see that fundamentally it does something quite different to JavaScript. It can travel between threads at least in theory; there are threads; there is syntax to yield. A coroutine in Python will also start out not running, unlike in JavaScript where it's effectively always scheduled. This is also in part because the scheduler in Python can be swapped out, and there are competing and incompatible implementations.
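The "starts out not running" behavior is easy to observe. In this sketch (the names `work` and `main` are illustrative), calling the async function only creates a coroutine object; the body runs once something awaits or schedules it.

```python
import asyncio


async def main():
    ran = []

    async def work():
        ran.append("work")

    coro = work()            # creates a coroutine object; the body has not run
    before = list(ran)       # still empty at this point
    await coro               # only now does the body execute
    return before, list(ran)


if __name__ == "__main__":
    print(asyncio.run(main()))
```

A JavaScript promise in the same position would already have been handed to the runtime at creation time.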

Lastly let's talk about C#. Here the origin story is once again entirely different. C# has real threads. Not only does it have real threads, it also has per-object locks and absolutely no problems with dealing with multiple threads running in parallel. But that does not mean that it does not have other issues. The reality is that threads alone are just not enough. You need to synchronize and talk between threads quite often and sometimes you just need to wait. For instance you need to wait for user input. You still want to do something, while you're stuck there processing that input. So over time .NET introduced “tasks” which are an abstraction over async operations. They are part of the .NET threading system and the way you interact with them is that you write your code in there, you can suspend from tasks with syntax. .NET will run the task on the current thread, and if you do some blocking you stay blocked. This is in that sense, quite different from JavaScript where while no new “thread” is created, you pend the execution in the scheduler. The reason it works this way in .NET is that some of the motivation of this system was to allow UI triggered code to access the main UI thread without blocking it. But the consequence again is, that if you block for real, you just screwed something up. That however is also why at least at one point what C# did was just to splice functions into chained closures whenever it hit an await. It just decomposes one logical piece of code into many separate functions.

I really don't want to go into Rust, but Rust's async system is probably the weirdest of them all because it's polling-based. In short: unless you actively “wait” for a task to complete, it will not make progress. So the purpose of a scheduler there is to make sure that a task actually can make progress. Why did Rust end up with async/await? Primarily because they wanted something that works without a runtime and a scheduler, and within the limitations of the borrow checker and memory model.

Of all those languages, I think the argument for async/await is the strongest for Rust and JavaScript. Rust because it's a systems language and they wanted a design that works with a limited runtime. JavaScript to me also makes sense because the language does not have real threads, so the only alternative to async/await is callbacks. But for C# the argument seems much weaker. Even the problem of having to force code to run on the UI thread could be solved by having a scheduling policy for virtual threads. The worst offender here in my mind is Python. async/await has ended up with a really complex system where the language now has coroutines and real threads, different synchronization primitives for each and async tasks that end up being pinned to one OS thread. The language even has different futures in the standard library for threads and async tasks!

The reason I wanted you to understand all this is that all these different languages share the same syntax, yet what you can do with it is completely different. What they all have in common is that async functions can only be called by async functions (or the scheduler).

What Async Isn't

Over the years I have heard a lot of arguments about why for instance Python ended up with async/await, and some of the arguments presented don't hold up to scrutiny from my perspective. One argument that I have heard repeatedly is that if you control when you suspend, you don't need to deal with locking or synchronization. While there is some truth to that (you don't randomly suspend), you still end up having to lock. There is still concurrency, so you still need to protect all your stuff. In Python this is particularly frustrating because not only do you have colored functions, you also have colored locks. There are locks for threads and there are locks for async code, and they are different.
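A minimal sketch of why explicit suspension points do not remove the need for locks. The counter and function names are made up; `asyncio.sleep(0)` stands in for any await that happens between a read and a write.

```python
import asyncio


async def unsafe_increment(state):
    value = state["count"]
    await asyncio.sleep(0)      # suspension point between read and write
    state["count"] = value + 1  # overwrites updates made in the meantime


async def safe_increment(state, lock):
    async with lock:            # must be asyncio.Lock, not threading.Lock
        value = state["count"]
        await asyncio.sleep(0)
        state["count"] = value + 1


async def main():
    unsafe = {"count": 0}
    await asyncio.gather(*(unsafe_increment(unsafe) for _ in range(10)))

    safe = {"count": 0}
    lock = asyncio.Lock()
    await asyncio.gather(*(safe_increment(safe, lock) for _ in range(10)))
    return unsafe["count"], safe["count"]


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Ten unlocked increments end up as a count of 1, because every task read the value before any of them wrote it back.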

There is a very good reason why I showed the example above of the semaphore: semaphores are real in async programming. They are very often needed to protect a system from taking on too much work. In fact, one of the core challenges that many async/await-based programs suffer from is bloating buffers because there is an inability to exert back pressure (I once again point you to my post on that). Why can they not? Because unless an API is async, it is forced to buffer or fail. What it cannot do, is block.
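The "buffer or fail" choice can be sketched with a bounded `asyncio.Queue`. The two submit helpers are hypothetical names for a sync and an async flavor of the same API.

```python
import asyncio


def sync_submit(queue, item):
    # A non-async API cannot suspend: when the queue is full it can only
    # fail (as here) or fall back to buffering without a limit.
    queue.put_nowait(item)


async def async_submit(queue, item):
    # An async API can exert back pressure: it suspends until there is room.
    await queue.put(item)


async def main():
    queue = asyncio.Queue(maxsize=1)
    sync_submit(queue, "first")
    try:
        sync_submit(queue, "second")
        failed = False
    except asyncio.QueueFull:
        failed = True

    async def consumer():
        await asyncio.sleep(0.01)
        return queue.get_nowait()

    consumer_task = asyncio.create_task(consumer())
    await async_submit(queue, "second")  # waits until the consumer made room
    taken = await consumer_task
    return failed, taken, queue.get_nowait()


if __name__ == "__main__":
    print(asyncio.run(main()))
```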

Async also does not magically solve the issues with the GIL in Python. It does not magically make real threads appear in JavaScript, and it does not solve issues when random code starts blocking (and remember, even memory access can block) or when you very slowly calculate a large Fibonacci number.

Threads are the Answer, Not Coroutines

I already alluded to this above a few times, but when we think about being able to “suspend” from an arbitrary point in time, we often immediately think of coroutines as programmers. For good reasons: coroutines are amazing, they are fun, and every programming language should have them!

Coroutines are an important building block, and if any future language designer is looking at this post: you should put them in.

But coroutines should be very lightweight, and they can be abused in ways that make it very hard to follow what's going on. Lua, for instance, gives you coroutines, but it does not give you the necessary structure to do something with them easily. You will end up building your own scheduler, your own threading system, etc.

So what we really want is where we started out with: threads! Good old threads!

The irony in all of this is that the language that I think actually got this right is modern Java. Project Loom in Java has coroutines and all the bells and whistles under the hood, but what it exposes to the developer is good old threads. There are virtual threads, which are mounted on carrier OS threads, and these virtual threads can travel from thread to thread. If you end up issuing a blocking call on a virtual thread, it yields to the scheduler.

Now I happen to think that threads alone are not good enough! Threads require synchronization, they require communication primitives etc. Scratch has message passing! So there is more that needs to be built to make them work well.

I want to follow up in another blog post about what is needed to make threads easier to work with. Because what async/await clearly innovated is bringing some of these core capabilities closer to the user of the language, and often modern async/await code looks easier to read than traditional code using threads.

Structured Concurrency and Channels

Lastly I do want to say something nice about async/await and celebrate the innovations that it has brought up. I believe that this language feature singlehandedly drove some crucial innovation about concurrent programming by making it widely accessible. In particular it moved many developers from a basic “single thread per request” model to breaking down tasks into smaller chunks, even in languages like Python. For me, the biggest innovation here goes to Trio, which introduced the concept of structured concurrency via its nursery. That concept has eventually found a home even in asyncio with the concept of the TaskGroup API and is finding its way into Java.

I recommend reading Nathaniel J. Smith's Notes on structured concurrency, or: Go statement considered harmful for a much better introduction. However if you are unfamiliar with it, here is my attempt at explaining it:

  • There is a clear start and end of work: every thread or task has a clear beginning and end, which makes it easier to follow what each thread is doing. All threads spawned in the context of a thread, are known to that thread. Think of it like creating a small team to work on a task: they start together, finish together, and then report back.
  • Threads don't outlive their parent: if for whatever reason the parent is done before the children threads, it automatically awaits before returning.
  • Errors propagate and cause cancellations: If something goes wrong in one thread, the error is passed back to the parent. But more importantly, it also automatically causes other child threads to cancel. Cancellations are a core of the system!

I believe that structured concurrency needs to become a thing in a threaded world. Threads must know their parents and children. Threads also need to find convenient ways to pass their success values back. Lastly, context should flow from thread to thread implicitly through context locals.

The second part is that async/await made it much more apparent that tasks / threads need to talk with each other. In particular the concept of channels and selecting on channels became more prevalent. This is an essential building block which I think can be further improved upon. As food for thought: if you have structured concurrency, in principle each thread's return value really can be represented as a buffered channel attached to the thread, holding up to a single value (successful return value or error) that you can select on.
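As a rough illustration of that last idea, here is a thread whose outcome is delivered through a one-slot queue. `spawn` and `join` are hypothetical helpers, and Python's `queue.Queue` has no select operation, so only the "single buffered value" half of the idea is shown.

```python
import queue
import threading


def spawn(fn, *args):
    """Run fn(*args) on a thread; return a one-slot channel with its outcome."""
    channel = queue.Queue(maxsize=1)

    def run():
        # Exactly one value is ever sent: the return value or the error.
        try:
            channel.put(("ok", fn(*args)))
        except Exception as exc:
            channel.put(("error", exc))

    threading.Thread(target=run, daemon=True).start()
    return channel


def join(channel):
    """Block until the thread's outcome is available, then unwrap it."""
    status, value = channel.get()
    if status == "error":
        raise value
    return value


if __name__ == "__main__":
    print(join(spawn(lambda a, b: a + b, 1, 2)))
```

Errors travel across the thread boundary the same way as success values and are raised again in whoever joins.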

Today, although no language has perfected this model, thanks to many years of experimentation, the solution seems clearer than ever, with structured concurrency at its core.

Conclusion

I hope I was able to demonstrate to you that async/await has been a mixed bag. It brought some relief from callback hell, but it also saddled us with new issues like colored functions and new back-pressure challenges, and introduced entirely new problems such as promises that can just sit around forever without resolving. It has also taken away a lot of the utility that call stacks brought, in particular for debugging and profiling. These aren't minor hiccups; they're real obstacles that get in the way of the straightforward, intuitive concurrency we should be aiming for.

If we take a step back, it seems pretty clear to me that we have veered off course by adopting async/await in languages that have real threads. Innovations like Java's Project Loom feel like the right fit here. Virtual threads can yield when they need to, switch contexts when blocked, and even work with message-passing systems that make concurrency feel natural. If we free ourselves from the idea that the functional, promise system has figured out all the problems we can look at threads properly again.

However at the same time async/await has moved concurrent programming to the forefront and has resulted in real innovation. Making concurrency a core feature of the language (via syntax even!) is a good thing. Maybe the increased adoption and people struggling with it, was what made structured concurrency a real thing in the Python async/await world.

Future language design should rethink concurrency once more: Instead of adopting async/await, new languages should model themselves more like Java's Project Loom but with more user friendly primitives. But like Scratch, it should give programmers really good APIs that make concurrency natural. I don't think actor frameworks are the right fit, but a combination of structured concurrency, channels, syntax support for spawning/joining/selecting will go a long way. Watch this space for a future blog post about some things I found to work better than others.

[1] Sentry works with large debug information files such as PDB or DWARF. These files can be gigabytes in size and we memory map terabytes of preprocessed files into memory during processing. That memory mapped files can block is hardly a surprise, but what we learned in the process is that thanks to containerization and memory limits, you can easily navigate yourself into a situation where you spend much more time on page faults than you expected and the system crawls to a halt.