RSS preview of Geoffrey Litt

Rss preview of Blog of Geoffrey Litt

Code like a surgeon

2025-10-24 22:59:00

A lot of people say AI will make us all “managers” or “editors”…but I think this is a dangerously incomplete view!

Personally, I’m trying to code like a surgeon.

A surgeon isn’t a manager, they do the actual work! But their skills and time are highly leveraged with a support team that handles prep, secondary tasks, admin. The surgeon focuses on the important stuff they are uniquely good at.

My current goal with AI coding tools is to spend 100% of my time doing stuff that matters. (As a UI prototyper, that mostly means tinkering with design concepts.)

It turns out there are a LOT of secondary tasks which AI agents are now good enough to help out with. Some things I’m finding useful to hand off these days:

Before attempting a big task, write a guide to relevant areas of the codebase
Spike out an attempt at a big change. Often I won’t use the result but I’ll review it as a sketch of where to go
Fix typescript errors or bugs which have a clear specification
Write documentation about what I’m building

I often find it useful to run these secondary tasks async in the background – while I’m eating lunch, or even literally overnight!

When I sit down for a work session, I want to feel like a surgeon walking into a prepped operating room. Everything is ready for me to do what I’m good at.

Mind the autonomy slider

Notably, there is a huge difference between how I use AI for primary vs secondary tasks.

For the core design prototyping work, I still do a lot of coding by hand, and when I do use AI, I’m more careful and in the details. I need fast feedback loops and good visibility. (eg, I like Cursor tab-complete here)

Whereas for secondary tasks, I’m much much looser with it, happy to let an agent churn for hours in the background. The ability to get the job done eventually is the most important thing; speed and visibility matter less. Claude Code has been my go-to for long unsupervised sessions but Codex CLI is becoming a strong contender there too, possibly my new favorite.

These are very different work patterns! Reminds me of Andrej Karpathy’s “autonomy slider” concept. It’s dangerous to conflate different parts of the autonomy spectrum – the tools and mindset that are needed vary quite a lot.

Your agent doesn’t need a career trajectory

The “software surgeon” concept is a very old idea – Fred Brooks attributes it to Harlan Mills in his 1975 classic “The Mythical Man-Month”. He talks about a “chief programmer” who is supported by various staff including a “copilot” and various administrators. Of course, at the time, the idea was to have humans be in these support roles.

OK, so there is a super obvious angle here, that “AI has now made this approach economically viable where it wasn’t before”, yes yes… but I am also noticing a more subtle thing at play, something to do with status hierarchies.

A lot of the “secondary” tasks are “grunt work”, not the most intellectually fulfilling or creative part of the work. I have a strong preference for teams where everyone shares the grunt work; I hate the idea of giving all the grunt work to some lower-status members of the team. Yes, junior members will often have more grunt work, but they should also be given many interesting tasks to help them grow.

With AI this concern completely disappears! Now I can happily delegate pure grunt work. And the 24/7 availability is a big deal. I would never call a human intern at 11pm and tell them to have a research report on some code ready by 7am… but here I am, commanding my agent to do just that!

Notion is for surgeons?

Finally I’ll mention a couple thoughts on how this approach to work intersects with my employer, Notion.

First, as an employee, I find it incredibly valuable right now to work at a place that is bullish on AI coding tools. Having support for heavy use of AI coding tools, and a codebase that’s well setup for it, is enabling serious productivity gains for me – especially as a newcomer to a big codebase.

Secondly, as a product – in a sense I would say we are trying to bring this way of working to a broader group of knowledge workers beyond programmers. When I think about how that will play out, I like the mental model of enabling everyone to “work like a surgeon”.

The goal isn’t to delegate your core work, it’s to identify and delegate the secondary grunt work tasks, so you can focus on the main thing that matters.

Weiser’s rant

Weiser was speaking at an MIT Media Lab event on “interface agents”. They were grappling with many of the same issues we’re discussing in 2025: how to make a personal assistant that automates tasks for you and knows your full context. They even had a human “butler” on stage representing an AI agent.

Everyone was super excited about this… except Weiser. He was opposed to the whole idea of agents! He gave this example: how should a computer help you fly a plane and avoid collisions?

The agentic option is a “copilot” — a virtual human who you talk with to get help flying the plane. If you’re about to run into another plane it might yell at you “collision, go right and down!”

Weiser offered a different option: design the cockpit so that the human pilot is naturally aware of their surroundings. In his words: “You’ll no more run into another airplane than you would try to walk through a wall.”

Weiser’s goal was an “invisible computer"—not an assistant that grabs your attention, but a computer that fades into the background and becomes "an extension of [your] body”.

Weiser’s 1992 slide on airplane interfaces

HUDs

There’s a tool in modern planes that I think nicely illustrates Weiser’s philosophy: the Head-Up Display (HUD), which overlays flight info like the horizon and altitude on a transparent display directly in the pilot’s field of view.

A HUD feels completely different from a copilot! You don’t talk to it. It’s literally part invisible—you just become naturally aware of more things, as if you had magic eyes.

Designing HUDs

OK enough analogies. What might a HUD feel like in modern software design?

One familiar example is spellcheck. Think about it: spellcheck isn’t designed as a “virtual collaborator” talking to you about your spelling. It just instantly adds red squigglies when you misspell something! You now have a new sense you didn’t have before. It’s a HUD.

(This example comes from Jeffrey Heer’s excellent Agency plus Automation paper. We may not consider spellcheck an AI feature today, but it’s still a fuzzy algorithm under the hood.)

Spellcheck makes you aware of misspelled words without an “assistant” interface.

Here’s another personal example from AI coding. Let’s say you want to fix a bug. The obvious “copilot” way is to open an agent chat and ask it to do the fix.

But there’s another approach I’ve found more powerful at times: use AI to build a custom debugger UI which visualizes the behavior of my program! In one example, I built a hacker-themed debug view of a Prolog interpreter.

With the debugger, I have a HUD! I have new senses, I can see how my program runs. The HUD extends beyond the narrow task of fixing the bug. I can ambiently build up my own understanding, spotting new problems and opportunities.

Both the spellchecker and custom debuggers show that automation / “virtual assistant” isn’t the only possible UI. We can instead use tech to build better HUDs that enhance our human senses.

Tradeoffs

I don’t believe HUDs are universally better than copilots! But I do believe anyone serious about designing for AI should consider non-copilot form factors that more directly extend the human mind.

So when should we use one or the other? I think it’s quite tricky to answer that, but we can try to use the airplane analogy for some intuition:

When pilots just want the plane to fly straight and level, they fully delegate that task to an autopilot, which is close to a “virtual copilot”. But if the plane just hit a flock of birds and needs to land in the Hudson, the pilot is going to take manual control, and we better hope they have great instruments that help them understand the situation.

In other words: routine predictable work might make sense to delegate to a virtual copilot / assistant. But when you’re shooting for extraordinary outcomes, perhaps the best bet is to equip human experts with new superpowers.

Meet Stevens

The assistant is called Stevens, named after the butler in the great Ishiguro novel Remains of the Day. Every morning it sends a brief to me and my wife via Telegram, including our calendar schedules for the day, a preview of the weather forecast, any postal mail or packages we’re expected to receive, and any reminders we’ve asked it to keep track of. All written up nice and formally, just like you’d expect from a proper butler.

Here’s an example. (I’ll use fake data throughout this post, beacuse our actual updates contain private information.)

Beyond the daily brief, we can communicate with Stevens on-demand—we can forward an email with some important info, or just leave a reminder or ask a question via Telegram chat.

That’s Stevens. It’s rudimentary, but already more useful to me than Siri!

Behind the scenes

Let’s break down the simple architecture behind Stevens. The whole thing is hosted on Val.town, a lovely platform that offers SQLite storage, HTTP request handling, scheduled cron jobs, and inbound/outbound email: a perfect set of capabilities for this project.

First, how does Stevens know what goes in the morning brief? The key is the butler’s notebook, a log of everything that Stevens knows. There’s an admin view where we can see the notebook contents—let’s peek and see what’s in there:

You can see some of the entries that fed into the morning brief above—for example, the parent-teacher conference has a log entry.

In addition to some text, entries can have a date when they are expected to be relevant. There are also entries with no date that serve as general background info, and are always included. You can see these particular background memories came from a Telegram chat, because Stevens does an intake interview via Telegram when you first get started:

With this notebook in hand, sending the morning brief is easy: just run a cron job which makes a call to the Claude API to write the update, and then sends the text to a Telegram thread. As context for the model, we include any log entries dated for the coming week, as well as the undated background entries.

Under the hood, the “notebook” is just a single SQLite table with a few columns. Here’s a more boring view of things:

But wait: how did the various log entries get there in the first place? In the admin view, we can watch Stevens buzzing around entering things into the log from various sources:

This is just some data importers populating the table:

An hourly data pull from the Google Calendar API
An hourly check of the local weather forecast using a weather API
I forward USPS Informed Delivery containing scans of our postal mail, and Stevens OCRs them using Claude
Inbound Telegram and email messages can also result in log entries
Every week, some “fun facts” get added into the log, as a way of adding some color to future daily updates.

This system is easily extensible with new importers. An importer is just any process that adds/edits memories in the log. The memory contents can be any arbitrary text, since they’ll just be fed back into an LLM later anyways.

Reflections

A few quick reflections on this project:

It’s very useful for personal AI tools to have access to broader context from other information sources. Awareness of things like my calendar and the weather forecast turns a dumb chatbot into a useful assistant. ChatGPT recently added memory of past conversations, but there’s lots of information not stored within that silo. I’ve written before about how the endgame for AI-driven personal software isn’t more app silos, it’s small tools operating on a shared pool of context about our lives.

“Memory” can start simple. In this case, the use cases of the assistant are limited, and its information is inherently time-bounded, so it’s fairly easy to query for the relevant context to give to the LLM. It also helps that some modern models have long context windows. As the available information grows in size, RAG and fancier approaches to memory may be needed, but you can start simple.

Vibe coding enables sillier projects. Initially, Stevens spoke with a dry tone, like you might expect from a generic Apple or Google product. But it turned out it was just more fun to have the assistant speak like a formal butler. This was trivial to do, just a couple lines in a prompt. Similarly, I decided to make the admin dashboard views feel like a video game, because why not? I generated the image assets in ChatGPT, and vibe coded the whole UI in Cursor + Claude 3.7 Sonnet; it took a tiny bit of extra effort in exchange for a lot more fun.

Try it yourself

Stevens isn’t a product you can run out of the box, it’s just a personal project I made for myself.

But if you’re curious, you can check out the code and fork the project here. You should be able to apply this basic pattern—a single memories table and an extensible constellation of cron jobs—to do lots of other useful things.

I recommend editing the code using your AI editor of choice with the Val Town CLI to sync to local filesystem.

Avoid the nightmare bicycle

2025-03-04 06:13:00

In my opinion, one of the most important ideas in product design is to avoid the “nightmare bicycle”.

Imagine a bicycle where the product manager said: “people don’t get math so we can’t have numbered gears. We need labeled buttons for gravel mode, downhill mode, …”

This is the hypothetical “nightmare bicycle” that Andrea diSessa imagines in his book Changing Minds.

As he points out: it would be terrible! We’d lose the intuitive understanding of how to use the gears to solve any situation we encounter. Which mode do you use for gravel + downhill?

It turns out, anyone can understand numbered gears totally fine after a bit of practice. People are capable!

Along the same lines: one of the worst misconceptions in product design is that a microwave needs to have a button for every thing you could possibly cook: “popcorn”, “chicken”, “potato”, “frozen vegetable”, bla bla bla.

You really don’t! You can just have a time (and power) button. People will figure out how to cook stuff.

Good designs expose systematic structure; they lean on their users’ ability to understand this structure and apply it to new situations. We were born for this.

Bad designs paper over the structure with superficial labels that hide the underlying system, inhibiting their users’ ability to actually build a clear model in their heads.

p.s. Changing Minds is one of the best books ever written about design and computational thinking, you should go read it.

Geoffrey Litt Modify

Rss preview of Blog of Geoffrey Litt

Mind the autonomy slider

Your agent doesn’t need a career trajectory

Notion is for surgeons?

Related reads

Weiser’s rant

HUDs

Designing HUDs

Tradeoffs

Further reading

Meet Stevens

Behind the scenes

Reflections

Try it yourself

Geoffrey LittModify

Rss preview of Blog of Geoffrey Litt

Mind the autonomy slider

Your agent doesn’t need a career trajectory

Notion is for surgeons?

Related reads

Weiser’s rant

HUDs

Designing HUDs

Tradeoffs

Further reading

Meet Stevens

Behind the scenes

Reflections

Try it yourself

The author's social media

Geoffrey Litt Modify