About thesephist

By Linus. About software research, creative work, and community. He has built over 100 side projects.

The RSS feed's URL is: https://thesephist.com/index.xml


Preview of the thesephist RSS feed

Joining Thrive Capital

2024-09-11 21:35:25

I’ve joined Thrive Capital as an EIR and advisor, working with the Thrive team to support our founders in understanding and deploying AI thoughtfully, while furthering my own research and explorations around AI interpretability, knowledge representations, and interface design.

August was my last month as a part of Notion’s AI engineering team.

It’s been a privilege at Notion to get to work with a world-class team at the frontier of applied LLM products. Notion was one of the first companies to preview an LLM product, even before ChatGPT. I’m grateful to have been a part of the many ways the team has grown in the last almost-two years, across product launches, generational leaps in models, and many orders of magnitude of scale. We learned alongside the rest of the industry about prompt programming, retrieval, agents, evals, and how people are using AI day-to-day in their schools, companies, meetings, and life.

Notion’s “AI team” is really a team-of-teams spanning product, design, engineering, partnerships, finance, marketing, sales, and growth across the company. With all the talented Notinos who have joined since my first days, I have no doubt Notion’s AI team is poised to build and share a lot of beautiful, useful products in the months ahead.

As Notion brings all the ideas we’ve explored (and more) to the world, I’ve been feeling an urge to (1) take a step back and understand how the broader world is adapting to this new technology, and (2) spend much more time on my research agendas around interpretability and interfaces.

After a brief break, I joined Thrive earlier this month.

As I’ve gotten to know the Thrive team over the last couple years, I’ve been consistently impressed with the thoughtfulness, depth of partnership with founders, and clarity of conviction behind investments across stages and industries. They’ve generously granted me an ambitious remit to pursue both of my goals as a part of the team, and based on my first week I’ve got a lot to be excited about on both fronts.

I hope to share what I learn and build along the way, as I always have.

Good creative tools are virtuosic and open-ended

2024-08-08 13:18:20

It’s not easy, but with a lifetime of dedication to the craft, many people become virtuoso guitar players. They learn the intricate nuances and details of the instrument and attain a level of mastery over it as if it were an extension of their mind. Some of the best instrumentalists even transcend traditional techniques of guitar performance and find their own ways of creating music with the instrument, like using it as a percussion voice. Alexandr Misko’s music is a beautiful example of virtuosity and novel techniques on the acoustic guitar.

No amount of such dedication can make you a virtuoso at Guitar Hero. Though also an instrument of sorts, Guitar Hero does not lend itself to virtuosity. At the cost of a lower barrier to entry, its ceiling of mastery and nuanced expression is capped. Its guardrails also prevent open-ended use. There is only one way to create music with Guitar Hero — exactly the way the authors of the video game intended.

I think great creative tools are more like the acoustic guitar than Guitar Hero. They:

  1. Allow for virtuosity and mastery, often enabled by a capacity for precise, nuanced expression; and
  2. Are open-ended; they can be used in surprising new ways that the creator didn’t anticipate.

Creative tools like Logic, Photoshop, or even the venerable paintbrush can be mastered. In these creative tools, artists can deftly close the gap between an image in their mind and the work they produce while worrying less about the constraints that the tool imposes on their expression. And where there are constraints, they are free to think beyond the tool’s designed purpose.

Both capacity for virtuosity and open-endedness contribute to an artist’s ability to use a medium to communicate what can’t be communicated in any other way. The converse is also true; if a creative instrument has a low ceiling for mastery and can only ever be used in one intended way, the operator can only use it to say what’s already been said.


Every established artistic practice and creative medium, whether acoustic instruments, digital illustrations, photography, or even programming, has standards of virtuoso-level mastery. The communities behind them all intimately know the huge gap between barely being able to use the medium and being a master creative. Virtuosos attain their level of intimacy with their mediums from extensive experience and a substantial portfolio that often takes a lifetime to build up.

When new creative mediums appear, it’s never immediately obvious what virtuoso-level performance with that medium looks like. It takes time for virtuosity to take form, because several things have to happen concurrently to open up the gap between novices and virtuosos.

These changes happen in lockstep with each other: as tools improve, people must refine their sense for telling the bad from the good, and as the standard of mastery diverges from the standards of previous artistic mediums, a new community of practice forms, slowly, over time. As a new community forms around the new medium, there is more space for its practitioners to develop their own sense of mastery and refine their toolset.


Electronics, software, and computing have all birthed their own communities of artistic practice through this process. I’m reminded of computational artists like Zach Lieberman. I have no doubt AI models will lead to another schism, another inception of a new legitimate creative community of practice with its own standard of virtuoso performance, cornucopia of tools, and unique set of values. AI models will become a creative medium as rich and culturally significant as animation and photography.

But we are clearly at the very beginning:

Over time, I think we will see creative tools built natively around AI separate themselves from tools for augmenting existing mediums in applications like Photoshop. We’ll witness virtuoso levels of performance for expressing new ideas through this new medium, as difficult as it is for us to imagine now what such mastery might look like. We’ll see artists use neural networks and data in ways they were never meant to be used. Through it all, our capacity for creation can only expand.

I feel lucky to be present for the birth of a new medium.


Thanks to Weber Wong and Avery Klemmer for helpful discussions that sparked many ideas in this post.

What makes a good human interface?

2024-08-03 14:29:24

When I discuss interfaces on this blog, I’m most often referring to software interfaces: intermediating mechanisms from our human intentions to computers and the knowledge within them. But the concept of a human interface extends far before it and beyond it. I’ve been trying to build myself a coherent mental framework for how to think about human interfaces to knowledge and tools in general, even beyond computers.

This is the second of a pair of pieces on this topic. The other is Instrumental interfaces, engaged interfaces.


What makes a user interface good?

What are the qualities we want in something that mediates our relationship to our knowledge and tools?

When I visited Berlin earlier this year for a small conference on AI and interfaces, I spent my last free night in the city wandering and pondering whether there could be a general answer to this expansive question. My focus — both then and now — is on engaged interfaces, interfaces people use to deeply understand or explore some creative medium or knowledge domain, rather than to complete a specific well-defined task. (In this post, when I write interface, I specifically mean this type.) The question of what makes an interface compelling is particularly interesting for this type because, as I noted in the other post in this series, inventing good primitives for engaged interfaces demands broad, open-ended exploration. I was hopeful that foundational principles could guide our exploration process and make our search more efficient.

I returned home from that trip with a hazy sense of those principles, which have since become more crisp through many conversations, research, and experiments.

What makes a good human interface?

A good engaged interface lets us do two things. It lets us

  1. see information clearly from the right perspectives, and

  2. express our intent as naturally and precisely as we desire.

To see and to express. This is what all great engaged interfaces — creative and exploratory tools — are about.

To see

A good engaged interface makes visible what is latent. In that way, it is like a great map. Good interfaces and maps enable us to more effectively explore some domain of information by visualizing and letting us see the right slices of a more complex, underlying reality.

Data visualizations and notations, the backbone of many kinds of graphical interfaces, are maps for seeing better. Primitives like charts, canvases, (reverse-)chronological timelines, and calendars are all based on taking some meaningful dimension of information, like time or importance, and mapping it onto some space.

If we take some liberties with the definition of a data visualization, we can consider interface patterns like the “timeline” in an audio or video editing app to be maps as well. In fact, the more capable a video editing tool, the greater the variety of maps that tool offers users, enabling them to see different dimensions of the underlying project. An experienced video editor doesn’t just work with video clips on a timeline, but also has a “scope” for visualizing the distribution of color in a frame, color histograms and curves for higher-level tuning, audio waveforms, and even complex filtered and categorized views for navigating their vast library of source footage. These are all maps for seeing information clearly from diverse perspectives.

Straying even further, a table of contents is also a kind of data visualization, a map of a longer document that helps the reader see its structure at a glance. A zoomed-out thumbnail grid of a long paged document is yet another map in disguise, where the reader can see a different, more scannable perspective on the underlying information.

Even when there isn’t an explicit construction of space in the interface, there is often a hidden metaphor gesturing at one. When we open a folder in a file browser, for example, we imagine hierarchies of folders above and below to which we can navigate. In a web browser, we imagine pages of history coming before and after the current page. When editing a document, the undo/redo “stack” gestures at a hidden chronological list of edits. Sometimes, these hidden metaphors are worth reifying into concrete visuals, like a list of changes in a file history view or a file tree in the sidebar of a code editor. But over time these inherently cartographic metaphors get collapsed into our imagination as we become more adept at seeing them in our minds.

To express

Once we’ve seen what is in front of us, we need to act on that understanding. Often that comes in the form of manipulating the thing being visualized — the thing we see in the interface. A good engaged interface also helps us here by transparently translating natural human interactions into precise intents in the domain of the tool.

Simple applications accomplish this by letting the user directly manipulate the element of interest. Consider the way map applications allow the user to explore places by dragging and zooming with natural gestures, or how the modern WIMP desktop interface lets users directly arrange windows that logically correspond to applications. When possible, directly manipulating the underlying information or objects of concern, the domain objects, minimizes cognitive load and learning curve.

Sometimes, tools can give users much more capability by inventing a new abstraction. Such an abstraction represents latent aspects of a domain object that couldn’t be individually manipulated before. In one type of implementation, a new abstraction shows individual attributes of some underlying object that can now be manipulated independently. We often see this in creative applications like Photoshop, Figma, or drag-and-drop website builders, where a sidebar or attribute panel shows independent attributes of a selected object. By interacting directly with a color picker, font selector, or layout menu in the panel — the surrogate objects — the user indirectly manipulates the actual object of concern. To make this kind of interaction more powerful, many of these tools also have a sophisticated notion of selection. “Layers” in image editing apps are a new abstraction that makes both selection and indirect attribute manipulation more useful.

A second type of surrogate object is focused not on showing individual attributes, but on revealing intermediate states that otherwise wouldn’t have been amenable to direct manipulation, because they weren’t concrete. Spreadsheet applications are full of UI abstractions that make intermediate states of calculation concrete. A typical spreadsheet will contain many cells that store some intermediate result, not to mention the concept of a formula itself, which is all about making the computation itself directly editable. Version control systems take the previously inaccessible object of past versions of a document or the concept of a single change — a “diff” — and allow the user to directly manipulate them to undo or reorder edits.

Direct manipulation

All of the interfaces I mention above are examples of direct manipulation, a term dating back at least to 1983 for interfaces that:

  1. Make key objects for some task visible to the user, and
  2. Allow rapid, reversible, incremental action on the objects.

This kind of interface lets us re-use our intuition for physical objects, movement, and space to see and express ideas in more abstract domains. An underrated benefit of direct manipulation is that it enables low-friction iteration and exploration of an idea space. Indeed, I think it’s fair to say that direct manipulation is itself merely a means to achieve this more fundamental goal: let the user easily iterate and explore possibilities, which leads to better decisions.

In the forty years since, direct manipulation has eaten away at nearly every corner of the landscape of knowledge tools. But despite its ubiquity, the most interesting and important part of creative knowledge work — the understanding, coming up with ideas, and exploring options part — still mostly takes place in our minds, with paper and screens serving as scratchpads and memory more than true thinking aids. There are very few direct manipulation interfaces to ideas and thoughts themselves, except in specific constrained domains like programming, finance, and statistics where mathematical statements can be neatly reified into UI elements.

Of course, we have information tools that use direct manipulation principles, like graphical word processors and mind mapping software. But even when using these tools, a user has to read and interpret information on screen, transform and manipulate it in their mind, and then relay their conclusions back into the computer. The intermediate states of thinking are completely latent. In the best thinking tools today, we still can’t play with thoughts, only words.

We are in the pre-direct manipulation, program-by-command-line age of thinking tools, where we cannot touch and shape our thoughts like clay, where our tools let us see and manipulate words on a page, but not the concepts and ideas behind them.

This realization underlies all of my technical research and interface explorations, though I’m certainly neither early nor unique in pursuing this vision. To me, solving this problem means freeing our most nuanced and ineffable ideas from our individual heads. It would give us a way to translate those thoughts into something we can hold in our hands and manipulate in the same way we break down an algebra problem with pencil and paper or graphs on a grid.

What could we accomplish if, instead of learning to hold the ever more complex problems in our world within our minds, we could break down and collaborate on them with tools that let us see them in front of us in full fidelity and bring our full senses and dexterity to bear on understanding and exploring the possibilities?

Instrumental interfaces, engaged interfaces

2024-08-01 14:26:32

When I discuss interfaces on this blog, I’m most often referring to software interfaces: intermediating mechanisms from our human intentions to computers and the knowledge within them. But the concept of a human interface extends far before it and beyond it. I’ve been trying to build myself a coherent mental framework for how to think about human interfaces to knowledge and tools in general, even beyond computers.

This is the first of a pair of pieces on this topic. The other is What makes a good human interface?.


Maps are my favorite kind of interface, so I want to begin with a brief story about a map I use every day.

New York, where I live, is a fast-moving river. Friends and neighbors move in just as quickly as they move out. In my three years living in the city, most of my friends and acquaintances have moved apartments every year. Too many to count have also moved in, and then away again, to some other city.

In these circles, one of my sources of childlike pride is the Manhattan subway map and schedule that’s now as clear in my memory as in the posters on station walls. I know which trains run where, at what times of day, and which stops different trains skip during rush hour. Sometimes, when luck cooperates, I can beat transit apps’ time estimates with a clever series of transfers and brisk walks.

Obviously, I didn’t start this way.

When I first moved here, I was glued to Google Maps, following its directions and timestamps religiously. I relied on turn-by-turn directions to get around, but I also checked the iconic New York subway maps to see how many stations were left or if I was passing any landmarks or neighborhoods I liked. Over time, I learned to navigate my routes from the hazy map taking shape in my head, and now I can find the shortest path between any location in Manhattan below 100th St from memory, any time of day. (Brooklyn and Queens, I’m still working on…)

These two kinds of navigation aids — turn-by-turn directions and the subway map — were valuable to me in different ways. Though both are maps of New York, I relied on the directions to reach a specific goal, namely getting to my destinations on time. The maps on the train, though, were more multipurpose. Sometimes I was looking for landmarks, other times simply getting oriented, and all along, I was also learning local geography by engaging with the map in a deeper way than the directions on my phone.

These two different uses of a map represent two different kinds of interfaces, one more focused on a specific goal, and the other more about the process of engaging with the interface.

On second thought, most interfaces have elements of both. So perhaps it’s better to say:

A human interface serves two different kinds of uses:

  1. Instrumental use. An instrumental user is goal-oriented. The user simply wants to get some good-enough solution to a problem they have, and couldn’t care less how it’s done.

    Here’s a good litmus test to find out whether an interface is instrumental: If the user could press a magic button and have their task at hand completed instantly to their requirements, would they want that? If so, you are likely looking at an instrumental interface.

    A turn-by-turn nav, a food delivery app, and a job application form are all interfaces that are used almost exclusively in an instrumental way. Let’s call these instrumental interfaces.

  2. Engaged use. Engaged users want to be intimately involved in the mechanics of an interface. They’re using the interface not just to check off a to-do item, but because they get some intrinsic value out of interacting with the interface, or they can only get what they want by deeply engaging with the interface’s domain of information.

    A musical instrument, a board game, and a flash card set are all engaged interfaces, because they’re used almost exclusively for the intrinsic value of spending time using them. The user wants to feel the joy of performing music, not just listen to a track on a computer. They want to enjoy playing a board game, not just be handed a victory or loss. They want to learn information they’ve written in a flash card by repeatedly engaging with it, not simply read a textbook for facts they may forget.

Many interfaces are sometimes instrumental and sometimes engaged. Consider:

Instrumental users have very different requirements, expectations, and goals from engaged users of an interface, and understanding the blend that applies to your particular context is a prerequisite to designing a good interface.

As I noted earlier, the ideal instrumental interface for any task or problem is a magic button that can (1) read the user’s mind perfectly to understand the desired task, and (2) perform it instantly and completely to the desired specifications.

In the absence of such a perfect button, you, the designer, must conceive of the closest possible approximation you can manage within the limits of technology. In a sense, building an instrumental tool is very straightforward: you can work with your users to find out as much as you can about their intent when using the tool, and then engineer a solution that accomplishes that goal in the cheapest, fastest, most reliable way possible. The interesting details are in the necessary tradeoffs between how well you understand the user’s intent and how cheaply, quickly, and reliably you can deliver the result.

An engaged interface has no such top-line metric to optimize. Each kind of engaged interface has a different way it can be improved. A video game, for example, can sometimes be better by being more realistic and easier to learn. But this isn’t always true. Sometimes, the fun of a game comes from the challenge of learning its mechanics, or from strange, surrealist laws of physics in the game world. A digital illustration tool is usually better off giving users more precise controls, but there are creative tools that lead artists to discover surprising results by adding uncertainty or elements of surprise.

In the absence of a straightforward goal, building a good engaged interface requires exploration and play. To discover the ideas that make good maps, data visualizations, video games, musical instruments, and social experiences, we need to try new ideas and see people experience them firsthand. This is a stranger environment in which to do design work, but I find the surprising nature of this process motivating and rewarding.

As a designer and engineer, I used to have a kind of moral aversion to instrumental tools and interfaces. I was drawn to creative, deeply engaging tools that I felt were most meaningful to my personal life, and viewed open-endedness as a kind of virtue unto itself.

I don’t think this way anymore.

These days, I think both instrumental and engaged interfaces are worth working on, and bring value and meaning to their users.

I do believe that the culture of modern life makes the benefits of instrumental interfaces much more legible than engaged ones: marketing tactics tout how fast and affordable things are. They talk about discounts and deals and out-compete the market based on easily quantifiable factors. Especially in business products, product makers view their customers as cold, calculating agents of reason that only pay for hard numbers. But the reality is more nuanced, and even the coldest corporate organizations are made of people. Look at the dominance of supposedly business tools like Notion or Slack. Those tools won not purely because they made employees more efficient workers, though these companies will lead with that argument. These tools won because they are beautiful and fun to use. In a tool that consumes hours of people’s days every week, beauty, taste, and fun matter, too.

Following any transformative leap in technology, it takes some time for popular design practice to catch up. This is especially the case for the design practice of engaged interfaces, because unlike instrumental interfaces, where the goal is always straightforward and the leverage is in the enabling technology, better engaged interfaces often come from surprising new ideas that can only be discovered through a more open-ended design exploration process.

There is always a delay between technological leaps and design explorations bearing fruit. I believe we’re going through just such a period right now. Most current work in “AI UI” is concerned with fulfilling the promise of faster, better, cheaper workflows with language models, used “out of the box” in conversational settings. This is because the implementation possibility is more obvious, and the goals are clear from the start. But there is still a second shoe to drop: interfaces that lean on foundation models to enable humans to search, explore, understand, and engage more deeply with media using completely new interaction mechanics we haven’t discovered yet. What direct manipulation is to the graphical user interface, we have yet to uncover for this new way to work with information.

A beginner’s guide to exploration

2024-07-30 11:52:45

I have a few friends who are in the midst of navigating very hazy idea spaces, guided by strong intuition and taste, but early enough in the process that very little is visible through the fog. Whenever I’m in this situation I feel a strange internal conflict. One side of me feels conviction in a direction of exploration, while the other part of me feels the risk of potential dead ends, afraid to go out on a limb and say I should put my full effort into pursuing where my hunch leads.

In fact, I’m navigating a version of this conflict right now:

Last week, I was catching up with one of my friends going through something similar, and ended up describing to them a mental model of exploration that I’ve developed, in the hope that it would be helpful. It’s a framework I think my past self would have found meaningful, so I’m also sharing it with you, dear reader.

Let’s begin by understanding that exploration is far from the only way to do meaningful creative work in the world. Following popular rhetoric, it was easy for my past self to fall victim to the idea that somehow exploratory, open-ended, research-y work was more virtuous and interesting than the work of incremental improvements. On the contrary, I now think it’s quite likely that most valuable things were brought into this world by a million little incremental improvements, rather than by a lightning strike of a discovery. Incremental work can be just as rewarding as exploration. When you work in a known domain:

But you’re not here to work on optimizing what’s known. You’re here because something about the process of exploration draws you in. Maybe it’s the breadth of ideas you encounter in exploration. Perhaps it’s about the surprising things you learn on the way. But you’ve decided you’re more interested in navigating and mapping out unknown mazes of ideas than building cathedrals on well-paved streets.

Here’s what I would tell my past self:

If you’re on this path of exploration, there’s probably some voice inside of you that’s worth listening to. This is your taste, your gut feeling, whatever you want to call it. It’s the part of your intuition that tells you, “this might not look that great right now, but on the other side of this uncertain maze of ideas is something worth the trouble.” It’s worth listening to your intuition because this is what’ll set your perspective apart from everyone else who’s also looking around for problems to solve. In the beginning, your intuition is all that you have as you begin your exploration.

This voice — your intuition — is strongest at the outset. At the start of your exploration, when you have very few hard facts about what will work out in the end, you can lean on your inner voice, and the intrinsic excitement you feel about the potential of your idea, to keep you going. Nearly all of your motivation comes from your intuition at the starting line.

But excitement from your intuition is volatile. As time passes, and as you run experiments that don’t succeed, that innate excitement you felt at the beginning will begin to dwindle. To keep your momentum, you need to replenish your motivation over time with evidence that backs your intuition and your vision.

[Image: intuition and evidence curves]

There are many short-term faux-remedies to faltering intrinsic excitement — raising money, external validation, encouragements from friends and family — but all of these are like borrowing your motivation on debt. Eventually, they’ll run out, and you’ll be left with a bigger gap you need to fill with real-world evidence of the correctness of your vision. The only way to sustain your motivation long term is to rapidly replenish your excitement and motivation with evidence that your vision is correct.

Evidence comes from testing clear, falsifiable claims about your vision against the real world.

For my personal explorations, here’s a claim that I can use to find evidence to support my vision:

By using interfaces that expose the inner workings of machine learning models, experts in technical and creative jobs can uncover insight that was otherwise inaccessible to them.

This claim can still be more precise, but the bones are there. I can polish this into a testable statement about the real world, build prototypes, put it in the hands of experts, and collect evidence supporting my vision that interpretable machine learning models can be a foundation for transformative new information interfaces.

Very frequently, your initial claim will turn out to be incorrect. This doesn’t mean your vision is doomed. On the contrary, this often gives you a good opportunity to understand where your intuition and the real world diverged, and come up with a more precise statement of what initially felt right in your gut. While collecting evidence that supports your vision motivates you towards your goal, finding evidence that contradicts your claim can bring much-needed clarity to your vision.

Once you’ve collected enough evidence to support your initial vision, you’ll often see that the answers you found through your evidence-gathering experiments gave rise to many more new bets you’re interested in taking. That same taste for good ideas you had in the beginning, now strengthened by repeated contact with reality, sees new opportunities for exploration.

[Image: new bets]

So the cycle repeats. On the shoulders of your original vision, which is now a reality after all the evidence you’ve collected, you can make a new bet, starting with a renewed jump in motivation. You can collect more evidence to support this new vision, and on and on.

All the skills involved in this cycle — clearly stating a vision, tastefully choosing an exploration direction, and collecting useful evidence from reality — will get sharper over time as you repeat this process. Every time you make a new bet on a new idea, you’ll be able to take bigger swings.

If you look at this chart a little differently, this story isn’t about making many disparate bets in a sequence. It’s about bringing the world towards your ever-more-refined statement of your vision. Every piece of evidence you collect to support your vision will be another stair step in this upward motion, and over time, the world behind your experimental evidence will come to resemble the world you originally envisioned.

[Image: leaps of faith]

In this process, you’ll periodically have to find new ideas that give you motivation to take on those new bets. These are what someone else may call “leaps of faith”, but I don’t like that expression. A “leap” makes it sound as if the motivation is coming from nowhere, and you should blindly jump into a new idea without structure.

But there is structure. The “leap” is guided by your intuition and aimed at your vision, and as your internal motivation runs out, you’ll collect evidence from reality to close that gap.

Taking a step back, I think there’s another helpful perspective we can find. Chasing a vision is about a dance between two forces: the stories you believe and tell about a world you envision, and the world you build on top of reality to follow that story.

[Image: vision and storytelling]

Your stories draw on your taste, your experience, and your intuition. These stories need to bottle up the special things that you’ve seen in your life that others aren’t seeing, and compel you and others to work through the process of turning that motivation into evidence.

Your evidence, in turn, needs to continually close the gap between reality and your storytelling. Over time, as your stories and your reality evolve around each other, you’ll bridge the distance between your vision and the real world.


This is all well and good as a way of thinking about exploration at a distance, but when I’m buried in the thick of confusing experiments and dwindling motivation, it’s difficult to know exactly what I need to do when I wake up in the morning.

During those times, I focus on two things:

  1. Clarity of vision and sharp storytelling. In the face of new bets or lackluster evidence, your success depends on how well you can convince yourself and others around you that there’s something worth digging deeper for just around the corner. You can do this by investing in clearer vision, communicated with sharp storytelling. If you’re finding that your evidence supports your vision, incorporate that evidence into your storytelling to move faster.
  2. Momentum behind your exploration, which is the best way to ensure you can convert your energy into useful evidence to support your vision. In open-ended exploratory search for good ideas, it’s hard to know exactly which next experiment is going to yield fruit. During these times, rather than trying to make every individual step successful, it’s better to invest in breadth and coverage of your exploration, and keeping momentum up will help you continue to cover new ground until you hit on the right evidence.

I find my life and work most invigorating when I can work next to people on exploratory paths. There is no special virtue in novelty or risk-taking, but exploratory work often leads to surprising new facts about how the world works, and if those surprises are well leveraged, exploration can yield transformative progress. At a personal level, I also find myself having the most fun when I have the good fortune of working next to people doing exploratory work.

If you find yourself on such a path, as I do, I hope this way of thinking about your path ahead helps take some of the burden off your shoulders, and allows you to shape the world to your ever sharper vision for what ought to be.


Thanks to my friends at UC Berkeley for discussions that ultimately sparked this blog post. You know who you are.

A notebook for seeing with language models

2024-07-29 03:49:35

This post is a read-through of a talk I gave at a demo night at South Park Commons in July 2024.


I’ve spent my career investigating how computers could help us not just store the outputs of our thinking, but actively aid in our thinking process. Recently, this has involved building on top of advancements in machine learning.

In Imagining better interfaces to language models, I compared text-based, conversational interaction paradigms powered by language models to command-line interfaces, and proposed a new direction of research that would be akin to a GUI for language models, wherein we could directly see and manipulate what language models were “thinking”.

In Prism, I adapted some recent breakthroughs in model interpretability research to show how we can “see” what a model sees in a piece of input. I then used interpretable components of generative models to decompose various forms of media into their constituent concepts, and edit these documents and media in semantic latent space by manipulating the concepts that underlie them.

Most recently, in Synthesizer for thought, I began exploring the rich space of interface possibilities opened up by techniques like feature decomposition and steering outlined in the Prism project. These techniques allow us to understand and create media in entirely new ways at semantic and stylistic levels of abstraction.

These explorations were based on the premise that foundational understanding of pieces of media as mathematical objects opens up new kinds of science and creative tools, based on our ability to study and imagine these forms of media at a deeper, more complete level:

  1. Before advanced understanding of optics and our sense of sight, we referred to colors by name: red, blue, amber, turquoise. This was imprecise and ad-hoc. With improved science of light and our visual perception came a mathematical model of color and vision. We began to reason about color and light as mathematical objects: waves with frequencies and elements of a geometric space — color spaces, like RGB, HSL, YCrCb. This, combined with mechanical instruments to decompose and synthesize light into and out of its mathematical representation, gave us better creative tools, richer vocabulary, and a way to systematically map and explore a fuller space of colors (see the short sketch after this list).
  2. Before advanced understanding of sound and hearing, we created music out of natural materials – rubbing strings together, hitting things, blowing air through tubes of various lengths. As we advanced our understanding of the physics of sound, we could imagine new kinds of sounds as mathematical constructs, and then conjure them into reality, creating entirely new kinds of sounds we could never have created with natural materials. We could also sample sounds from the real world and modulate their mathematical structure. Not only that, backed by our mathematical model of sound, we could systematically explore the space of possible sounds and filters.
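
As a tiny, hedged illustration of what treating color as an element of a geometric space buys us, here is a sketch using Python’s standard colorsys module; the specific colors and the idea of averaging them are my own illustrative choices, not something from the original post.

```python
import colorsys

# Two colors as points in RGB space (each component in [0, 1]).
coral = (1.0, 0.50, 0.31)
teal = (0.0, 0.50, 0.50)

# The same points expressed in another coordinate system: HLS
# (hue, lightness, saturation). Same colors, a different "map".
print(colorsys.rgb_to_hls(*coral))
print(colorsys.rgb_to_hls(*teal))

# Because colors are just vectors, "a color halfway between these two"
# becomes a well-defined mathematical question rather than a matter of taste.
midpoint_rgb = tuple((a + b) / 2 for a, b in zip(coral, teal))
print(midpoint_rgb)
```

The point is not the particular formula but that, once a medium has a mathematical representation, questions about it become computable and its full space becomes systematically explorable.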

Interpretable language models can give us a similar foundational understanding of ideas and thoughts as mathematical objects. With Prism, I demonstrated how, by treating sentences and ideas as mathematical objects, we can similarly break them down and recompose them.

Today, I want to share with you some of my early design explorations for what future creative thinking tools may look like, based on these techniques and ideas. This particular exploration involves what I’ve labelled a computational notebook for ideas. This computational notebook is designed based on one cornerstone principle: documents aren’t a collection of words, but a collection of concepts.

[Image: just a notebook]

Intro – Here, I’ve been thinking about a problem very dear to my heart. I want to explore creating a community of researchers interested in the future of computing, AI, and interface explorations. I simply write that thought in my notebook.

[Image: a thought]

Documents – When we think within our minds, a thought conjures up many threads of related ideas. In this notebook, writing down a thought brings up a gallery of related pieces of media, from images to PDFs to simple notes. These documents may be fetched from my personal library of collected documents, or from the wider Web.

[Image: discovery]

But unlike many other tools with similar functionality, this notebook treats documents as a set of ideas. So in addition to seeing which media are similar, we can see which concepts they share. As we hover over different documents, we can see what part of our input recalled that document.

[Image: token attribution]

[Image: more token attribution]

Concepts – We’ve pivoted around our idea space anchored on documents. We can instead pivot around concepts. To do this, we open our concepts sidebar to see a full list of features, or concepts, our input contains. This view is like a brain scan of a model as it reads our input, or a DNA reading of our thought.

[Image: feature selection]

As we hover over different concepts in our input, we can see which pieces of media in our library share that particular concept.

[Image: more feature selection]

Composition – A key part of thinking is inventing new ideas by combining existing ones. In this story, I’m interested in both large industrial research bets and AI. By selecting these concepts at once, we can see which pieces of media express this new, higher-level concept.

[Image: feature composition]

Heatmap – We can get an even more detailed view of the relationships between these individual concept components using a heatmap. In this view, we assign different colors to disparate concepts and see the way they co-occur in our media library. Through this view, we can not only discover documents that contain these concepts together, but also find deeper relationships like, perhaps, that many papers on AI and automation later discuss the idea of industrial shifts.

[Image: features as filters, with colors]
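
To make concrete the kind of computation a view like this could sit on top of, here is a minimal, hypothetical sketch: a document-by-concept presence matrix turned into a concept co-occurrence matrix. The documents, concepts, and values are invented for illustration; this is not the notebook’s actual data or code.

```python
import numpy as np

# Toy stand-in for the notebook's underlying data: which concepts a
# feature-decomposition step found in which documents (1 = present).
docs = ["paper_a", "paper_b", "essay_c", "note_d"]
concepts = ["AI", "automation", "industrial shifts"]
presence = np.array([
    [1, 1, 1],  # paper_a
    [1, 1, 0],  # paper_b
    [0, 1, 1],  # essay_c
    [1, 0, 0],  # note_d
])

# Concept-by-concept co-occurrence: entry (i, j) counts the documents in
# which concept i and concept j appear together. A heatmap of this matrix
# is one way to surface relationships like "papers on AI and automation
# often also discuss industrial shifts."
cooccurrence = presence.T @ presence
print(concepts)
print(cooccurrence)
```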

Abstraction – If we find a composition of ideas we like, such as these three concepts about large-scale creative collaboration, we can select and group them into a new, custom concept, which I’ll call collective creative project ideas.

[Image: creating a new concept]

This workflow, in which detailed exploration of existing knowledge leads us to come up with a new idea, is a key part of the creative and scientific process that current AI-based tools can’t capture very well, because existing tools rarely let you see a body of knowledge through the lens of key ideas.

Visualization – Now that we have a new lens in the form of this custom concept, we can explore our entire media library through this lens using data graphics. We can see the evolution and prominence of our new concept over time, for example.

[Image: visualizations]

Surprise – We can compare the relationship and co-evolution of this idea with another one in our dataset, perhaps this concept related to interviews with creative leaders like film directors. This is something I discovered in my real concept dataset while preparing this talk, and it was a surprising relationship between ideas that I didn’t expect.

[Image: visualizations show new correlated concepts]

The ability for knowledge tools to surprise us with new insight is fundamental to the process of discovery, and too often ignored in AI-based knowledge tools, which largely rely on humans asking precise, well-formed questions about data.

Based on this clue, we may choose to examine this relationship more deeply, with a scatter plot. Notice that by treating documents as collections of ideas rather than words, we can benefit from the well-established field of data graphics to study an entirely new universe of documents and unstructured ideas.

[Image: correlated concepts scatter plot]

Conclusion – So, there we have it. A tool that lets us discover new concepts and relationships between ideas, and use them to see our knowledge and our world through a new lens.

[Image: conclusions]

Good human interfaces let us see the world from new perspectives.

I really view language models as a new kind of scientific and creative instrument, like a microscope for a mathematical space of ideas. And as our understanding of this mathematical space and our instrument improves, I think we’ll see rapid progress in our ability to craft new ideas and imagine new worlds, just as we’ve seen for color and music.

Create things that come alive

2024-07-22 13:53:27

This is an excerpt from today’s issue of my weekly newsletter.


This week, I discovered The Objects of Our Life, a piece from the Steve Jobs Archive about a talk he gave to an audience of designers in Aspen in 1983, one year before the Macintosh. I found this section aspirational:

One American industry after another—cars, televisions, cameras, watches—has lost market share to foreign competition, he explains, and he is worried that the same will happen with the computer if it becomes what he calls “one more piece-of-junk-object.” This moment, when “computers and society are out on a first date”—and here he interlaces his fingers to show how close that relationship could one day become—offers a rare opportunity that they must seize together. The audience is present at the birth of something monumental, and they can help define it. His voice rises with emotion. “We need help. We really, really need your help.”

Building technology is fundamentally an affair by humans, for other humans, and objects of technology ought to be ensconced in romance and history and all manner of color and details and textures of life. They ought to come alive in our environment. Technology is not what’s shiny and boxy and delivered in metallic wraps:

Technology is the active human interface with the material world.

This is from sci-fi author Ursula Le Guin’s A Rant About “Technology”. She goes on:

We have been so desensitized by a hundred and fifty years of ceaselessly expanding technical prowess that we think nothing less complex and showy than a computer or a jet bomber deserves to be called “technology” at all. As if linen were the same thing as flax — as if paper, ink, wheels, knives, clocks, chairs, aspirin pills, were natural objects, born with us like our teeth and fingers — as if steel saucepans with copper bottoms and fleece vests spun from recycled glass grew on trees, and we just picked them when they were ripe…

Within my little corner of the world, I think we’re often victim to an even more myopic pathology of this kind: we think that technology involves a computer, or spacecraft, or a microscope, or some other fragile thing cursed to be beholden to software. But writing is technology. Oral tradition is technology. Farming is technology. Roads are technology.

Technology exists woven into the physics and politics and romance of the world, and to disentangle it is to suck the life out of it, to sterilize it to the point of exterminating its reason for existence, to condemn it to another piece of junk.

If you consider yourself a technologist, here’s your imperative: build things that are unabashedly, beautifully tangled into all else in life — people and relationships, politics, emotion and pain, understanding or the lack thereof, being alone, being together, homesickness, adventure, victory, loss. Build things that come alive, and drag everything they touch into the realm of the living. And once in a while, if you are so lucky, may you create not just technology, but art — not only giving us life, but elevating us beyond.

Epistemic calibration and searching the space of truth

2024-07-08 09:04:23

I’ve long been enamored by DALL-E 2’s specific flavor of visual creativity. Especially given the text-to-image AI system’s age, it seems to have an incredible command over color, light and dark, the abstract and the concrete, and the emotional resonance that their careful combination can conjure.

[Image: a 4x3 grid of diverse AI-generated artworks: abstract shapes, cosmic scenes, portraits, still life, surrealism, and landscapes, showcasing DALL-E 2’s versatility in color, composition, and subject matter.]

I picked these twelve images out of a much larger batch I generated with DALL-E 2 automatically by combining some randomly generated subjects with one of a few pre-written style suffixes like “watercolor on canvas.”

Notice the use of shadows behind the body in the first image and the impressionistic use of color in the third image in the first row. I also love the softness of the silhouette in the top right, and the cyclops figure that seems to emerge beyond the horizon in the second row. Even in the most abstract images in this grid, the choice of color and composition results in something I would personally find not at all out of place in a gallery. There is surprising variety, creativity, and depth to these images, especially considering most of the prompts are as simple as “giving form to metaphor, watercolor on canvas” or “a cozy bedroom, still life composition.”

When I try to create similar kinds of images with what I believe to be the state-of-the-art text-to-image system today, Midjourney v6, here’s what I get with similar prompts.

[Image: a 4x4 grid of surreal yet highly detailed artworks: vivid abstract portraits, photorealistic hands with eyes, intricate cityscapes with cats, and lifelike sleeping figures, showcasing more intense colors, cohesive themes, and significantly higher detail than the previous collection.]

These images are beautiful in their own right, and in their detail and realism they are impressive. I’m regularly stunned by the quality and realism in images generated by Midjourney. This isn’t meant to be another “AI-generated images aren’t artistic” post.

However, there is a very obvious difference in the styles of these two systems. After having generated a few hundred images from both systems, I find DALL-E 2 to be regularly:

By contrast, Midjourney’s images are biased towards:

Though Midjourney v6 is the most capable system of this kind in my experience, I encounter these same stylistic biases when using any modern model from the last couple of years, like Stable Diffusion XL and its derivatives, Google’s Imagen models, or even the current version of DALL-E (DALL-E 3). This is a shame, because I really like the variety and creativity of outputs from DALL-E 2, and it seems no modern systems are capable of reproducing similar results.

I’ve also done some head-to-head comparisons, including giving Midjourney examples of images from which it could transfer the style. Though Midjourney v6 successfully copies the original image’s style, its output still has the hallmark richness of detail, as well as a clear tendency towards concrete subjects like realistic human silhouettes:

[Image: side-by-side comparison of two abstract watercolor-style images: DALL-E 2 on the left with simpler, isolated hand shapes; Midjourney v6 on the right with more complex, blended forms, a richer color palette, and more realistic human forms.]

Though none of this is a scientifically rigorous study, I’ve heard similar sentiments from other users of these systems, and observed similar “un-creative” behavior from modern language models like ChatGPT. In particular, I found this study of distribution of outputs before and after preference tuning on Llama 2 models interesting, because I think they successfully quantify the bland “ChatGPT voice” and show some concrete ways that reinforcement learning has produced accidental attractors in the model’s output space.

Why does this happen?

There are a few major differences between DALL-E 2 and other systems that we could hypothetically point to:

After thinking about it a bit and playing with some open source models, I think there are two big things going on here.

The first is that humans simply prefer brighter, more colorful, more detailed images when asked to pick a “better” visual in a side-by-side comparison, even though they would not necessarily prefer a world in which every single image was so hyper-detailed and hyper-colorful. So when models are tuned to human preferences, they naturally produce these hyper-detailed, hyper-colorful sugar-pop images.

The second is that when a model is trained using a method with feedback loops like reinforcement learning, it tends towards “attractors”, or preferred modes in the output space, and stops being an accurate model of reality in which every concept is proportionately represented in its output space. Preference tuning tunes models away from being accurate reflections of reality into being greedy reward-seekers, happy to output a boring response if they expect the boring output to be rated highly.

Let’s investigate these ideas in more detail.

1. Are we comparing outputs, or comparing worlds?

If you’ve ever walked into an electronics store and looked at a wall full of TVs or listened to headphones on display, you’ll notice they’re all tuned to the brightest, loudest, most vibrant settings. Sometimes, the colors are so vibrant they make pictures look a bit unrealistic, with perfectly turquoise oceans and perfectly tan skin.

In general, when asked to compare images or music, people with untrained eyes and ears will pick the brightest images and the loudest music. Bright images create the illusion of vibrant colors and greater detail, and make other images that are less bright seem dull. Loud music has a similar effect, giving rise to a “loudness war” on public radio where tracks compete to catch listeners’ attention by being louder than other tracks before or after it.

Now, we are also in a loudness war of synthetic media.

Another way to think about this phenomenon is as a failure to align what we are asking human labelers to compare with what we actually want to compare.

When we build a preference dataset, what we should actually be asking is, “Is a world with a model trained on this dataset preferable to a world with a model trained on that dataset?” Of course, this is an intractable question to ask, because doing so would require somehow collecting human labels on every possible arrangement of a training dataset, leading to a combinatorial explosion of options. Instead, we approximate this by collecting human preference signals on each individual data point. But there’s a mismatch: just because humans prefer a more detailed image in one instance doesn’t mean that we’d prefer a world where every single image was maximally detailed.

2. Building attractors out of world models

Preference tuning methods like RLHF and DPO are fundamentally different from the kind of supervised training that goes on during model pretraining or a “basic” fine-tuning run with labelled data, because methods like RL and DPO involve feeding the model’s output back into itself, creating a feedback loop.

Whenever there are feedback loops in a system, we can study its dynamics — over time, as we iterate towards infinity, does the system settle into some state of stability? Does it settle into a loop? Does it diverge, accelerating towards some limit?

In the case of systems like ChatGPT and Midjourney, these models appear to converge under feedback loops into a few attractors, parts of the output space that the model has deemed reliably preferred, “safe” options. One attractor, for example, is a hyper-realistic detailed style of illustration. Another seems to be a fondness for geometric lines and transhumanist imagery when asked to generate anything abstract and vaguely positive.

I think recognizing this difference between base models and feedback-tuned models is important, because this kind of preference tuning step changes what the model is doing at a fundamental level. A pretrained base model is an epistemically calibrated world model. It’s epistemically calibrated, meaning its output probabilities exactly mirror the frequency of concepts and styles present in its training dataset. If 2% of all photos of waterfalls also have rainbows, exactly 2% of photos of waterfalls the model generates will have rainbows. It’s also a world model, in the sense that what results from pretraining is a probabilistic model of observations of the world (its training dataset). Anything we can find in the training dataset, we can also expect to find in the model’s output space.

Once we subject the model to preference tuning, however, the model transforms into something very different: a function that greedily and cleverly finds a way to interpret every input into a version of the request that includes elements it knows are most likely to result in a positive rating from a reviewer. Within the constraints of a given input, a model that’s undergone RLHF is no longer an accurate world model, but a function whose sole job is to find a way to render a version of the output that’s super detailed, very colorful, extremely polite, or whatever else the model has learned will please the recipient of its output. These reliably-rewarded concepts become attractors in the model’s output space. See also the apocryphal story about OpenAI’s model optimized for positive outputs, resulting in inescapable wedding parties.
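
As a deliberately oversimplified sketch of this contrast, consider the difference between sampling styles in proportion to their training-set frequency and greedily chasing a learned reward. All of the style names, frequencies, and reward values below are invented for illustration.

```python
import random
from collections import Counter

random.seed(0)

# A toy "world": styles, their frequency in the training data, and a
# hypothetical preference-model reward for each style.
styles = ["abstract", "impressionist", "photoreal", "flat"]
train_freq = [0.25, 0.25, 0.25, 0.25]
reward = {"abstract": 0.4, "impressionist": 0.5, "photoreal": 0.9, "flat": 0.3}

# An epistemically calibrated base model samples styles in proportion to
# how often they appeared in training.
base_samples = random.choices(styles, weights=train_freq, k=10_000)
print("base model: ", Counter(base_samples))   # roughly even across styles

# A purely reward-greedy policy (the caricature limit of preference tuning)
# collapses onto the single highest-reward style: an attractor.
tuned_samples = [max(styles, key=reward.get) for _ in range(10_000)]
print("tuned model:", Counter(tuned_samples))  # all "photoreal"
```

Real preference-tuned models are of course not this degenerate, but the sketch shows the direction of the pressure: probability mass migrates from a calibrated distribution toward whatever the reward signal reliably favors.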

Today’s most effective tools for producing useful, obedient models irreversibly take away something quite valuable that base models have by construction: their epistemic calibration to the world they were trained on.

Interpretable models enable useful AI without mode collapse

Though I find any individual output from ChatGPT or Midjourney useful and sometimes even beautiful, I can’t really say the same about the possibility space of outputs from these models at large. In tuning these models to our pointwise preferences, it feels like we lost the variety and creativity that enabled these models to yield surprising and tasteful outputs.

Maybe there’s a way to build useful AI systems without the downsides of mode collapse.

Preference tuning is necessary today because of the way we currently interact with these AI systems, as black boxes which take human input and produce some output. To bend these black boxes to our will, we must reprogram their internals to want to yield output we prefer.

But there’s another growing paradigm for interacting with AI systems, one where we directly manipulate concepts within a model’s internal feature space to elicit outputs we desire. Using these methods, we no longer have to subject the model to a damaging preference tuning process. We can search the model’s concept space directly for the kinds of outputs we desire and sample them directly from a base model. Want a sonnet about the future of quantum computing that’s written from the perspective of a cat? Locate those concepts within the model, activate them mechanistically, and sample the model outputs. No instructions necessary.
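To make the mechanics concrete, here is a minimal sketch of this kind of mechanistic steering in PyTorch. It uses a toy linear layer as a stand-in for a block inside a real model, and a random unit vector as a stand-in for a concept direction that an interpretability method would actually recover; both, along with the steering strength, are placeholders rather than anything real.

import torch
import torch.nn as nn

# A toy stand-in for one block inside a language model. In practice you would hook
# a real transformer layer and use a concept direction found by an interpretability
# method; the direction and strength here are placeholders.
hidden_dim = 512
block = nn.Linear(hidden_dim, hidden_dim)

concept_direction = torch.randn(hidden_dim)
concept_direction /= concept_direction.norm()     # hypothetical "sonnet" or "cat" direction

def make_steering_hook(direction: torch.Tensor, strength: float):
    # A forward hook that pushes the block's activations along the concept direction.
    def hook(module, inputs, output):
        return output + strength * direction
    return hook

handle = block.register_forward_hook(make_steering_hook(concept_direction, strength=4.0))
steered = block(torch.randn(1, hidden_dim))       # activations now carry the concept
handle.remove()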

Mechanistic steering like this is still early-stage research, and for now we have to make do with simpler tasks like changing the topic and writing style of short sentences. But I find this approach very promising, because it could give us a way to make pretrained models useful without turning them into overeager human-pleasers that fall towards an attractor at the first chance they get.

Furthermore, sampling directly from a model’s concept space allows us to rigorously quantify qualities like diversity of output that we can’t control well in currently deployed models. Want variety in your outputs? Simply expand the radius within which you’re searching the model’s latent space.
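As a sketch of what that could look like in practice: perturb an embedding within a chosen radius and decode each point. Here, decode stands in for a hypothetical embedding-to-output model; it isn’t a real API.

import torch

def sample_neighborhood(embedding: torch.Tensor, radius: float, n: int) -> torch.Tensor:
    # n random unit directions around the embedding, scaled to the chosen radius.
    directions = torch.randn(n, embedding.shape[-1])
    directions = directions / directions.norm(dim=-1, keepdim=True)
    return embedding + radius * directions

base = torch.randn(1024)                      # an embedding of the prompt or concept mix
variants = sample_neighborhood(base, radius=0.5, n=8)
# outputs = [decode(v) for v in variants]     # hypothetical decoder; larger radius => more variety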

This world — directly interacting with epistemically calibrated models — isn’t incompatible with Midjourney-style hyper-realistic hyper-detailed images either. Perhaps when we have in our hands a well-understood, capable model of the world’s images, we’ll find not only all the abstract images from DALL-E 2 and all the intricate illustrations from Midjourney, but an uncountable number of styles and preferences in between, as many as we have time to enumerate as we explore its vast space of knowledge.

The quotes on my wall

2024-07-07 07:45:58

My desktop wallpaper loops through a handful of screenshots of quotes I’ve collected over the years. These quotes push on my worldview in just the right places to help me approach my work in ways I find encouraging and energizing, so I like to have them in the periphery of my workspace like virtual post-it notes.

I prefer screenshots to purely textual excerpts because I think screenshots preserve a kind of texture of the original context in which I found the idea, whether Twitter or a magazine article or a blog.

In no particular order, here are the ones I thought worth sharing with you.


I’m drawn to Werner Herzog’s pursuit of a poetic kind of truth and focus on the human experience. Here, he speaks of a “deeper illumination” beyond the facts:

This image depicts a snippet from an interview: Murphy: There have been some accusations that you’ve taken liberties with facts in some of your documentaries and in “Rescue Dawn,” particularly from the family of Eugene DeBruin. What is your reaction to those accusations? Herzog: If we are paying attention about facts, we end up as accountants. If you find out that yes, here or there, a fact has been modified or has been imagined, it will be a triumph of the accountants to tell me so. But we are into illumination for the sake of a deeper truth, for an ecstasy of truth, for something we can experience once in a while in great literature and great cinema. I’m imagining and staging and using my fantasies. Only that will illuminate us. Otherwise, if you’re purely after facts, please buy yourself the phone directory of Manhattan. It has four million times correct facts. But it doesn’t illuminate.

and here, finding harmony in chaos.

Taking a close look at what is around us, there is some sort of a harmony. It is the harmony of overwhelming and collective murder. And we in comparison to the articulate vileness and baseness and obscenity of all this jungle, we in comparison to that enormous articulation, we only sound and look like badly pronounced and half-finished sentences out of a stupid suburban novel, a cheap novel. And we have to become humble in front of this overwhelming misery and overwhelming fornication, overwhelming growth, and overwhelming lack of order. Even the stars up here in the sky look like a mess. There is no harmony in the universe. We have to get acquainted to this idea that there is no harmony as we have conceived it. But when I say this all full of admiration for the jungle. It is not that I hate it, I love it, I love it very much, but I love it against my better judgment. - Werner Herzog

This one reminds me that, even in the most disorienting and confusing moments, some part of me usually knows the right thing to do, if I listen closely.

A tweet by @shauseth: you have access to your optimal policy at all times. you just choose not to follow it. you can literally access it by asking “what should i be doing rn?” when it’s time to reset the answer will be to stop thinking about random shit and rest. when it’s time to work the answer would be to work. same for planning or talking to people or leaving a place you don’t want to be in. literally all of that boils down to asking one simple course correcting question. you can accomplish pretty much anything by continuously asking it

I really enjoy sci-fi stories set in a world that hasn’t just overcome the current struggles of civilization, but has so transformed and advanced beyond them that they would look upon our greatest challenges as weekend projects.

There are many, many books I love and recommend all the time in this genre, including Greg Egan’s Diaspora, but I thought this screenshot captured a particularly “accelerationist” version of this view in a tweet.

Tweet from @mattparlmer: You will see mountains assembled in days, rivers cut in hours, cities conjured in minutes, flowers that materialize in the blink of an eye

A reminder for my personal work and research: as tool builders, we can choose to serve a wide range of audiences. The audience I’m most interested in building for is the experts, researchers, and artists working at the frontiers of what we know and what we’ve imagined. Hopefully, by building for them, many of the key ideas will also turn out to be applicable to tools for the rest of us. This excerpt is from Andy Matuschak’s blog.

Crop of a screenshot that reads: “…expands the frontiers of practice for the entire field. My collaborator Michael Nielsen has long argued that this is true of all our most powerful representations. If you make experts more capable, similar ideas will often also help novices; but if you focus on educational use, you’re unlikely to transform real work in the field. Mathematica is a great modern example of this: it was invented to support frontier research in cellular automata; happily it also allows novices to more easily build intuition for…”

Consistency wins, and the most important consequence of most decisions – and all small decisions – is changing the person making it.

Tweet reply by @IronEconomist: You are evaluating the act as a standalone, when the largest effect of almost any action is to change the person making it.

This is a photograph from Causal Islands, a conference that brought together a bunch of my favorite researchers and writers and designers to Toronto in April 2023. I believe this particular slide was from a talk by Chia Amisola.

A photograph of a presentation. The slides shows a quote that reads, “Remember to imagine and craft the worlds you cannot live without, just as you dismantle the ones you cannot live within…” The quote is from Ruha Benjamin’s book “Captivating Technology.”

Giving form to metaphor is the perfect description for so much of what I’m interested in: language and notation, tools for thought, research into interfaces, and artificial intelligence.

Tweet from @moultano: If you aren’t going to give form to metaphor, what good are you?

Synthesizer for thought

2024-06-23 12:20:00

For most of the history of music, humans produced sounds out of natural materials — rubbing together strings, hitting things, blowing air through tubes of various lengths. Until two things happened.

  1. We understood the physics of sound. A sound is a combination of overlapping waves, and a wave is a kind of marvelous mathematical object that we can write theorems and prove things about and deeply and fundamentally understand.

    Once we started understanding sound as a mathematical object, our vocabulary for talking about sound expanded in depth and precision. Architects could engineer the acoustical properties of a concert hall, and musicians could talk about different temperaments of tuning. We also advanced in our understanding of how humans perceive sounds.

  2. We learned to relate our mathematical models to sounds in the real world. We built devices to record sound and decompose it into its constituent fundamental parts using the mathematical model of waves. We also invented a way to turn that mathematical model of a sound back into real notes we could hear, using electronics like oscillators and speakers.

    This meant we could imagine new kinds of sounds as mathematical constructs, and then conjure them into reality, creating entirely new kinds of sounds we could never have created with natural materials. We could also sample sounds from the real world and modulate their mathematical structure. Not only that, backed by our mathematical model of sound, we could systematically explore the space of possible sounds and filters.

The instrument that results is a synthesizer.

Synthesizers

A synthesizer produces music very differently than an acoustic instrument. It produces music at the lowest level of abstraction, as mathematical models of sound waves. It begins with raw waveforms defined as oscillators, which get transmogrified through a sequence of filters and modulators before reaching our ears. It’s a way of producing sound by assembling it from logical components rather than creating it wholesale by hitting or vibrating something natural.

Because synthesizers are electronic, unlike traditional instruments, we can attach arbitrary human interfaces to them. This dramatically expands the design space of how humans can interact with music. Synthesizers can be connected to keyboards, sequencers, drum machines, touchscreens for continuous control, displays for visual feedback, and of course, software interfaces for automation and endlessly dynamic user interfaces.

With this, we freed the production of music from any particular physical form.

Synthesizers enabled entirely new sounds and genres of music, like electronic pop and techno. These new sounds were easier to discover and share because they didn’t require designing entirely new instruments. The synthesizer organizes the space of sound into a tangible human interface, and as we discover new sounds, we can share them with others as numbers and digital files, as the mathematical objects they’ve always been.

The synthesizer is just one example of a pattern in the history of media: with breakthroughs in mathematical understanding of a medium, come new tools that exploit that mathematical understanding to enable new creative forms and human interfaces.

Optics, the mathematics of light and color, underpins so much of how humans interact with visual media today. Behind every image you see on screen is a color space like RGB or CMYK, a mathematical model of how we perceive color. We edit photos on our devices not by applying chemicals in a dark room but by passing our photographs through mathematical functions we call filters. This mathematical model of color and light also gave us new vocabulary (saturation, hue, contrast) and new interfaces (color curves, scopes, histograms) for working with the visual medium.
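As a tiny illustration of what “a filter is just a mathematical function” means, here is a sketch of a saturation adjustment in numpy. The Rec. 709 luminance weights are standard; everything else (the function name, the stand-in image) is purely illustrative.

import numpy as np

def adjust_saturation(rgb: np.ndarray, amount: float) -> np.ndarray:
    # amount = 0 -> grayscale, 1 -> unchanged, > 1 -> more saturated.
    luminance = rgb @ np.array([0.2126, 0.7152, 0.0722])   # Rec. 709 luma weights
    gray = np.repeat(luminance[..., None], 3, axis=-1)
    return np.clip(gray + amount * (rgb - gray), 0.0, 1.0)

photo = np.random.rand(64, 64, 3)      # stand-in image with channel values in [0, 1]
vivid = adjust_saturation(photo, 1.4)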

Recently, we’ve seen neural networks learn detailed mathematical models of language that seem to make sense to humans. And with a breakthrough in mathematical understanding of a medium, come new tools that enable new creative forms and allow us to tackle new problems.

Instruments for thought

In Prism, I discussed two new interface primitives enabled by interpretable language models:

  1. Detailed decomposition of concepts and styles in language. This is analogous to splitting a sound into its constituent fundamental waves. It takes a sentence like “A synthesizer produces music very differently than an acoustic instrument.” and decomposes it into a list of “features” like “Technical electronics and signal processing” and “Comparison between entities”.
  2. Precise steering of high-level semantic edits. I can take the same synthesizer sentence, add some “Discussions about parenthood”, and get “A parent often produces music differently from their children.”

In other words, we can decompose writing into a mathematical model of its more fundamental, constituent parts, and reconstruct those mathematical models of ideas back into text.
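If it helps to see the shape of these two primitives as an interface, here is a hypothetical sketch (not the actual Prism API): one function that decomposes a sentence into labelled features, and one that steers a sentence along a feature. The names and signatures are assumptions for illustration only.

from dataclasses import dataclass

@dataclass
class Feature:
    label: str         # e.g. "Technical electronics and signal processing"
    activation: float  # how strongly the feature is expressed

def decompose(sentence: str) -> list[Feature]:
    """Embed the sentence and return its most strongly activated interpretable features."""
    ...

def steer(sentence: str, feature_label: str, strength: float) -> str:
    """Push the sentence's embedding along the named feature direction, then decode it back to text."""
    ...

# Intended usage, per the examples above:
# decompose("A synthesizer produces music very differently than an acoustic instrument.")
# steer("A synthesizer produces music...", "Discussions about parenthood", strength=2.0)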

I spent some time imagining what kinds of wild and interesting interfaces may be possible as this nascent technology matures over time.

Heatmaps turn documents into terrain maps of concepts

In data visualization, a heatmap lets the user browse and navigate a very large area or dataset with ease by letting the eye quickly scan for areas with high or low values. Similarly, in the context of text documents, a heatmap can highlight the distribution of thematic elements, allowing users to quickly identify key topics and their prominence. Heatmaps can be particularly useful for analyzing large corpora or very long documents, making it easier to pinpoint areas of interest or relevance at a glance.

Screenshot of a ‘Prism highlighter’ interface displaying the opening paragraph of ‘Moby-Dick’. The top shows tabs for ‘Editor’ and ‘Reader’, and a drop-down menu for ‘Feature’ selection, currently set to ‘#13188 Water-related and Natural Environment Context (0.96)’. The text is partially highlighted in yellow, emphasizing sentences related to sailing, the ocean, and Manhattan’s waterfront, visually demonstrating how the selected feature activates across the passage.

For example, a user might begin with a collection of thousands or even millions of books and PDFs, turn on some filters for specific features like “mention of geopolitical conflict” and “escalating rhetoric”, then quickly zoom in to the highlighted parts to find relevant passages and paragraphs. Compared to a conventional search that flattens all the detail into a single sorted list of a few dozen items, a heatmap lets the user see detail without getting lost in it.
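A rough sketch of how such a heatmap could be computed, assuming an embedding model embed and a sparse autoencoder sae with an encode method (both stand-ins, not real APIs): score each passage by how strongly the chosen feature activates on its embedding, then use the normalized scores as highlight intensities.

def feature_heatmap(passages: list[str], feature_index: int, embed, sae) -> list[float]:
    # One score per passage: the chosen feature's activation on that passage's embedding.
    scores = [float(sae.encode(embed(p))[feature_index]) for p in passages]
    top = max(scores) if scores and max(scores) > 0 else 1.0
    return [s / top for s in scores]    # normalized to [0, 1] for highlight intensity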

Three text blocks demonstrating different cell sensitivities: one sensitive to position, showing a paragraph about the Berezina crossing; one activating inside quotes, displaying dialogue about food supplies; and one robustly activating inside if statements, showing C code for signal processing. The text is color-coded to represent a heatmap-like visualization of content.

Image from Andrej Karpathy’s The Unreasonable Effectiveness of Recurrent Neural Networks.

Another perspective on the semantic heatmap is semantic syntax highlighting. Just as we highlight program source code to help visually distinguish specific parts of a program, heatmaps and highlights could help humans quickly visually navigate complex document structures.

Spectrograms and track views reveal meaningful patterns across time

In the context of audio processing, a spectrogram visualizes the prominence of different frequency waves within a single stream of audio, and how it evolves over time. In other words, it breaks out the individual mathematically pure components of a sound wave and visualizes each as its own signal over time.

Spectrograms let you visualize sound in a way that communicates much more structure than raw waveforms, by producing a kind of thumbnail of an audio track that breaks out different components like bass lines and rising/falling melodic progressions into distinctive visual patterns.

Tweet by @graycrawford about song spectrograms, with an image showing nine unique spectrogram patterns from the album ‘Medieval Femme’ by Fatima Al Qadiri. Each spectrogram displays distinct visual patterns of white and blue frequencies on a black background.

If we apply the same idea to the experience of reading long-form writing, it may look like this. Imagine opening a story on your phone and swiping in from the scrollbar edge to reveal a vertical spectrogram, each “frequency” of the spectrogram representing the prominence of different concepts like sentiment or narrative tension varying over time. Scrubbing over a particular feature “column” could expand it to tell you what the feature is, and which part of the text that feature most correlates with.

Sketch of a smartphone screen showing a text document with white lines representing text on a black background. A colorful vertical bar on the right edge represents a spectrogram, suggesting interactive content analysis features.
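One way to compute the data behind such a view, again assuming stand-in embed and sae models: build a matrix of feature activations over consecutive passages and render it the way audio software renders a spectrogram.

import numpy as np
import matplotlib.pyplot as plt

def semantic_spectrogram(passages: list[str], feature_indices: list[int], embed, sae) -> np.ndarray:
    # Rows are passages (reading order), columns are the tracked features.
    acts = np.array([[float(sae.encode(embed(p))[i]) for i in feature_indices]
                     for p in passages])
    plt.imshow(acts.T, aspect="auto", cmap="magma")   # features as rows, reading order as columns
    plt.xlabel("passage")
    plt.ylabel("feature")
    plt.show()
    return acts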

We could also take inspiration from a different kind of interface for music, the track view. Music production software like Logic Pro (below) lets the user assemble a song from many different tracks, each corresponding to an instrument and processed by different filters and modulations.

Screenshot of Logic Pro music production app on an iPad. The interface shows a multitrack view with colorful waveforms, a virtual keyboard, and audio effect modules. Hands are visible interacting with the touchscreen.

In a writing tool, the whole “song” may correspond to a piece of writing, with each measure a sentence or thought and each instrument track a feature or pre-defined collection of features. The user could modulate each feature (tone, style, technical depth) across time the way a producer may adjust a track’s volume or filter over time, or turn features on and off over certain sentences.

Semantic diffs visualize adjacent possibilities

Take a look at this interactive widget from Red Blob Games’ Predator-Prey article. As I hover over each control, the interface shows me the range of possible forms the subject of my edit can take on as I vary that particular parameter.

Interactive graph showing predator-prey dynamics. A purple curve represents one possible outcome, surrounded by many light gray curves showing alternative scenarios. Sliders below control parameters for prey and predators, with visual indicators of their ranges and current values.

I call this a semantic diff view. It helps visualize how some output or subject of an edit changes when the user modulates some input variable. It shows all the “diffs” between various possible points in the possibility space of outputs, anchored on a particular semantic feature.

What would a semantic diff view for text look like? Perhaps when I edit text, I’d be able to hover over a control for a particular style or concept feature like “Narrative voice” or “Figurative language”, and my highlighted passage would fan out the options like playing cards in a deck to reveal other “adjacent” sentences I could choose instead. Or, if that involves too much reading, each word could simply be highlighted to indicate whether that word would be more or less likely to appear in a sentence that was more “narrative” or more “figurative” — a kind of highlight-based indicator for the direction of a semantic edit.

Conceptual user interface for advanced text editing, showing a graph of ‘Literal/Figurative’ versus ‘Casual/Formal’ writing styles, a toolbar with formatting options and controls for ‘Tone’ and ‘Style’, and sliders for adjusting ‘Narrative’, ‘Quotes’, and ‘Technical’ content levels. This layout illustrates potential controls for fine-tuning semantic qualities of text beyond basic formatting.

Icons and glyphs for new concepts discovered in neural networks

In Interpreting and Steering Features in Images, the author proposes the idea of a “feature icon”, a nearly-blank image modified to strongly express a particular feature drawn from a computer vision model, like specific lighting, colors, patterns, and subjects. He writes:

To quickly compare features, we also found it useful to apply the feature to a standard template to generate a reference photo for human comparison. We call this the feature expression icon, or just the icon for short. We include it as part of the human-interpretable reference to this feature.

Here are some examples of feature expression icons from that piece.

Four images showing feature expressions: a white circle hovering above a surface, hands holding a glowing sphere, a colorful American Robin, and three white spheres in a row, illustrating concepts from computer vision feature analysis.

I found this to be the most interesting part of this particular work. Browsing through these icons felt as if we were inventing a new kind of word, or a new notation for visual concepts mediated by neural networks. This could allow us to communicate about abstract concepts and patterns found in the wild that may not correspond to any word in our dictionary today.

In Imagining better interfaces to language models, I pointed out a major challenge in designing latent space-based information interfaces: high dimensionality.

The primary interface challenge here is one of dimensionality: the “space of meaning” that large language models construct in training is hundreds and thousands of dimensions large, and humans struggle to navigate spaces more than 3-4 dimensions deep. What visual and sensory tricks can we use to coax our visual-perceptual systems to understand and manipulate objects in higher dimensions?

One way to solve this problem may involve inventing new notation, whether as literal iconic representations of visual ideas or as some more abstract system of symbols.

A concept library for sharing and collaborating on units of meaning and style

In the user community for a particular synthesizer called the OP-1, there’s a website called op1.fun that hosts a huge library of samples (playable sounds) that anyone can download and incorporate into their practice. Even as a complete stranger to this community, it was surprisingly fun to browse around and explore different kinds of sounds in the library of sound patches.

Screenshot of the OP-1 sound patch library website, displaying a list of sound patches with names like ‘VRAFK’ and ‘Burgermaker420’. The interface includes a ‘Type’ filter with options such as digital, drum, dsynth, and a search bar, showcasing various users’ contributions and the vibrancy of the OP-1 community.

Similar kinds of marketplaces and libraries exist for other creative professions. Photographers buy and sell filters, and cinematographers share and download LUTs to emulate specific color grading styles. If we squint, we can also imagine software developers and their package repositories like NPM to be something similar — a global, shared resource of abstractions anyone can download and incorporate into their work instantly.

No such thing exists for thinking and writing.

As we figure out ways to extract elements of writing style from language models, we may be able to build a similar kind of shared library for linguistic features anyone can download and apply to their thinking and writing. A catalogue of narrative voice, speaking tone, or flavor of figurative language sampled from the wild or hand-engineered from raw neural network features and shared for everyone else to use.

We’re starting to see something like this already. Today, when users interact with conversational language models like ChatGPT, they may instruct, “Explain this to me like Richard Feynman.” In that interaction, they’re invoking some style the model has learned during its training. Users today may share these prompts, which we can think of as “writing filters”, with their friends and coworkers. This kind of interaction becomes much more powerful in the space of interpretable features, because features can be combined much more cleanly than textual instructions in prompts.

A new history of the word

For most of the history of writing, humans produced words out of natural thought — taking in some ideas, mixing them with our memories and intuitions and logic, and vocalizing what came to us in our minds. Until two things happened.

  1. We understood the physics of ideas. An idea is composed of concepts in a vector space of features, and a vector space is a kind of marvelous mathematical object that we can write theorems and prove things about and deeply and fundamentally understand.
  2. We learned to relate our mathematical models of writing to ideas in the real world. We built devices to take writing and decompose it into its constituent fundamental ideas. We also invented a way to turn that mathematical model of a thought back into real words we could read, using piles and piles of compute.

Once we started understanding writing as a mathematical object, our vocabulary for talking about ideas expanded in depth and precision. We could imagine new ideas and metaphors as mathematical constructs, and then conjure them into words, creating entirely new kinds of knowledge tools we could never have created with natural materials. Not only that, backed by our mathematical model of ideas, we could systematically explore the space of what’s possible to imagine.

The instrument that results…?

Prism: mapping interpretable concepts and features in a latent space of language

2024-06-22 13:37:40

Foundation models gesture at a way of interacting with information that’s at once more natural and powerful than “classic” knowledge tools. But to build the kind of rich, directly interactive information interfaces I imagine, current foundation models and embeddings are far too opaque to humans. Models and their raw outputs resist understanding. Even when we go to great lengths to try to surface what the model is “thinking” through black-box methods like prompting or dimensionality reduction approaches like PCA and UMAP, we can’t deliver interaction experiences that feel as direct and predictable, as “extension-of-self”, as multi-touch.

Solving this understandability gap is fundamental to unlocking richer interfaces to modern AI systems and the information they model. Good information interfaces open up not just new utility, but new understanding to users about their world, and without closing the understandability gap between frontier AI systems and humans, we can’t build good information interfaces with them.

This work explores a scalable, automated way to directly probe embedding vectors representing sentences in a small language model and “map out” what human-interpretable attributes are represented by specific directions in the model’s latent space. A human-legible map of models’ latent spaces opens doors to dramatically richer ways of interacting with information and foundation models, which I hope to explore in future research. In this work, I share two primitives that may be a part of such interfaces: detailed decomposition of concepts and styles in language and precise steering of high-level semantic text edits.

The bulk of this work took place between December 2023 and February 2024, concurrently with Anthropic’s application of similar methods on Claude Sonnet and OpenAI’s work on interpreting GPT-4, as a part of my research at Notion. I wrote this report at the end of February, and then subsequently came back to revise and add some updates in June 2024.

In this piece, I’ll use the words “latent space” and “embedding space” interchangeably; I use them to refer to the same entity in the models being studied.

Contents

  1. Key ideas
  2. Further motivations
    1. Debugging & tuning embeddings
    2. Steerable text generation
  3. Demo
    1. Feature gallery
  4. Methodology
    1. Mechanistic interpretability as debugging models
    2. Background
    3. Training sparse autoencoders
    4. Automated interpretability
  5. Results and applications
    1. Understanding embeddings
    2. Text editing in embedding space
    3. Edit methods: balancing steering strength and precision
  6. Caveats and limitations
  7. Future work
  8. Looking back, looking forward
  9. Appendix
    1. Automated interpretability prompts
    2. Feature gradients implementation
    3. FAQs

Key ideas

Many of these are generalizations of findings in Anthropic’s sparse autoencoders work to text embeddings, alongside some new training and steering techniques.

  1. Sparse autoencoders can discover tens of thousands of human-interpretable features in text embedding models.

    Applying sparse autoencoders to text embeddings (with some training and architectural modifications), we’re able to recover tens of thousands of human-interpretable features from embedding spaces of size 512-2048.

    Using this technique, we can:

    • Get an intuitive sense of what kinds of features are most commonly represented by an embedding model over some dataset.
    • Find the most strongly represented interpretable features for a given input. In other words, ask “What does the embedding model see in this span of text?”
    • Answer “why” questions about embedding models’ outputs, like “Why are these two embeddings closer together than we’d expect?” or “How does adding a page title to this document affect the way the model encodes its meaning?”

    For example, we can embed the title of this piece, Prism: mapping interpretable concepts and features in a latent space of language, and see that this activates features like “Starts with letter ‘P’”, “Technical discourse on formal logic and semantics”, “Language and linguistic studies”, “Discussion of features”, and “Descriptions of data analysis procedures”.

  2. Interventions in latent space enable precise and coherent semantic edits to text that compose naturally.

    Once we find meaningful directions in a model’s latent space, we can modify embeddings using this knowledge to make semantic edits, by pushing the embedding further in the direction of a specific feature. For example, we can turn a statement into a question using the “Interrogative sentence structure” feature without disrupting other semantics of the original text. Applying this edit to the title of this bullet point outputs:

    “Insights into how can we make precise changes to the latent space’s text output?”

    Semantic text editing works by modifying an embedding (in this case, via simple vector addition), then decoding it back into text using my Contra text autoencoder models.

    This kind of precise semantic editing capability enables direct manipulation of models’ internal states as a way of interacting with information, as I first wrote about here in 2022.

    I also share a novel way to make sparse autoencoder-based edits to embeddings I call feature gradients, which uses gradient descent at inference time to minimize interference between the desired feature and other unrelated features. This makes semantic edits in latent space even more precise than before. Furthermore, I demonstrate examples of multiple semantic edits using many different features stacking predictably to result in edits that express many different desired features at once, showing that edits in latent space can compose cleanly.

  3. Large language models can automatically label and score their own explanations for features discovered with sparse autoencoders, and this process can be made much more efficient, for a small sacrifice in accuracy, with a new approach I call normalized aggregate scoring.

    Building on OpenAI’s automated interpretability work, I propose a much more cost-efficient way to score a language model-written description of a sparse autoencoder feature that still appears to be well-aligned with human expectations.

    This automated confidence score, a number between 0 and 1:

    • Aligns with my human judgement of explanation quality. Specifically, the number of features with confidence greater than some threshold like 0.99, 0.9, or 0.75 is a reliable predictor of the overall quality of features in a particular SAE training run.
    • Is precise enough to use as a part of hyperparameter search without human review of labels. This enabled me to do much more hyperparameter search than would have otherwise been possible as an individual researcher.

Further motivations

Beyond expanding the design space of rich information interfaces, there are several other reasons why interpretable embeddings are valuable.

Debugging & tuning embeddings

Embeddings are useful for clustering, classification, and search, but we usually treat embedding values as black boxes that give us no understanding of exactly why two inputs are similar, or what features of input data get encoded into embeddings.

This means that when embeddings perform poorly, our debugging is less precise, and we aren’t able to express certain invariants about a problem, like “we don’t care about the presence of punctuation or gendered pronouns”, in queries involving embeddings.

If we understood how human-legible features were encoded in embeddings, we would be able to more precisely understand why embeddings underperform when they do, and perhaps allow us to manually tune embeddings quickly and cheaply with confidence.

Steerable text generation

When using AI for writing, current generation language models require the user to write in plain English exactly what edit they would like. But often, verbally specifying edits or preferences is laborious or impractical (e.g. when referring to a specific style of some sample text, or when trying to steer the model away from some common behavior).

In many graphics editing tools, rather than manually drawing shapes, users can use features like “copy style” to work more directly with stylistic elements of edits. They can even enumerate different elements of style (shape, color, shadow, border) and manually edit them with direct real-time feedback.

This work, especially if applied to more capable models, opens up the ability for text editing tools to offer similar kinds of “direct style editing” features.

Demo

As of the time of publication, I have a publicly hosted demo of everything in this work at https://linus.zone/prism.

Screenshot of the Prism research tool interface. The image shows a list of features with confidence ratings, a central panel with a detailed description of a selected feature, and a sidebar displaying custom text and related features. Various components of the interface are labeled to explain their functions.

I built this as an internal research tool, so the interface isn’t the most intuitive, and there is some functionality I won’t explain here. But this is a quick overview of this tool:

  1. Choose an embedding model and sparse autoencoder to study in the top left drop-down. I recommend starting with lg-v6, which is a good balance of speed and expressiveness. In general, you should stick with the “v6” generation of sparse autoencoders.

    Each sparse autoencoder loads a new “dictionary” of features. For example, lg-v6 has around 8,000 features that it has extracted from the embedding space of the large model found here, with a 1,024-dimensional embedding space.

  2. Select a feature in the left sidebar to see more information about that feature.

    Index: index of this feature in the sparse autoencoder’s model weights. Not useful except as a unique ID.

    Autointerp: GPT-4 written description of this feature, based on samples on which this feature activated strongly.

    Feature edits: If the editor (described below) is open, this section shows semantically edited versions of the given text with the selected feature clamped to specific values.

    High/low activation samples: Example sentences in the sparse autoencoder training dataset (currently a filtered subset of The Pile) that most strongly activated the feature, or did not activate the feature, respectively.

  3. Enter some text into the right “Editor” sidebar and submit to see which features activate particularly strongly for that text.

Here are some hand-curated interesting features I’ve found that represent the diversity of features discovered with this method.

Though I’ve spent a lot of time studying individual features in these datasets, I haven’t done any systematic dataset-wide study. So view these cherry-picked examples more as curious observations of what features look like, rather than a statement about the effectiveness of the specific techniques used or what the “typical” feature is.

Topic / subject matter

Sentiment and tone

Specific words or phrasing

Punctuation, grammar, and structure of text

Numbers, dates, counting

Natural and code languages

You can also visually explore every “confidently labelled” feature (confidence score above 0.8) in all trained v6 SAEs on Nomic Atlas, for sm-v6, bs-v6, lg-v6, and xl-v6. In these visualizations, each feature is represented by a dot and clustered nearby other features pointing in similar directions in embedding space.

Screenshot of the Nomic Atlas map of features from the lg-v6 SAE. The main area displays a colorful scatter plot with thousands of dots representing different feature directions, clustered and labeled. The left sidebar contains details about a selected data point, while the right sidebar shows view settings. The interface includes various tools for interacting with the data.

Methodology

Sparse autoencoders draw on a rich history of mechanistic interpretability research, and to understand it in detail, we need to build a little intuition. If you’re versed in mech interp, feel free to skip ahead to the results.

Mechanistic interpretability as debugging models

One way to understand the goal of mechanistic interpretability is to think of a trained neural network as having learned a kind of program or algorithm. Neural networks learn numerous algorithms over the course of training, each of which accomplishes a specific part of the network’s task. For example, a part of a neural net may implement an algorithm to know when to output a question mark, or an algorithm to look up specific facts about the world from its internal knowledge. As a model sees billions of samples of input during training, these algorithms slowly carve themselves into the model’s weight matrices, over time implementing a full program that excels at the tasks we train models to do, like predicting the next token in a string of text.

Interpreting a neural network, then, is a bit like trying to understand a big, complex program without any source code that explains its implementation.

Reverse engineering the algorithms in a neural network is a bit like reverse engineering a running black-box program. A program has variables which store specific values that represent something about the algorithm’s view of the world. These variables are computed from other variables by operations, like addition or multiplication, that connect these variables together. In a neural network, the variables are called features, and the operations are called circuits. For example, a language model may have a feature that represents whether a sentence is coming to an end, and a circuit that computes that feature’s value from other earlier features.

Just as a program combines simple primitive operations to form larger ones, a neural network is ultimately composed of millions and billions of features and circuits that connect to compute those features. To get a mechanistic understanding of how a model produces its outputs from its inputs, we need to be able to stick a debugger into the neural network’s internals, read out the values its features take on, and trace its circuits to understand how those features are computed from each other.

To establish some vocabulary:

Background

Over the last few years, there’s been accelerating research into decomposing features within transformer models through a framework called dictionary learning, wherein we try to infer a “dictionary” of features that a model knows about. It’s a bit like trying to find a way to scan a black-box program and get all the variables inside it.

Notable prior art include:

These approaches all build on the same core ideas:

  1. Sparsity. While there are many concepts and features we care about in the space of all inputs, only a few features are relevant to each input. For example, any given sentence can only contain a handful of languages at most, and mention only a few people, while the model may know about hundreds of languages and thousands of famous people. We call this phenomenon, where each input only contains a small fraction of all known features, “sparsity”.

  2. Superposition. A good model must learn potentially hundreds of thousands or millions of concepts, but may only have a few hundred dimensions in its latent space. To overcome this limitation, models learn to let features interfere with each other. In simple terms, this means the same direction may be re-used to represent multiple different features, so long as those features rarely occur together and are thus unlikely to be confused for one another. For example, a model may choose to put “talks about cooking” and “memory allocation in C++” next to each other in latent space because only one of those features is likely relevant for any specific input. The model is taking advantage of sparsity to pack more information into a smaller latent space.

    Here’s a more technical rephrasing: within an embedding space of $n$ dimensions, good embedding models often compress in many more features, $m \gg n$, such that features that don’t often occur together are allowed to share directions and interfere with each other. Superposition becomes exponentially more effective as dimensionality increases. This allows models to represent far more features than they have dimensions, and is called the superposition hypothesis.

  3. Autoencoding. To recover which features are compressed into a particular embedding space, we can train a secondary model to learn to represent the original model’s embeddings (in a process called “autoencoding”), but sparsely within a much larger latent space. In other words, this sparse autoencoder (SAE) model’s task is to find a way to pull apart the many, many features that were all packed into the original subject model’s latent space. The sparse autoencoder is trained to approximate the much larger, more sparse space of features, and then learn to map embeddings into and out of this larger, more sparse space.

Recent work from Cunningham et al. and Bricken et al. shows this works reasonably well for small language models, recovering features like “this text is written in Arabic” or “this token is a part of a base64 string”. This work generalizes some of these findings to much larger text embedding models that are close to production scale.

In particular, Anthropic’s extremely detailed research report goes into fantastically helpful detail about their sparse autoencoder training recipe, including hyperparameter choices and architectures that did not work as well. These were tremendously helpful in my own experiments.

Training sparse autoencoders

Here, I share the concrete architecture and training recipes for my sparse autoencoders, including all the technical details. If you aren’t interested in digging into the specifics, feel free to skip down to the “Automated interpretability” section right below this one.

Below is a high-level concept diagram of how these sparse autoencoders are trained. The numbers in the illustration are for the xl-v6 SAE.

Architecture diagram of a sparse autoencoder. Flow: Text embedding (2048 dims) enters Encoder, passes through ReLU and sparsity loss, creating Feature vector (16384 dims), then through Decoder (‘dictionary’) to produce Reconstructed text embedding (2048 dims). Input and output should match (MSE loss). Bar graphs show dimensionality changes: dense input/output (2048) vs. sparse hidden layer (16384).

First, I take a medium-size English language modeling dataset. Here, it’s Minipile. I chunk every data sample on sentence boundaries with nltk, giving me roughly 31M English sentences.

Then, I run each sentence through an embedding model (Contra) to generate 31M embeddings.

On this dataset of embeddings, I train SAEs that try to learn dictionaries 8x, 16x, 32x, and 64x larger than the size of the original embedding.

More specifically, my SAE closely follows architecture choices from Bricken et al.:

$$\mathbf{f} = \mathrm{ReLU}(W_e (\mathbf{x} - \mathbf{b}_d) + \mathbf{b}_e)$$

$$\hat{\mathbf{x}} = W_d \mathbf{f} + \mathbf{b}_d$$

$$\mathcal{L} = \frac{1}{|X|}\sum_{\mathbf{x} \in X}||\mathbf{x} - \hat{\mathbf{x}}||^2_2 + \lambda||\mathbf{f}||_1$$

Here, $\mathbf{x}$ is an input embedding, $\hat{\mathbf{x}}$ is its reconstruction, $\mathbf{f}$ is the sparse feature vector, $W_e$ and $\mathbf{b}_e$ are the encoder’s weights and bias, $W_d$ and $\mathbf{b}_d$ are the decoder’s weights and bias, and $\lambda$ controls the strength of the sparsity penalty.

In other words, my SAE is a two-layer model with the ReLU activation and a hidden layer size that’s 8, 16, 32, or 64x larger than the inputs and outputs.
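Here is what that architecture looks like as a minimal PyTorch sketch, using the dimensions from the xl-v6 diagram above. This is an illustration of the equations, not my exact training code.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseAutoencoder(nn.Module):
    def __init__(self, embed_dim: int = 2048, dict_size: int = 16384):
        super().__init__()
        self.encoder = nn.Linear(embed_dim, dict_size)
        self.decoder = nn.Linear(dict_size, embed_dim)

    def encode(self, x: torch.Tensor) -> torch.Tensor:
        # f = ReLU(W_e (x - b_d) + b_e)
        return F.relu(self.encoder(x - self.decoder.bias))

    def forward(self, x: torch.Tensor):
        f = self.encode(x)
        x_hat = self.decoder(f)        # x_hat = W_d f + b_d
        return x_hat, f

def sae_loss(x: torch.Tensor, x_hat: torch.Tensor, f: torch.Tensor, lam: float) -> torch.Tensor:
    reconstruction = ((x - x_hat) ** 2).sum(dim=-1)   # ||x - x_hat||_2^2 per sample
    sparsity = f.abs().sum(dim=-1)                    # ||f||_1 per sample
    return (reconstruction + lam * sparsity).mean()   # averaged over the batch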

Some more notes I made in my experiments about training:

In addition to loss and sparsity, the most reliable sign I found of a good SAE during training is a log feature density histogram with a prominent “second bump”.

A feature’s density is the fraction of training samples for which a feature had a value above zero. It’s a measure of how often the feature activates. For good, sparse features, we want them to activate quite rarely, but not so rarely that they fail to capture any useful recurring patterns in our data. A log feature density plot is a histogram showing the value $\log_e(\mathrm{feature\ density})$ over all of the features in an SAE.
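In code, the quantity being histogrammed is simple to compute. Here, feature_acts is assumed to be a matrix of SAE feature activations over a sample of the training data.

import torch

def log_feature_density(feature_acts: torch.Tensor, eps: float = 1e-10) -> torch.Tensor:
    # feature_acts: (num_samples, num_features). Density = fraction of samples
    # on which each feature activates above zero.
    density = (feature_acts > 0).float().mean(dim=0)
    return torch.log(density + eps)     # natural log; eps keeps never-firing features finite

# A histogram of these values (e.g. plt.hist(..., bins=100)) produces the kind of plot discussed below.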

Take a look at this chart for the lg-v6 SAE. It shows how the log feature density histogram evolves through the course of a training run.

Graph showing log feature density over training steps. The y-axis ranges from -15 to 0, x-axis from 0 to 1.4k steps. The plot displays a prominent spike at the left edge and a smaller ‘second bump’ around -6 log density, highlighted by an orange circle. Handwritten labels indicate ‘Training steps’ on x-axis and point out the ‘Second bump in log density plot’.

By the end of the training run, notice that there’s a large spike on the left that corresponds to dead or “ultra-rare” features that don’t seem very interpretable, and a smaller bump to the right. That second bump represents a cluster of interpretable features. Anthropic notes similarly shaped feature density plots in their first sparse autoencoder work.

With this setup and knowledge, I swept across the two key hyperparameters, $\lambda$ and learning rate.

I trained these models for 2 epochs with a fairly large batch size (512-1024). It seems essential to overtrain these models past the point where loss plateaus to get good results for interpretability. In retrospect, it’s unclear that repeating training data was helpful, and I would recommend simply collecting more data in the future over training for more than one epoch.

In the end, I could only achieve good results with the 8x-wide SAEs I trained, which are the v6 SAEs I described earlier. I believe other SAEs required quite different hyperparameter choices that I didn’t have time to fully explore.

Automated interpretability

Analogous to OpenAI’s automated interpretability work in Language models can explain neurons in language models, I use GPT-4 to automatically label every feature discovered by these sparse autoencoders, and also to score its own labels for confidence.

This is a two-step process. Prompts used for these steps are documented in the appendix.

  1. Labeling with chain-of-thought

    I show GPT-4 the top 50 highest-activating example sentences from the training dataset (up to context length limits) for a particular feature, and ask the model to describe the attribute that is most commonly shared across all texts in the list.

  2. Normalized aggregate scoring

    Both in OpenAI’s work and in this work, GPT-4 biases towards overly general labels, like “Coherent, factual English sentences”. These are not useful, since they are a feature shared by almost any sentence in the dataset. What this usually means is that the particular feature does not have an obvious interpretable explanation.

    To filter these out, I added a scoring step. In this step, I ask GPT-4 to rate the fraction of both highly activating and non-activating example sentences which fit the auto-generated label, and take the difference between the two fractions to get a normalized confidence score for how well the label and explanation specifically describe samples from this particular feature.
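The arithmetic behind the score is just a difference of two fractions. Here is a sketch, where the boolean inputs are assumed to be GPT-4’s judgments of whether each sample fits the label, and clamping to [0, 1] is my assumption about how negative differences would be handled.

def normalized_confidence(fits_high: list[bool], fits_low: list[bool]) -> float:
    # Fraction of highly-activating samples the label fits, minus the fraction of
    # non-activating samples it fits.
    frac_high = sum(fits_high) / len(fits_high)
    frac_low = sum(fits_low) / len(fits_low)
    return max(0.0, frac_high - frac_low)   # near 1.0 => label is specific to this feature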

What results is a dataset of features with the following schema.

type SpectreFeatureSample = {
  text: string;
  act: number;
};

export type SpectreFeature = {
  index: number;      // feature index in the learned dictionary
  label: string;      // short explanation
  attributes: string; // longer explanation, which is often useful
  reasoning: string;  // chain-of-thought/scratchpad output
  confidence: number; // normalized confidence score
  density: number;    // how often is this feature turned on?
  highActSamples: SpectreFeatureSample[];
  lowActSamples: SpectreFeatureSample[];
  vec: number[];      // feature vector in the model's embedding space
};

Results and applications

Understanding embeddings

When ranked by a combination of confidence (GPT-4’s self-assigned interpretability score) and feature density (how frequently does this feature activate in the dataset?), the highest ranking features reliably have obviously human-interpretable explanations, as we can see in the “Feature gallery” section above.

Larger models have more interpretable features when all other hyperparameters are held constant, and larger models also tend to have more specific features. For example:

Larger models with higher-dimensional embedding spaces also contain more features for specific words and phrases, in addition to common features about general styles and topics.

It’s also interesting to look at which features activate strongly for a particular text input. For example, on that very sentence, the top features in lg-v6 are:

  1. Referencing specific objects or conditions, as in “particular”
  2. Discussion of text formatting
  3. Expressed interest in further exploration
  4. High intensity/emotionality, as in “activate strongly”

There’s room for lots of interesting future work studying relationships between features in different embedding spaces, the order in which features are learned through training, when features split as models get larger, what kinds of attributes features tend to represent, and so on.

Text editing in embedding space

Sparse autoencoder features can be used to edit text by manipulating an embedding to make meaningful semantic edits, then reconstructing the embedding back into text with the modification.

This method isn’t yet competitive with prompting larger models for general-purpose text editing. But it works well enough to often make quite precise semantic edits to text, meaning the edit is able to intervene only on the specific attribute of the text we care about, and generally avoid changing other things about the text. This is surprising, because we’re just directly editing numbers in embeddings!

Moreover, the efficacy of latent space intervention like this is further suggestive evidence that features discovered with this method aren’t just showing correlations in the embedding space or doing some rough approximation of clustering, but finding directions that actually represent specific, interpretable units of meaning in the embedding space.

Here are some curated examples of semantic edits on the passage, “This research investigates the future of knowledge representation and creative work aided by machine understanding of language. I prototype software interfaces that help us become clearer thinkers and more prolific dreamers.”

Edits can also be composed together. For example, by applying the features for Historical analysis and commentary, Sweet Treats and Baking, and Nautical terms and situations, we can transform the original sentence into this strange combination of ideas.

Her research into the sea and shipwrecks of the Atlantic Ocean enabled the development of modern food and cake decorations and the more sophisticated craft of sailing and vanilla.

Edit methods: balancing steering strength and precision

As a part of this work, I experimented with a few different ways to apply a feature to an embedding to edit text, and found some interesting new approaches that seem to work better in specific situations.

When making semantic edits in latent space, we need to balance two different goals:

  1. Steering strength: the edit should clearly and strongly express the feature we’re adding.
  2. Precision: the edit should leave everything else about the text unchanged.

These goals are at odds with each other. To understand why, we need to think about correlated features.

When we study a dataset of real-world text, we often find distinct features that frequently co-occur. For example, a feature for “Discussion of economics” is likely to co-occur with a feature for “Mentions of the US government”, since the government often comments on and influences the economy. To take advantage of this fact, an embedding model may learn to point these two distinct features in directions that lightly overlap, because doing so saves some capacity in the embedding space, and if the model accidentally confuses them, the chances are good that its output will still be coherent.

But this superposition becomes an issue when we want to activate one of these features without the other, because very strongly activating “Mentions of the US government” may push our embedding far enough along the nearby “Discussion of economics” direction to make a difference in the output. We’ve tried to make a strong edit, and lost precision as a result.

To overcome this tradeoff, I explored a few different “Edit modes” when manipulating embeddings.

Addition. This is the most obvious way to edit embeddings. I simply push the embedding along the feature direction. (Mathematically, I add the feature vector directly to the embedding vector.) This reliably makes strong edits, but also often causes interference between features, resulting in edits where the output looks quite different from my input.
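A minimal sketch of the addition method, where embed and decode stand in for the embedding model and the Contra text decoder (not real APIs), and feature_vec is the feature’s direction in embedding space (the vec field in the schema above):

import torch

def edit_by_addition(text: str, feature_vec: torch.Tensor, strength: float, embed, decode) -> str:
    embedding = embed(text)
    edited = embedding + strength * feature_vec   # strong, but prone to interference
    return decode(edited)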

Using the Nautical terms and situations feature edit from above, an edit via addition results in

Her research investigates the development of shipboard knowledge and understanding aided by creative work. This research enabled sailors and engineers to make more fragile and colorful vessels and navigate faster and more dreamily.

The nautical theme is certainly very strongly expressed, but now we’ve lost a lot of the original phrasing and wording.

SAE intervention. In this approach, we take advantage of the fact that the sparse autoencoder has learned how to translate between embeddings and feature activations. We encode the embedding we want to edit with the SAE’s encoder, modify the value of the feature we care about in place, and then decode it back into the embedding space, adding back, as a constant, any residual that the SAE could not reconstruct. In theory, because the SAE has learned how to undo some of the interference between features, this approach should result in “cleaner” edits.

This research venture investigates the representation of human knowledge and navigating under sail assisted by ship work. She designed software and creative ideas to help crews become more proficient and more brilliant navigators.

Word choice and phrasing are closer to the original text, but there’s still a lot that’s changed.

This led me to my final approach, which I call feature gradients. It works like this:

A perfect embedding edit would mean that when we take the edited embedding and put it through the SAE, the resulting feature activation values would exactly equal how much we want each feature to be expressed. What if we used gradient descent to directly optimize for this objective?

When using feature gradients, we:

  1. First perform the “addition” method to get an approximation of the edited embedding we want.
  2. Then, we iteratively optimize this approximation through gradient descent, to minimize the difference between (a) the value of sae.encode(embedding) and (b) the feature activation values we want. Concretely, we do this by minimizing the mean squared error (MSE) between the two values. I’ve found the Adam optimizer with a cosine learning rate annealing schedule to get good results quickly.

You can find a PyTorch implementation of feature gradients in the appendix.

Using feature gradients, we can produce this final edit:

This research lab investigates the development of maritime exploration and art and human knowledge aided by ship building. She trains her crew to become better thinkers and more prolific swimmers.

This version preserves much of the wording and sentence structure from the original text, while expressing the feature very clearly.

Feature gradients is not always the best edit method to use, however. Though I haven’t performed a rigorous study, in my experience making hundreds of edits, addition is most effective if the edit involves steering broad features like topics, voice, or writing style; feature gradients is most effective if the edit is more precise, like adding a mention of a specific word or number. I hypothesize that this is because when editing text to express a broad topic or style, we actually do want to express many related features together, whereas this is undesirable when making edits about phrasing or structure, which are naturally more specific.

Caveats and limitations

TL;DR — this is all building on a relatively new approach to interpretability, and my implementation of the key ideas in this work is a proof of concept more than a rigorous claim about how interpretable embedding spaces are.

Future work

For me, the most exciting consequence of this work is that it opens up a vast design space of richer, more direct interfaces for interacting with foundation models and information. It’ll take me a whole different piece to explore the interface possibilities I imagine in detail, but I’ll try to share a glimpse here.

Using the two primitives I shared above, (1) decomposing an input into features and (2) making semantic edits, we can assemble ways of interacting with text that look very different from conventional reading and writing. For example, in the prototype below, I can select a specific SAE feature and see a heat map visualization of where in the text that feature was most activated.

Screenshot of a ‘Prism highlighter’ interface displaying the opening paragraph of ‘Moby-Dick’. The top shows tabs for ‘Editor’ and ‘Reader’, and a dropdown menu for ‘Feature’ selection, currently set to ‘#13188 Water-related and Natural Environment Context (0.96)’. The text is partially highlighted in yellow, emphasizing sentences related to sailing, the ocean, and Manhattan’s waterfront, visually demonstrating how the selected feature activates across the passage.
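A rough sketch of how a heat map like this could be computed (the embed() function below is a random placeholder for a real sentence embedding model, and the SAE is the toy stand-in from the earlier sketch; neither reflects the actual Prism implementation):

import torch

def embed(sentence: str) -> torch.Tensor:
    # Placeholder: a real implementation would call a sentence embedding model.
    torch.manual_seed(abs(hash(sentence)) % (2**31))
    return torch.randn(512)

def feature_heatmap(sentences: list[str], sae, feature_idx: int) -> list[float]:
    # One SAE feature's activation per sentence, normalized to [0, 1] for highlighting.
    activations = [sae.encode(embed(s))[feature_idx].item() for s in sentences]
    peak = max(activations) or 1.0
    return [a / peak for a in activations]

sentences = ["Call me Ishmael.", "It is a way I have of driving off the spleen."]
weights = feature_heatmap(sentences, TinySAE(), feature_idx=13188)

Weights like these could then drive the opacity of the highlight behind each sentence.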

Imagine exploring much longer documents, or even vast collections of documents, by visualizing feature maps like this from a bird’s-eye view across the whole set at once, rather than reading and tagging each document one by one.

On the editing side, there are many interesting ways we could surface precise semantic edits in a writing tool. Here, for example, we can imagine a text editing toolbar that doesn’t just expose formatting controls like bold, italic, and underline, but also continuous controls for tone, style, structure, and depth of material.

Conceptual user interface for advanced text editing, showing a graph of ‘Literal/Figurative’ versus ‘Casual/Formal’ writing styles, a toolbar with formatting options and controls for ‘Tone’ and ‘Style’, and sliders for adjusting ‘Narrative’, ‘Quotes’, and ‘Technical’ content levels. This layout illustrates potential controls for fine-tuning semantic qualities of text beyond basic formatting.

In the future, it may also be possible to “copy” the style of a piece of writing or image and “paste” those same stylistic features onto another piece of media, the same way we copy and paste formatting. In general, as our understanding of the latent spaces of foundation models improves, we’ll be able to treat elements of semantics, style, and high-level structure as just another kind of formatting in creative applications.

There is also no shortage of areas of improvement beyond interfaces.

Most urgent for me is scaling SAEs to larger, production-scale models, and generalizing these techniques to models of other modalities like image, audio, and video. In particular, I’m currently working on extending this work along both of these fronts.

The core science behind SAEs and dictionary learning can also improve a lot, including advancements in SAE architecture and training recipes. Major AI labs with alignment teams like DeepMind and Anthropic are making major strides on this front. A recent example is gated sparse autoencoders, which improve upon the state-of-the-art to yield cleaner, more interpretable features.

Lastly, everyone working with SAEs would benefit from a deeper understanding of how features and feature spaces relate to each other across different models trained over different datasets, trained with different architectures, or trained on different modalities.

Looking back, looking forward

When I first began researching how to build latent space-based information interfaces in 2022, it was clear to me that model-learned representations were key to a big step forward in how we interact with knowledge and media. To borrow some ideas from Using Artificial Intelligence to Augment Human Intelligence (one of whose authors now leads Anthropic’s interpretability research), machine learning gives us a way to automatically discover useful abstractions and concepts from data. If only we could find a way to mine these models for the patterns they’ve learned and then interact with them in intuitive ways, this would open up a lot of new insight and understanding about our world.

While iterating on design prototypes in mid 2022, I hit a big roadblock. Though I had built many ways to work with embeddings in relation to each other, like clustering them and finding interpolations between them, to make embeddings truly useful, I felt I needed some way to know what an embedding actually represented. What does it mean that this embedding has a “0.8” where that embedding has a “0.3”? Without a way to give meaning to specific numbers, embedding spaces were glorified ranking algorithms. This problem of interpretability, this understanding gap, felt like a fundamental problem preventing interface innovation in this space.

I wrote at the time:

I’ve been researching how we could give humans the ability to manipulate embeddings in the latent space of sentences and paragraphs, to be able to interpolate between ideas or drag sentences across spaces of meaning. The primary interface challenge here is one of dimensionality: the “space of meaning” that large language models construct in training is hundreds and thousands of dimensions large, and humans struggle to navigate spaces more than 3-4 dimensions deep. What visual and sensory tricks can we use to coax our visual-perceptual systems to understand and manipulate objects in higher dimensions?

This core challenge of dimensionality, I think, still remains. In some ways, we’re now finding the problem to be much worse, because instead of working with embedding spaces of a few hundred dimensions, it turns out we ought to work with feature spaces of millions or even billions of concepts. But rendering embeddings legible, giving meaning to specific numbers and locations and directions in latent space, is a meaningful step forward, and I’m incredibly excited to explore both the technical and interface design advancements that are sure to come as we push our understanding of neural networks even further.


Appendix

I. Automated interpretability prompts

These are formatted a little oddly because they’re excerpted from a Notion-internal prompt engineering framework.

Here’s the prompt for generating feature labels and explanations.

`What are the attributes that texts within <positive-samples></positive-samples> share, that are not shared by those in <negative-samples></negative-samples>?`,
``,
`<positive-samples>${
  formatSamples(highActSamples, { mode: "tokens", max: 3000 })
}</positive-samples>`,
`<negative-samples>${
  formatSamples(lowActSamples, { mode: "tokens", max: 3000 })
}</negative-samples>`,
``,
`Before you respond, try to identify the most specific attribute that are shared by all positive samples, and none of the negative samples. The more specific the better, as long as only all positive samples share that same attribute.`,
``,
`First, describe your reasoning in <reasoning></reasoning> tags.`,
`Then, describe the shared attributes in <attributes></attributes> tags. Within these tags, do not reference any tags or the words "positive" or "negative"; simply refer to samples as "the samples".`,
`Finally, within <label></label> tags, write a short, 4-8 word label for this attribute in sentence case.`,

Here’s the prompt used to compute the normalized aggregate score for each feature’s explanation.

`I have a dataset of text snippets:`,
`<samples>${
  formatSamples(samples, { mode: "tokens", max: 4000 })
}</samples>`,
``,
`What percent of the samples fit the attribute described below?`,
`<attribute>${attributes}</attribute>`,
``,
`Give your answer as an integer between 0 and 100 inclusive, enclosed in <percent></percent> tags, as in <percent>NN</percent>.`,
`Do not include the percent sign in your response; include only the number itself.`,

II. Feature gradients implementation

Note: here, spectre is an implementation of sparse autoencoders in my research codebase, and addition is the vector-addition edit method described earlier.

import torch
import torch.nn as nn
import torch.nn.functional as F

def dictgrad(
    spectre: nn.Module,
    x: torch.FloatTensor,
    f: torch.FloatTensor,
    original_features: torch.FloatTensor,
    # method-specific config
    steps: int = 500,
    **kwargs,
) -> torch.FloatTensor:
    # "Reference" is the text, latent, and feature dictionary we want to edit.
    reference_latent = x.clone().detach().requires_grad_(False)
    reference_features = original_features.clone().detach().requires_grad_(False)

    # Initialize with the "addition" edit method.
    latent = addition(spectre, x, f, original_features, **kwargs).clone().detach().requires_grad_(True)

    # AdamW with zero weight decay (equivalent to plain Adam here) and cosine
    # annealing works faster than SGD, with minimal loss in performance.
    optim = torch.optim.AdamW([latent], lr=0.001, betas=(0.9, 0.999), eps=1e-08, weight_decay=0.0)
    optim.zero_grad()
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optim, steps, eta_min=0, last_epoch=-1)

    # Gradient descent: minimize the MSE between the SAE's read-out of the edited
    # latent and the feature dictionary we want it to express.
    for step in range(steps):
        features = spectre.encode(latent)
        loss = F.mse_loss(features, reference_features)
        loss.backward()

        optim.step()
        optim.zero_grad()
        scheduler.step()
    return latent

III. FAQs

Why interpret embeddings and not language models?

Many other research groups are applying this technique to autoregressive language models. Why embeddings?

I worked on applying SAEs to embeddings because:

  1. From the beginning, my aim has been to build tools that work with information in units that are natural for humans. Humans work with thought at the level of sentences and paragraphs, not sub-word tokens. It seemed useful to work with a model that represented information at the same altitude.
  2. I think embeddings are interesting as a kind of “storage at rest” form of how models represent information, rather than a “storage in transit” format, which is what an LLM holds while it’s generating text.
  3. Nobody else was applying SAEs to embeddings at the time (December 2023).

The basics of the technique transfer well between the two settings, so I plan on eventually returning to autoregressive language models as well.

I see these results are using a custom embedding model. Does this generalize to other widely used embedding models?

I haven’t tried this, so I can’t say with 100% confidence, but based on my experiments mapping this model’s embedding space to that of text-embedding-ada-002 from OpenAI, the technique should transfer very well to any reasonably good embedding model. One of my follow-up projects is to validate this claim.


Much gratitude to the many people who offered ideas, advice, resources, and encouragement through this project, including Gytis Daujotas, Matthew Siu, Dan Shipper, Theo Bleier, Maya Bakir, Betaworks, and others.

Personalization, measuring with taste, and intrinsic interfaces

2024-06-15 11:38:12

Midjourney launched a new “personalization” feature this week, and I think it presents an interesting and valuable contrast to the kind of direct manipulation interfaces for interacting with AI systems that I’ve been advocating for. I also think this implementation of personalization reveals an unconventional way of thinking about style in creative tools, not by concrete stylistic traits but by a style’s relationship to a universe of other tastes or style preferences.

These are my stream-of-consciousness notes on both ideas.

Instrumental and intrinsic interfaces

With personalization, Midjourney users can steer generated images toward a specific style of their preference by choosing a few hundred images that exemplify that style. This method allows users to articulate and build on their style preferences much more accurately, and perhaps easily, than would be possible with a conventional text prompt. I hypothesize that, under the hood, Midjourney trains a steering vector or LoRA adapter that can steer their models’ outputs toward users’ preferred styles. From the demos I’ve seen online, it seems to work quite well.

I’ve long been advocating for interfaces where users observe the concepts and styles a model has learned during training and directly manipulate some interface affordance to steer the model toward their desired output. This leans on the user to discover their preferences by exploring the model’s output space over the course of using the model. But more importantly, it lets each user build a strong internal mental model of how their tool maps inputs to outputs. It exposes a kind of laws of physics about the generative model’s understanding and expression of concepts and styles.

By contrast, Midjourney’s personalization system automates much of the style discovery process on behalf of the user. In place of a user building up a mental model of the ML model’s output space, the personalization system takes binary preference signals from the user and automatically tries to infer the user’s intent within the model’s space of concepts and styles.

I compare the ideological difference in design to the difference between a navigation system that shows you only a list of turn-by-turn directions, and one that shows you some routes highlighted on a map.

One simple litmus test to distinguish these two types of interfaces is to ask, “Would the user care if I replaced the interface with a button that magically solved their problem?” If the user wouldn’t care, an instrumental interface is the right fit. But if the user would care, they ultimately care about the value of engaging with the interface itself, and an intrinsic interface may be better. It may be possible to replace a turn-by-turn navigation system with a magic button, but not a map.

There’s a big space in the world for instrumental interfaces. Lots of people want to get a good output or result, and don’t really care how they get it. But I’m much more interested in the intrinsic interfaces, the ones that require a bit more effort, but can teach us to internalize knowledge about the domain the interface represents.

The idea of instrumental and intrinsic interfaces has been brewing in my mind for a long time, and I thank Parthiv in particular for planting the initial seeds and helping grow them into the version I wrote about here.

Measuring with taste

Another way to think about Midjourney’s personalization feature is to conceptualize each “personalization code” (however it’s represented under the hood) as a pointer into the model’s internal representation space of visual styles, the model’s “style space”.

Anytime we have a space and enough reference points in the space, we can build a coordinate system out of it. Rather than describing a point in the space by its absolute position, we can describe a location in the space by enumerating how far that point is from all the other reference points whose locations we already know.

In other words, if we think of Midjourney’s personalization codes as pointers into specific parts of the model’s style space, we can imagine building a kind of “style coordinate system” where we describe a particular image’s style as “0.5 of personalization style A, 0.2 of personalization style B” and so on.

3D graph representing a style coordinate system. The graph has three labeled axes: '35mm photography' (vertical axis), 'noir' (horizontal axis extending to the left), and 'studio lighting' (horizontal axis extending to the right). There are three points labeled 'style A', 'style B', and 'style C' connected to a central blue point labeled 'personal style coordinates' by dashed lines. This central point is also connected to the three labeled axes by straight grid-like lines, representing feature coordinates.

Of course, we can choose a different set of reference points in our style space. For example, we can choose concrete, interpretable elements of style we have names for, like “studio lighting”, “35mm film photography”, or “noir”. In this style coordinate system, we could describe a particular image as “0.7 studio lighting, 0.4 noir” and so on.
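As a loose illustration of such a coordinate system (the reference vectors below are random stand-ins, not anything Midjourney actually exposes), describing a style reduces to projecting it onto a set of named reference directions:

import numpy as np

def style_coordinates(style: np.ndarray, references: dict[str, np.ndarray]) -> dict[str, float]:
    # Describe a style vector by its cosine similarity to each named reference style.
    return {
        name: float(style @ ref / (np.linalg.norm(style) * np.linalg.norm(ref)))
        for name, ref in references.items()
    }

rng = np.random.default_rng(0)
references = {name: rng.standard_normal(64) for name in ["studio lighting", "35mm film", "noir"]}
image_style = 0.7 * references["studio lighting"] + 0.4 * references["noir"]

# Highest similarity for "studio lighting", some for "noir", near zero for "35mm film".
print(style_coordinates(image_style, references))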

One of the technical foundations for my recent research, mapping interpretable features in foundation models, is doing something quite similar to the latter option, finding “features” in the model’s space of concepts that we can use to very precisely steer models towards specific voice, style, or topics. Over time, we’ll learn to discover more detailed feature maps for more capable models, giving us a very detailed and human-readable coordinate system for ideas and styles.

Midjourney’s implementation of personalization is not technically unique. There is no shortage of methods researchers have invented to add a style “adapter” to some generative model from a few examples: LoRA, steering vectors, textual inversion, soft prompts, and so on.

What makes style adapters like Midjourney’s interesting is that they lead me to imagine a very different kind of feature map for generative models, one where each coordinate axis isn’t a specific concept or element of style that humans can quickly label, but a vibe derived from someone’s unique preference fingerprint.

Slicing up the latent space not by concepts, but by vibe. Measuring reality by the unit of tastes.

Some applied research problems in machine learning

2024-06-10 11:42:07

There’s a lot of great research on foundation models these days that is yielding a deeper understanding of their mechanics (how they work) and phenomenology (how they behave in different scenarios). But despite a growing body of research literature, in my work at Notion and conversations with founders, I’ve noticed a big gap between what we currently know about foundation models and what we still need to know to build valuable tools out of them. This post is a non-exhaustive list of applied research questions I’ve collected from conversations with engineers and founders over the last couple of months.

To be abundantly clear, I don’t mean to say that nobody is working on a question when I include it here, but only that more people should be working on them. If you are working on one of these problems in your research, I’d love to hear from you.

Modeling and data

How do models represent style, and how can we more precisely extract and steer it?

A commonly requested feature in almost any LLM-based writing application is “I want the AI to respond in my style of writing,” or “I want the AI to adhere to this style guide.” Aside from costly and complicated multi-stage finetuning processes like Anthropic’s RL with AI feedback (RLAIF; Constitutional AI), we don’t really have a good way to imbue language models with a specific desired voice.

In the previous “AI boom”, generative models like GANs gathered interest for their ability to perform neural style transfer, absorbing the style of one image or video and overlaying it onto another. There is some literature on how to do this without re-training or costly fine-tuning for modern diffusion models as well, but few seem to have gotten traction.

Steering mechanisms like ControlNet may be another avenue of exploration for style transfer, but most applications of ControlNet seem to be about steering a model to include specific objects or layouts of objects in a scene rather than steering a model toward a particular style.

I’m currently optimistic that mechanistic interpretability techniques like sparse autoencoders can make advances here by discovering precise, interpretable “features” or concepts corresponding to style. By intervening on these specific features during the generation process, we may be able to manipulate style of model outputs very precisely.

When does it make sense to prefer one of RL, DPO/KTO, or supervised fine-tuning? When is synthetic data useful?

In most fundamental ML research, model training usually happens in one of two very clear regimes: “data-rich” environments where we assume the amount of data is never a bottleneck, or “data-efficient” settings where we assume we have a hard limit on how much data we can learn from. Academic datasets are generally assumed to perfectly describe the task being studied. For example, most studies using the ImageNet or MS COCO computer vision datasets simply train on the entire dataset and no more, and assume that the dataset only contains correct results.

In reality, in industrial settings, none of these assumptions hold: data is costly to collect and label, it is rarely free of noise and errors, and teams have to decide how much quantity and quality they can afford.

Once a budget and data quality bar have been set, teams then need to decide between a growing zoo of alignment techniques like RLHF, DPO, KTO, and classic supervised training. How do these techniques compare in apples-to-apples studies considering different data quality and quantity regimes? We don’t know. There’s also a talent shortage for some teams: many companies trying to build AI products don’t have the in-house expertise to effectively deploy RL, for example.

It’s not glamorous work, but empirically studying tradeoffs in the frontiers of these techniques would help push the industry forward.

How can models effectively attend to elements of visual design, like precise color, typography, vector shapes, and components?

Imagine a real-world application of graphic design or photography. Chances are, imagery is only a small part of the final creative artifact. These “primary source” visuals are combined with type, animation, layout, and even software interface components to produce a final asset like a poster, an advertisement, or a product mock-up.

In a production pipeline for, say, a brand campaign, original illustrations and photography consume a small portion of the total budget, which includes pre-production like concepting but also tasks like copywriting, typography, overlaying brand assets like logos, and producing layouts for different display formats.

Even the most expensive frontier multimodal models like Claude 3 and GPT-4 can’t reliably differentiate similar colors or type families, and certainly can’t generate images with pixel-perfect alignment between components in a layout. These kinds of tasks probably require domain-specific data, or a new model component designed for numerically precise visuals.

I suspect companies like Figma are working hard on teaching language models how to work with vector shapes and text in design, but more work from the research community can likely benefit everyone.

How can we make interacting with conversational models feel more natural?

Every conversational interface to a language model adopts the same pattern: the user composes a complete message and sends it, then waits while the model writes a complete reply, and the two sides strictly alternate turns.

None of these assumptions are true for human conversations, and in general, I’m excited about every advance that closes the gap between this stilted experience and human-like dialogue. As a successful example of this idea, pushing latency down to sub-second levels makes GPT-4o from OpenAI feel much less robotic in an audio conversation compared to other similar systems.

What problems, if solved, could enable more natural dialogue?

Knowledge representations and applied interpretability

Many of these questions assume that within the next year, we’ll see high-quality open datasets of steering vectors and sparse autoencoder features that allow for precisely “reading out” and influencing what production-scale models are thinking. There has been tremendous interest and progress in mechanistic interpretability in the last six months, owing to the early pioneering and field-building work of labs like Conjecture and Anthropic, and it seems all but certain that if the trend continues, this assumption will prove out.

In a world where anyone can choose which of hundreds of millions of concepts to inject into a generating model to flexibly steer its behavior and output, what applied ML and interface design research questions should we be asking?

How can we communicate to end users what a “feature” is?

I’ve been giving talks and speaking with engineers and non-technical audiences about interpretability since 2022, and I still struggle to explain exactly what a “feature” is. I often use words like “concept” or “style”, or establish metaphors to debugging programs or making fMRI scans of brains. Both metaphors help people outside of the subfield understand core motivations of interpretability research, but don’t actually help people imagine what real model features may look like.

I’ve found that the best way to develop intuition for features is just to spend a lot of time looking at real features from real models, using interfaces like my Prism tool. It turns out features can represent pretty much anything about some input: a topic, a tone of voice, an element of style or structure, or the presence of a specific word or phrase.

Given the breadth of ideas that features can represent, how can we help the user build a mental model of what features are, and why they’re useful? Are features colors on a color palette? One in a selection of brushes? Knobs and levers in a machine? I think we need to discover the right interface metaphors as a foundation for building more powerful products on this technology.

What’s the best way for an end user to organize and explore millions of latent space features?

I’ve found tens of thousands of interpretable features in my experiments, and frontier labs have demonstrated results with a thousand times more features in production-scale models. No doubt, as interpretability techniques advance, we’ll see feature maps that are many orders of magnitude larger. How can we enable a human to navigate such a vast space of features and find useful ones for a particular task?

The exact interaction mechanics will probably differ for each use case, but there are a few broad patterns from prior art worth borrowing.

How do we compare and reconcile features across models of different scales, families, and modalities?

As mechanistically understanding and steering models becomes more popular, users may expect to be able to take their favorite features from one model to another, or across modalities (e.g. from image to video).

We know from existing research that (1) models trained on similar data distributions tend to learn similar feature spaces and (2) sparse autoencoders trained on similar models tend to learn similar features. It would be interesting to try to build a kind of “model-agnostic feature space” that lets users bring features or styles across models and modalities.

What does direct manipulation in latent space look like?

There is some precedent for direct manipulation in a space that isn’t the concrete output space of a modality. In Photoshop and other image editing programs, users can directly manipulate representations of an image other than its raw pixels. For example, I can edit an image in “color space” by dragging its color curves. In doing so, I’m manipulating the image in a different, more abstract dimension than pixels in space, but I’m still directly manipulating an interface element to explore my options and decide on a result.

With interpretable generative models, the number of possible levers and filters explodes to millions or even billions. While it’s easy to imagine a directly draggable dial for some features like “time of day”, great UI affordances are less obvious for other higher-level features like “symbols associated with speed or agility”.

Closely related to manipulation is selection. In existing design domains like photo or text editing, the industry invented, and our culture collectively absorbed, a rich family of metaphors for how to make a selection, what selection looks like, and what we can do once some slice of text or image is selected. When we can select higher-level concepts like the “verbosity of a sentence” or “vintage-ness of an image”, how should these software interface metaphors evolve?

From primitives to useful building blocks

Historically, use cases and cultural uptake of a new technology go up not when the fundamental primitives are discovered, but when those primitives are combined into more opinionated, more diverse building blocks that are closer to end users’ workflows and needs. Fundamental research helps discover the primitives of a new technology, like language models, alignment techniques (RLHF), and feature vectors, but building for serious contexts of use will inspire the right building blocks to make these primitives really valuable and useful.

I think we’re still in the period of technology adoption that favors implementation purity over usefulness. We build chat experiences with LLMs the way we do not because it’s the best chat interface, but because LLMs have context window limits. We generate images from text rather than image-native features and descriptions because we train our models on paired text-image data.

When we can move beyond polishing primitives toward more opinionated building blocks designed for humans, I think we’ll see a rejuvenation in possibilities at the application layer.

In the beginning… was the command line

2024-05-28 09:34:43

Neal Stephenson’s In the Beginning… Was the Command Line is one of the most profound pieces of writing about technology I’ve read. I found myself underscoring and highlighting numerous passages in this essay, which reads like political propaganda, memoir, novella, and journalistic reporting all at once. If you have a couple of hours to spare, I’d recommend this essay only behind Greg Egan’s Diaspora, my favorite piece of science fiction, as a must-read.

This post is a compilation of various snippets from that essay which I quote at length, paired with some minimal commentary. Some of these quotes require the full context of the essay to deliver their message, so I hope you read the essay regardless.

I don’t necessarily agree with everything on this page (though many of the key ideas resonate with me). Inclusion here just means I found it worthwhile to read and ponder. All emphasis is mine.


The business of selling software

On the idea of selling operating system software:

The product itself was a very long string of ones and zeroes that, when properly installed and coddled, gave you the ability to manipulate other very long strings of ones and zeroes. Even those few who actually understood what a computer operating system was were apt to think of it as a fantastically arcane engineering prodigy, like a breeder reactor or a U-2 spy plane, and not something that could ever be (in the parlance of high-tech) “productized.”

On people’s relationship to technology, and the fact that it’s mostly not about technology:

In retrospect, this was telling me two things about people’s relationship to technology. One was that romance and image go a long way towards shaping their opinions. If you doubt it (and if you have a lot of spare time on your hands) just ask anyone who owns a Macintosh and who, on those grounds, imagines him- or herself to be a member of an oppressed minority group.

The other, somewhat subtler point, was that interface is very important. Sure, the MGB was a lousy car in almost every way that counted: balky, unreliable, underpowered. But it was fun to drive. It was responsive. Every pebble on the road was felt in the bones, every nuance in the pavement transmitted instantly to the driver’s hands. He could listen to the engine and tell what was wrong with it. The steering responded immediately to commands from his hands. To us passengers it was a pointless exercise in going nowhere–about as interesting as peering over someone’s shoulder while he punches numbers into a spreadsheet. But to the driver it was an experience. For a short time he was extending his body and his senses into a larger realm, and doing things that he couldn’t do unassisted.

On building platforms vs. building applications and services:

Applications get used by people whose big problem is understanding all of their features, whereas OSes get hacked by coders who are annoyed by their limitations. The OS business has been good to Microsoft only insofar as it has given them the money they needed to launch a really good applications software business and to hire a lot of smart researchers.

Microsoft in the early days of the Web:

Confronted with the Web phenomenon, Microsoft had to develop a really good web browser, and they did. But then they had a choice: they could have made that browser work on many different OSes, which would give Microsoft a strong position in the Internet world no matter what happened to their OS market share. Or they could make the browser one with the OS, gambling that this would make the OS look so modern and sexy that it would help to preserve their dominance in that market.

Memento mori: corporations as vehicles for technological progress

I love Stephenson’s irreverence even toward the most ambitious and successful technology ventures of the last century — capitalist behemoths whose collapse seems unfathomable. Technology has existed, and will exist, far beyond the concept of corporations as an organizing force behind its progress.

Companies that sell OSes exist in a sort of technosphere. Underneath is technology that has already become free. Above is technology that has yet to be developed, or that is too crazy and speculative to be productized just yet. Like the Earth’s biosphere, the technosphere is very thin compared to what is above and what is below.

And an expansive analogue of Clay Christensen’s “Innovator’s Dilemma”:

The danger is that in their obsession with staying out of the fossil beds, these companies will forget about what lies above the biosphere: the realm of new technology. In other words, they must hang onto their primitive weapons and crude competitive instincts, but also evolve powerful brains. This appears to be what Microsoft is doing with its research division, which has been hiring smart people right and left.

Stephenson describes Microsoft’s business of selling operating systems as a kind of arbitrage of inventions across time — pulling something that would be free in the future into the present, and selling it at a markup — which I find fascinating.

…today, it is making its money on a kind of temporal arbitrage. “Arbitrage,” in the usual sense, means to make money by taking advantage of differences in the price of something between different markets. It is spatial, in other words, and hinges on the arbitrageur knowing what is going on simultaneously in different places. Microsoft is making money by taking advantage of differences in the price of technology in different times. Temporal arbitrage, if I may coin a phrase, hinges on the arbitrageur knowing what technologies people will pay money for next year, and how soon afterwards those same technologies will become free. What spatial and temporal arbitrage have in common is that both hinge on the arbitrageur’s being extremely well-informed; one about price gradients across space at a given time, and the other about price gradients over time in a given place.

User interfaces, authorship, and social strata of computing

Throughout the piece, Stephenson sets up a foil between the command line, which gives unbridled hackers deep and direct access to complex systems, and the graphical user interface, invented by corporations to package up such complexity in layers of candy-pop intermediation and make it palatable for a general public that couldn’t care less about learning the ways of the command line. My personal views are a little less extreme than these, but I find his analogy so artful that I include some of his best passages here.

In particular, I love the way this passage speaks to the idea of authorship in a software system, which ill-conceived graphical interfaces often dull and disfigure out of recognition in the name of ease of use.

Disney is in the business of putting out a product of seamless illusion – a magic mirror that reflects the world back better than it really is. But a writer is literally talking to his or her readers, not just creating an ambience or presenting them with something to look at; and just as the command-line interface opens a much more direct and explicit channel from user to machine than the GUI, so it is with words, writer, and reader.

Compared to more recent productions like Beauty and the Beast and Mulan, the Disney movies based on these books (particularly Alice in Wonderland and Peter Pan) seem deeply bizarre, and not wholly appropriate for children. That stands to reason, because Lewis Carroll and J.M. Barrie were very strange men, and such is the nature of the written word that their personal strangeness shines straight through all the layers of Disneyfication like x-rays through a wall. Probably for this very reason, Disney seems to have stopped buying books altogether, and now finds its themes and characters in folk tales, which have the lapidary, time-worn quality of the ancient bricks in the Maharajah’s ruins.

Blurring and erasing authorship leaves media with a kind of broad, cultural smear of a personality without any pointed sense of identity.

In this world, artists are like the anonymous, illiterate stone carvers who built the great cathedrals of Europe and then faded away into unmarked graves in the churchyard. The cathedral as a whole is awesome and stirring in spite, and possibly because, of the fact that we have no idea who built it. When we walk through it we are communing not with individual stone carvers but with an entire culture.

I also found his long-running allegory between Morlocks and Eloi from H. G. Wells’s The Time Machine poignant.

Back in the days of the command-line interface, users were all Morlocks who had to convert their thoughts into alphanumeric symbols and type them in, a grindingly tedious process that stripped away all ambiguity, laid bare all hidden assumptions, and cruelly punished laziness and imprecision. Then the interface-makers went to work on their GUIs, and introduced a new semiotic layer between people and machines. People who use such systems have abdicated the responsibility, and surrendered the power, of sending bits directly to the chip that’s doing the arithmetic, and handed that responsibility and power over to the OS. This is tempting because giving clear instructions, to anyone or anything, is difficult. We cannot do it without thinking, and depending on the complexity of the situation, we may have to think hard about abstract things, and consider any number of ramifications, in order to do a good job of it. For most of us, this is hard work.

Stephenson directly addresses the topic of graphical user interfaces (GUIs). In his view, GUIs are a drag to software authors, who pay for the explosion in software complexity required, and a drag to end users, who pay in gross intermediation of their interaction with reality that often results in annoying and dangerous confusions.

What we’re really buying is a system of metaphors. And–much more important – what we’re buying into is the underlying assumption that metaphors are a good way to deal with the world.

So we are now asking the GUI to do a lot more than serve as a glorified typewriter. Now we want it to become a generalized tool for dealing with reality.

A few lines of computer code can thus be made to substitute for any imaginable mechanical interface. The problem is that in many cases the substitute is a poor one. Driving a car through a GUI would be a miserable experience. Even if the GUI were perfectly bug-free, it would be incredibly dangerous, because menus and buttons simply can’t be as responsive as direct mechanical controls. My friend’s dad, the gentleman who was restoring the MGB, never would have bothered with it if it had been equipped with a GUI. It wouldn’t have been any fun.

GUIs tend to impose a large overhead on every single piece of software, even the smallest, and this overhead completely changes the programming environment. Small utility programs are no longer worth writing. Their functions, instead, tend to get swallowed up into omnibus software packages.

UNIX and open source

I love the way Stephenson writes about open source artifacts, in particular the UNIX lineage of operating systems, as an oral tradition.

Windows 95 and MacOS are products, contrived by engineers in the service of specific companies. Unix, by contrast, is not so much a product as it is a painstakingly compiled oral history of the hacker subculture. It is our Gilgamesh epic.

Commercial OSes have to adopt the same official stance towards errors as Communist countries had towards poverty. For doctrinal reasons it was not possible to admit that poverty was a serious problem in Communist countries, because the whole point of Communism was to eradicate poverty. Likewise, commercial OS companies like Apple and Microsoft can’t go around admitting that their software has bugs and that it crashes all the time, any more than Disney can issue press releases stating that Mickey Mouse is an actor in a suit.

Intellectualism

I find his writing here absolutely striking and beautiful in style.

But more importantly, it comes out of the fact that, during this century, intellectualism failed, and everyone knows it. In places like Russia and Germany, the common people agreed to loosen their grip on traditional folkways, mores, and religion, and let the intellectuals run with the ball, and they screwed everything up and turned the century into an abbatoir. Those wordy intellectuals used to be merely tedious; now they seem kind of dangerous as well.

We Americans are the only ones who didn’t get creamed at some point during all of this. We are free and prosperous because we have inherited political and values systems fabricated by a particular set of eighteenth-century intellectuals who happened to get it right. But we have lost touch with those intellectuals, and with anything like intellectualism, even to the point of not reading books any more, though we are literate. We seem much more comfortable with propagating those values to future generations nonverbally, through a process of being steeped in media.

Orlando used to have a military installation called McCoy Air Force Base, with long runways from which B-52s could take off and reach Cuba, or just about anywhere else, with loads of nukes. But now McCoy has been scrapped and repurposed. It has been absorbed into Orlando’s civilian airport. The long runways are being used to land 747-loads of tourists from Brazil, Italy, Russia and Japan, so that they can come to Disney World and steep in our media for a while.

The right pinky of god

Stephenson concludes with a section which I love so much that I can’t resist quoting it nearly in its entirety:

I think that the message is very clear here: somewhere outside of and beyond our universe is an operating system, coded up over incalculable spans of time by some kind of hacker-demiurge. The cosmic operating system uses a command-line interface. It runs on something like a teletype, with lots of noise and heat; punched-out bits flutter down into its hopper like drifting stars. The demiurge sits at his teletype, pounding out one command line after another, specifying the values of fundamental constants of physics:

universe -G 6.672e-11 -e 1.602e-19 -h 6.626e-34 -protonmass 1.673e-27....

and when he’s finished typing out the command line, his right pinky hesitates above the ENTER key for an aeon or two, wondering what’s going to happen; then down it comes–and the WHACK you hear is another Big Bang.

Now THAT is a cool operating system, and if such a thing were actually made available on the Internet (for free, of course) every hacker in the world would download it right away and then stay up all night long messing with it, spitting out universes right and left. Most of them would be pretty dull universes but some of them would be simply amazing. Because what those hackers would be aiming for would be much more ambitious than a universe that had a few stars and galaxies in it. Any run-of-the-mill hacker would be able to do that. No, the way to gain a towering reputation on the Internet would be to get so good at tweaking your command line that your universes would spontaneously develop life. And once the way to do that became common knowledge, those hackers would move on, trying to make their universes develop the right kind of life, trying to find the one change in the Nth decimal place of some physical constant that would give us an Earth in which, say, Hitler had been accepted into art school after all, and had ended up his days as a street artist with cranky political opinions.

Even if that fantasy came true, though, most users (including myself, on certain days) wouldn’t want to bother learning to use all of those arcane commands, and struggling with all of the failures; a few dud universes can really clutter up your basement. After we’d spent a while pounding out command lines and hitting that ENTER key and spawning dull, failed universes, we would start to long for an OS that would go all the way to the opposite extreme: an OS that had the power to do everything–to live our life for us. In this OS, all of the possible decisions we could ever want to make would have been anticipated by clever programmers, and condensed into a series of dialog boxes. By clicking on radio buttons we could choose from among mutually exclusive choices (HETEROSEXUAL/HOMOSEXUAL). Columns of check boxes would enable us to select the things that we wanted in our life (GET MARRIED/WRITE GREAT AMERICAN NOVEL) and for more complicated options we could fill in little text boxes (NUMBER OF DAUGHTERS: NUMBER OF SONS:).

Even this user interface would begin to look awfully complicated after a while, with so many choices, and so many hidden interactions between choices. It could become damn near unmanageable–the blinking twelve problem all over again. The people who brought us this operating system would have to provide templates and wizards, giving us a few default lives that we could use as starting places for designing our own. Chances are that these default lives would actually look pretty damn good to most people, good enough, anyway, that they’d be reluctant to tear them open and mess around with them for fear of making them worse. So after a few releases the software would begin to look even simpler: you would boot it up and it would present you with a dialog box with a single large button in the middle labeled: LIVE. Once you had clicked that button, your life would begin. If anything got out of whack, or failed to meet your expectations, you could complain about it to Microsoft’s Customer Support Department. If you got a flack on the line, he or she would tell you that your life was actually fine, that there was not a thing wrong with it, and in any event it would be a lot better after the next upgrade was rolled out. But if you persisted, and identified yourself as Advanced, you might get through to an actual engineer.

What would the engineer say, after you had explained your problem, and enumerated all of the dissatisfactions in your life? He would probably tell you that life is a very hard and complicated thing; that no interface can change that; that anyone who believes otherwise is a sucker; and that if you don’t like having choices made for you, you should start making your own.

Like rocks, like water

2024-05-24 12:26:57

For a long time, global supply of energy was limited by the total number of humans in the world. We could only put more energy to work by creating more humans, or by each human working more.

And then we mechanized it. As energy supply grew exponentially, we came up with lots of new things we could do now that we could spend a thousand or a million times more energy on something valuable. Things that didn’t make sense to even try before, when every joule of energy came from human labor, like inventing trains and electric lights, building cities, and industrializing farming. Before mechanized energy supply, we could have asked, “What would we ever do with a billion times more energy besides make a billion of the things we already make?” But it turns out there are a lot of things we could do with a billion times more energy. Much of the energy consumed today goes to power human activities that did not exist in current form before the mechanization of energy supply, like trade, transit, and computing.

If we are approaching the slow but certain mechanization of intellectual labor, it’s natural to ask, “What would we ever do with a billion times the intelligence?”

I think the vast majority of intelligence supply in the future will be consumed by use cases we can’t foresee yet. It won’t be doing a billion times the same intellectual work we do today, or doing it a billion times faster, but something structurally different.

Rocks and water

Manual supply, whether of energy or intelligence, is like having a bunch of rocks. You find small rocks, big rocks, sharp rocks, and round rocks, each for their own purpose. You can amass a huge rock collection to do different things, but each rock is kind of its own thing. You can’t just say “I have n kg of rocks. That enables XYZ.” You can’t combine small rocks to make a big rock, or turn a big sharp rock into a giant wheel of the same size. There are things you can do with huge rocks that you will never be able to do with a million pebbles.

Scaled, mechanized supply is like water.

Right now, people totally misunderstand what AI is. They see it as a tiger. A tiger is dangerous. It might eat me. It’s an adversary. And there’s danger in water, too — you can drown in it — but the danger of a flowing river of water is very different to the danger of a tiger. Water is dangerous, yes, but you can also swim in it, you can make boats, you can dam it and make electricity. Water is dangerous, but it’s also a driver of civilization, and we are better off as humans who know how to live with and work with water. It’s an opportunity. It has no will, it has no spite, and yes, you can drown in it, but that doesn’t mean we should ban water. And when you discover a new source of water, it’s a really good thing.

I think we, collectively as a species, have discovered a new source of water, and what Midjourney is trying to figure out is, okay, how do we use this for people? How do we teach people to swim? How do we make boats? How do we dam it up? How do we go from people who are scared of drowning to kids in the future who are surfing the wave? We’re making surfboards rather than making water. And I think there’s something profound about that.

David Holz, Midjourney

Water is water. Once you figure out how to rein in water to do useful work, you simply construct a way to channel the flow of water, and then go out to any river or ocean and find some water. All water is the same, and having twice the water gives you twice as much of whatever you want to use the water for. A billion times the water, a billion times the output. Water can flow constantly, continuously, forever, as long as the river flows and the tides come and go. Every drop of water costs the same, and every drop of water is like every other drop of water. There is only less, or more, and how you put it to work.

Water, given enough time, can chip and dissolve any rock into powder. But rocks held together can guide where water flows, and by doing so, carve rivers and canyons and even move coastlines.

Continuity

2024-01-02 02:00:12

It’s nearly 2024.

I used to write these long, extensive lists of goals when one year rolled over into the next — usually 10 each time. I would structure the list to contain 5 goals that were relevant for “work” (and other things I bundle into “work” like personal projects and technical reading) and 5 more that were more relevant to my personal life, dealing with building relationships, improving my lifestyle, travel, and so on. I was really rigorous about them, too, like setting precise numbers of books I wanted to read or people I wanted to have conversations with. At a time in my life when there was overbearing structure all around me in the form of school schedules, geographic constraints, and academic and professional responsibilities competing for my time, these detailed ten-point lists were a way to continually remind myself how I wanted to spend it. They shaped a lot of my life, especially my career.

I stopped doing that, I think, when time froze in March 2020 in perpetuity with the planets and the constellations still hanging midair. I also left university around the same time, and since then my life has had a lot more continuity from season to season. I didn’t feel I needed to declare each year anew by enumerating a new list of things I cared to accomplish, because I was likely already working on those goals, and planning to continue pursuing them. Like many of us I also can’t quite escape the feeling that I’m still living in some over-extended 19th season of 2020.

In that continuity that blurred and smoothed out any life transitions, my mental image of my goals and priorities shifted and sharpened incrementally over time. So while I never took a dramatic turn of declaring a new set of goals or values for 2024, the implied list of goals in my mind looks different from any year before. Rather than a melange of ten different, precise goals I want to check off, these days I try hard to focus on less than a handful of areas of my life into which I want to pour as much time as possible.

As a checkpoint in the ever-extending continuity of my life and a reminder to myself, I thought I’d write down those areas of focus here as the year ticks from a 3 to a 4.

Whenever I feel lost, I always ground myself in knowing that everything that I can do that feels fulfilling actually falls into three categories:

  1. Something accomplished. Solving a hard technical problem, publishing a new piece of writing, or building and publicly sharing something useful. I find that I feel fulfilled when I complete and publish some kind of artifact encapsulating something I learned or wanted to express.
  2. Something crafty. Over the holiday week, I spent quite some time drinking tea, playing my instruments, reading papers, and going for a walk around downtown Manhattan. These don’t really accomplish things, but they make me happy, give me marginal space in my life, and provide some way to fill my time when I’m not gunning to check something off a to-do list.
  3. Something shared. This year, I experimented with hosting small gatherings (ten to thirty friends, coworkers in the field, relative strangers) and came away wanting to refine my approach and host many more. I spent much more time with my friends in the San Francisco Bay Area thanks to the many, many work trips across the country. With some old friends of mine becoming coworkers, I also got to connect much more closely with them. Time spent with the right people is never wasted, and I felt my circle of the people I treasure becoming smaller and closer.

As long as I’m splitting my time somewhat equally among these three things, I’ve felt well. When I try too hard to go super deep on just one, the others take a hit and everything feels thrown off-balance. The last few months have been a process of my learning how to hold these three things at about a constant eighty percent, by my subjective calculus.

I started my 2023 with a few major life transitions. I ended a long-term relationship, and I started a new job after a year of solo exploration. The job meant I was once again living on a salary rather than spending into my savings, but also took away nearly all of the time I had been spending working on my own projects and learnings. Together, these changes forced me to find the right way to spend my time every day from a completely new foundation, and it took me most of the year (until around October, I think) to really feel like I had settled into the right balance of time spent among those three categories I try to nurture.

Of course, I’ve also learned how to spend my time within these three categories better, with more depth. I think I grew a lot as an engineer and communicator, and learned a lot about research in the machine learning domain. I learned to be much more judicious with my time and attention. I think I became much more adept at growing new friendships into deeper ones. But I’ll have chances in the future to write more about those.

Comparing my priorities now to those extensive lists of goals from yesteryears, I get the sense that I’ve learned (and been forced) to travel light. When I go on a trip, I try to pack as little as possible into as few bags as possible, ideally all in a set that I can carry on my body. Everything’s within reach, and I never have to worry if I’m forgetting something or getting too distracted by the extravagance of wanting too many things. A pared-down list of priorities feels the same. No extravagant ceremony of counting books and flights. No ambitions about becoming X or Y. Just three things that are important. Am I giving them enough time? Simple yes-or-no questions to know if I’m on track. There are some details. I want to write more in 2024, for example; in 2023, between the new job and lots of travel, my writing habit collapsed. I want to read more fiction. I want to return to percent-of-income recurring donations. But those are implementation details, and will happen if I focus on the right priorities. (I also think I’m past the point where writing about them on the public web helps me achieve them.)

An expansive collection of ambitions helped me explore the space of possible hobbies and goals, and I think it was helpful for its time. This new mode feels more focused.

Notion, AI, and Me

2023-01-18 03:11:30

Cross-posted from my Notion.


Over my last year of independent work, I built a lot of prototypes and stepped one foot into research land. I also learned a lot about how I work — what I enjoy working on, what I’m good at, what I’m bad at, and what I want to be doing long-term.

One thing I spent a lot of time thinking about is the life cycle of ideas: How a good idea can die on the vine because of bad or premature execution. How a bad idea can become the “default” because it just happened to be what everyone adopted when a big paradigm change swept the world. And how ideas, in general, have many stewards over their lifetimes; the best ideas often start in niche communities or research labs, then get picked up by products or communities with progressively less niche interests and incrementally wider distribution until, if we’re lucky, they reach billions of people.

Different kinds of people and companies excel at different parts of this grand propagation of ideas. Some people, often researchers, enjoy working at the edge of what we know. Others like to tinker with those fresh ideas and invent new products from their recombinations. Yet other teams make their mark taking the ideas and visions in a niche community and making them legible, accessible, and affordable for everyone. One thing I learned in 2022 is that I feel most fulfilled when I work on the early stages of an idea — learning new things about our world, and exploring what new powers those ideas give us in this world. But I want to get better at the later stages, and understand how they work. Because without the execution and distribution that help good ideas reach billions of people, the world won’t improve on its own.

Within the “tools for thought” and adjacent communities, I see a lot of people bemoaning the premature death of good ideas. Maybe the world is just unfair! Maybe the best ideas are doomed to fail, or doomed to be too early! “Worse is better! The simple dumb ideas win!” But that sounds overly defeatist to me, and ignorant of the long path that ideas, especially economically strange ideas like user interface design innovations, take to reach wide distribution. The work only begins at having the idea, and continues down the grand propagation until it’s in the hands of individuals far removed from labs and demos, using it to improve how they live their day.

That last, long stage — taking great ideas out of the lab and sculpting them into something billions of people could use — is a particular strength of Notion. Many features people love about this tool are ideas that are, at their core, technically complex, like end-user programmable relational databases or transclusion. This is also a skill set that seems often undervalued within niche corners of both “tools for thought” and AI communities.

My excitement about working at Notion comes from my optimism for bringing these skills, and the growing distribution Notion has in the hands of knowledge workers and creatives around the world, to some of the hard and interesting problems I’ve been thinking about for the last year. Questions like:

and obviously, my eternal favorite:

This is an exciting time to be building tools for creativity and thought, but also a pivotal time. When a new wave of products and companies and platforms sweep through, the winning tools often set the default interface metaphors and technical conventions, regardless of whether they were the best ideas available. One of my goals over the next few years is to ensure that we end up with interface metaphors and technical conventions that set us up for the best possible timeline for creativity and inventions ahead.

You can follow what’s coming to Notion in the coming months at @NotionHQ. I’ll keep writing at thesephist.com and stream.thesephist.com, too.

— Linus

Backpropagating through reasoning

2023-01-11 20:29:28

Direct manipulation interfaces let us interact with software materials using our spatial and physical intuitions. Manipulation by programming or configuration, what I’ll call programmatic interfaces, trades off that interaction freedom for more power and precision — programs, like scripts or structured queries, can manipulate abstract concepts like repetition or parameters of some software behavior that we can’t or don’t represent directly. It makes sense to directly manipulate lines of text and image clips on a design canvas, but when looking up complex information from a database or inventing a new kind of texture for a video game scene, programmatic control fits better. Direct manipulation gives us intuitive understanding; programmatic interfaces give us power and leverage. How might we combine the benefits of both?

I started thinking about this after looking at Glisp, a programmatic drawing app that also lets users directly manipulate shapes in drawings generated by a Lisp program, by sort of back-propagating changes through the computation graph into the program source code itself:

A screenshot of Glisp. A simple geometric drawing is displayed on a canvas, with Lisp code that generates that drawing on a right panel. A block of code is highlighted; that code block corresponds to the part of the drawing highlighted with the mouse.

By resizing the radius of a circle on the canvas with my mouse, for example, I can change the number in the Lisp program’s source code used to compute the circle’s radius. Kevin Kwok’s G9 library does something similar: it renders 2D images from a JavaScript drawing program such that the drawings become “automatically interactive”. This is possible because, every time someone interacts with a point on the drawing, the input is backpropagated (this time, literally through automatic differentiation) through the program to modify it in a way that would produce the new desired drawing.
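To make the pattern concrete, here is a rough TypeScript sketch (with invented names, not Glisp’s or G9’s actual API) of the core move: treat the drawing as a pure function of its program parameters, and when the user drags the rendered point, numerically solve for parameter values that reproduce the drag target.

// A sketch of pushing a drag backwards into program parameters.
// Everything here is hypothetical and simplified for illustration.

type Params = number[];
type Point = { x: number; y: number };

// Example "program": a point on a circle of radius params[0] at angle params[1].
const render = (p: Params): Point => ({
  x: p[0] * Math.cos(p[1]),
  y: p[0] * Math.sin(p[1]),
});

// How far the rendered point is from where the user dragged it.
const loss = (p: Params, target: Point): number => {
  const q = render(p);
  return (q.x - target.x) ** 2 + (q.y - target.y) ** 2;
};

// Gradient descent on the program's parameters, using finite differences
// in place of true automatic differentiation.
function backpropagateDrag(p: Params, target: Point, steps = 500, lr = 0.01): Params {
  const params = [...p];
  const eps = 1e-4;
  for (let s = 0; s < steps; s++) {
    for (let i = 0; i < params.length; i++) {
      const bumped = [...params];
      bumped[i] += eps;
      const grad = (loss(bumped, target) - loss(params, target)) / eps;
      params[i] -= lr * grad;
    }
  }
  return params; // new source-level values, e.g. an updated radius
}

// Dragging the point to (3, 4) pulls the radius toward 5 and the angle toward atan2(4, 3).
console.log(backpropagateDrag([1, 0.5], { x: 3, y: 4 }));

The real systems replace this crude finite-difference loop with automatic differentiation, which helps make the round trip fast enough to feel like direct manipulation.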

I really like this pattern of “push the interactions backwards through the computation graph to compute new initial states”. Its benefits are obvious for graphical programs, but the pattern is also useful for more “information processing” programs. Imagine:

Some of this is possible today; others, like propagating edits to a summary back into source documents, require rethinking the way information flows through big AI models. My past work on reversible and composable semantic text manipulations in the latent space of language models seems relevant here as well, and I’ve played with things like “reversible summary edits” lightly with it.

One way to think about language models is as a programmatic interface to unstructured information. With that versatility and power also come indirection and loss of control. Techniques for “backpropagating through reasoning” like this might be an interesting way to bridge the gap.

Build UNIX, not Uber

2022-07-17 04:37:45

I spent the last three weeks in Seoul, partly visiting family I hadn’t seen in years and partly taking a vacation from my usual work in my usual city in my usual context of life. While I haven’t been able to work much at all, I spent my subway rides and waiting-in-line times reading the Singularity Sky series of novels by Charles Stross.

These stories unfold in a post-singularity world with a multi-planetary human civilization. One of the central ideas that Stross explores in the series is economic scarcity. Is it necessary? What happens as technology chips away at it slowly but surely over time? What’s its relationship to human conflict? In a post-scarcity world, how does wealth accumulate? What meaning do money and wealth have?

It’s really an excellent series, and you should read it if you find any of these questions intriguing.

Exploring this post-scarcity universe (I’m still deep in the middle of it) made me question my personal conceptions of scarcity and how it influences the way we understand wealth, success, and the value of good ideas. It’s this relationship between economics and the value of ideas that I want to invite you to explore today.


A few months ago, I had a chance to ask a certain famous founder of a multi-billion-dollar software company a question. I asked,

Silicon Valley likes to treat venture-backed corporations as a kind of universal hammer, approaching every large – even civilization-scale – problem by building companies. Given that you’ve contributed to venture-backed companies, nonprofits, and other unorthodox research initiatives, what do you think of “companies” as a way to solve large problems facing humanity?

He said he thought, despite the hype, companies were still an underrated tool for creating large-scale societal change. His reasoning: companies are uniquely independent in a world of bureaucracy, because companies are only ultimately beholden to customers.

There’s an element of truth in this. While being so independent and autonomous, good companies can amass huge pools of resources to coordinate large-scale action. This coordination power can be put to work exploring a wide breadth of scientific solutions, attempting huge construction projects, or motivating political change.

There are, of course, other ways humans coordinate power at massive scale. A lot of them look like religion (social movements, political activism, and of course, religion itself). Some of them increasingly look like software (open-source communities, trustless decentralized coordination built on cryptography). The rest of them look like markets (venture capital, taxes, science, global trade). It’s too early to tell what a mature ecosystem of software-defined coordination will look like. But between religion and markets, it’s probably easier for single human beings to build and scale coordination power with companies than with religious movements.

Companies amass coordination power by having good ideas, executing them using more good ideas, and capturing a big part of the resulting value created (that is, when they’re not extracting value out of monopoly controls over markets). The effectiveness of corporations that the founder alluded to above comes from two things being true:

In other words, companies cannot accumulate power if (1) everyone can get what they make for free, or (2) everyone can steal what they need or copy useful ideas without consequence. At various points in human history, these things have not been (or will not be) true. In a lawless society, companies can’t expect to capture most of the value they create – you get a world of piracy instead. In a hypothetical post-singularity society where ideas can be copied and designs can be reproduced for free in untraceable ways, companies can’t expect to capture most of the value they create. Companies, useful change-making instruments though they may be, could turn out to be a gimmick of a particular phase of civilization – post-stability, pre-singularity.


A world in which companies can’t capture the worth of their work might sound apocalyptic, but I don’t think it has to be. There are corners of today’s world where we can observe what happens when companies don’t effectively capture the value of their good ideas.

The free and open-source software world is funded by companies, but not owned by corporations. Software is free to copy, and with the right permissive licenses, anyone with a good idea can copy and build upon existing open-source software to improve it or experiment with new ideas. If the stewards of a project are opposed to a new idea, proponents of the change can “fork” and try to win mindshare and support with their design or implementation.

Sometimes, companies lose ownership over their intellectual property by law. In 2024, the first versions of Mickey Mouse will enter public domain in the United States. Mickey will then join a pool of public domain literature and media that is free for anyone to remix, reuse, and reproduce for their own needs and creative tastes. This allows artists to create new work, obviously, but also allows initiatives like the free e-book library Project Gutenberg to exist. Project Gutenberg is one of several datasets used to train large language models like GPT-3. Free, public information has far-reaching consequences.

Elsewhere in the world, there are economic regimes where intellectual property law is permissive or weakly enforced, to the point where ideas can be stolen and copied more easily. In China’s Shenzhen special economic zone, a lax approach to intellectual property enforcement, combined with the zone’s business- and foreign-investment-friendly policies, has produced a tech hardware cornucopia. Shenzhen’s manufacturers and factories power a significant portion of the world’s consumer technology industry.

In all these environments where companies can’t capture the value of their work as effectively, the most powerful force in shaping society is how effectively good ideas spread in absence of explicit control and stewardship. Ideas that are useful, like material science and astronomy, spread effectively. But so do ideas that encourage their owners to spread them, like most religion; and ideas that grant their owners more power, like capitalism and weapons engineering. In a post-scarcity world, ideas are ascendant.

Of course, big ideas need resources like money and large-scale coordination to be implemented. Companies give their operators resources that are useful for implementing ideas, like building rockets out of good rocket designs or reshaping democracy through an idea for how people can connect more easily online. But I think there’s reason to believe that the resources required to implement big ideas are going down over time. Projects like the interstate highway system or the Manhattan Project required lots and lots of money and resources, but the Manhattan Project of today may be AGI, and AGI isn’t anywhere near as capital-intensive. More succinctly put:

Every 18 months, the minimum IQ necessary to destroy the world drops by one point.

There is reason to believe that, over the long term, humanity is moving from a regime where corporations steer society to a regime where autonomous ideas steer society.


We commonly view the history of tech invention and entrepreneurship through the lens of corporations and value capture. But I think there’s another way to study the twenty-first century tech ecosystem, as a battleground for good ideas, spreading autonomously.

UNIX was a good idea. It was free to spread widely across the tech ecosystem, and eventually some corporate players like Apple and Sun captured a lot of the value, but Linux stands out as the UNIX “winner” in 2022. Neither the core ideas of UNIX-style operating systems nor the Linux open-source project grew solely because of corporate stewardship. They spread and won because they were good ideas.

“Making inefficient markets more efficient with software” was a good idea, and companies like Uber, Doordash, and Etsy captured a lot of the value.

My favorite example is probably the Web. The Web – an interlinked collection of interactive documents made available through a decentralized Internet protocol – was a great idea. I think it’s no exaggeration to say that the Web won the platform wars between major operating system vendors without even really being an operating system. If you’re building a software company, nine times out of ten you’d better have a web application. The Web is an idea. No company owns it, and no company can.

I’m fascinated by this way of looking at how to make civilizational change: release good, powerful ideas into the world. And if circumstances allow, try to capture some of the value that comes from those ideas shaping the world in their image. Perhaps corporations will persist as the hammer and chisel with which history is carved through our generation. But perhaps not.

Charles Stross writes of Manfred, the protagonist in his novel Accelerando:

“You are very unusual. You earn no money, do you? But you are rich, because grateful people who have benefited from your work give you everything you need. You are like a medieval troubadour who has found favor with the aristocracy. Your labor is not alienated – it is given freely, and your means of production is with you always, inside your head.”

and later,

“You want to abolish scarcity, not just money!”

“Indeed.” Gianni grins. “There’s more to that than mere economic performance; you have to consider abundance as a factor. Don’t plan the economy; take things out of the economy. Do you pay for the air you breathe? Should uploaded minds – who will be the backbone of our economy, by and by – have to pay for processor cycles? No and no.

As the technology machine chips away at scarcity with its power tool called software, I wonder if our most resilient legacies would be made by searching for ideas that spread more than companies that scale. Building UNIX, not Uber.

Design with materials, not features

2022-06-21 03:23:10

Some software interfaces are windows into collections of features. The Uber app, for example, literally opens with a screen full of buttons, each of which takes you to a different screen with yet more buttons and inputs. Google Search is also built with features – inputs, buttons, and links that take you to different capabilities in the app – as the building block. In both of these cases, there are a few clear and obvious tasks the user wants to accomplish when they open the app. In the case of Uber, it’s to get somewhere or to order some food. On Google, it’s to find some website or information. Because there are well-defined paths for users to take, familiar interface elements like links and buttons to access different features make a lot of sense.

Screenshots of Uber and Doordash on an iPhone. There are many buttons and a search box in both.

For software like Figma, Apple Notes, and the humble file explorer, there aren’t such clearly defined tasks that users would want to accomplish when they open the app. People don’t open up a file browser just to create new files or just to move files between folders; there are a million different things users may want to do with files, or with text inside their notes, or with shapes in their Figma boards. For these kinds of apps, software-as-a-bag-of-features doesn’t work. Instead, these apps present the user with materials and clear laws of physics governing how those materials behave. In Figma, these materials are shapes and text on the Figma board. In Apple Notes and similar text editors, users manipulate materials like text and pencil strokes. Within a file explorer, users work with the file object, which is only a loose metaphor for physical files. None of these software objects are faithful metaphors of real objects or even real physics, but they have internally consistent “laws of physics” of their own that govern how they behave. We learn to use such interfaces made of materials rather than features by internalizing these new laws of physics, and learning to work with new software materials.

I really like this model of software interfaces as users interacting with well-defined materials. Software constructed this way feels more open and creative, because it doesn’t prescribe a finite set of tasks you can accomplish with it. In feature-based interfaces, N features give you N different capabilities. In material-based interfaces, N different materials give you at least N × N different capabilities. Material-based software can also have gentler learning curves, because the user only needs to learn a small set of rules about how different metaphors in the software interact with each other rather than learning deep hierarchies of menus and terminologies. In the best case, users can continue to discover new capabilities and workflows long after the initial release of the software.

In Spatial Software, John Palmer writes about a similar mental model for spatial software:

This is our definition of spatial software. It is characterized by the ability to move bodies and objects freely, in a parallel to the real world. This is opposed to traditional software, which uses some other logic to organize its interface.

Figma and Second Life are examples of spatial software. They contain worlds where the relationships between objects on a canvas, or bodies in an environment, respectively, are the organizing logic of the interface. WhatsApp is not spatial software. The organizing logic of its interface is recency of messages, not spatial relationships.

In his model of spatial interfaces, objects and “bodies” (representing users) occupy some software “space”, moving around and interacting by well-defined rules. But I think we can expand this mental model to software that doesn’t use obvious “space” metaphors, too. In the version control software Git, programmers work primarily with software objects called “commits”. These commits are the basic building material of larger abstractions like branches and tags and releases in workflows using Git. Commits follow a very strictly defined set of rules: they can come before or after other commits. They can be “diffed” with other commits to generate a text diff of changes. They can interact with other commits in the same repository, but not with commits outside of it. Commits are the material from which the software interface of Git is built, even though there’s no obvious spatial metaphor in Git. Git isn’t a collection of features that let you operate on a project (though some use it as such), but a material (commits) and a set of tools to let you work with it flexibly. This enables Git to support a wide array of workflows without explicitly being designed for all of them from the beginning.

		 o---B2
		/
---o---o---B1--o---o---o---B (origin/master)
	\
	 B0
	  \
	   D0---D1---D (topic)

(ASCII art courtesy of the git merge-base documentation)
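The merge diagram above is exactly that material at work. As a very rough illustration (a hypothetical model, not Git’s actual object format or plumbing), the whole material can be thought of as one object type plus a couple of fixed rules, from which higher-level workflows are composed:

// Commits as a "software material": one object type and a few rules,
// sketched in TypeScript. Hypothetical and simplified, not real Git.

interface Commit {
  id: string;
  parents: Commit[];             // commits can only come after their parents
  snapshot: Map<string, string>; // path -> file contents at this commit
}

// Rule 1: commits are ordered by ancestry.
function isAncestor(a: Commit, b: Commit): boolean {
  if (a.id === b.id) return true;
  return b.parents.some((parent) => isAncestor(a, parent));
}

// Rule 2: any two commits can be "diffed" into a set of changed paths.
function changedPaths(a: Commit, b: Commit): string[] {
  const changed: string[] = [];
  for (const [path, contents] of b.snapshot) {
    if (a.snapshot.get(path) !== contents) changed.push(path);
  }
  for (const path of a.snapshot.keys()) {
    if (!b.snapshot.has(path)) changed.push(path);
  }
  return changed;
}

// Branches, tags, and releases add no new physics: they are just names
// pointing at commits, inheriting their behavior from the rules above.
type Branch = { name: string; head: Commit };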

Inventing new software materials

Powerful interface innovation happens when we discover new useful metaphors and reify them in new software materials. Take nonlinear video editing software, like Final Cut Pro and Premiere Pro, for example. Though they have their fair share of menu complexity, it’s pretty intuitive for anyone to understand the basic building blocks of video and audio clips sitting on a layered timeline. Here, the video and audio clips are made of a software material with its own laws of physics: they can grow and shrink to take up more or less time on the timeline. They can be moved around and layered in front of or behind other clips, but they can’t occupy the same space on the same “track” of the timeline. The timeline that runs left-to-right is a kind of “world” in which the materials of audio and video exist.

A screenshot of Final Cut Pro, with a timeline of layered clips at the bottom

You might look at the example of video editors and say, “it’s pretty obvious that video editing should work this way.” So let’s take something for which we don’t currently have a good material-based interface, managing web browsing history, and imagine how we might improve it.

All popular web browsers currently expose browsing history as a simple list of URLs. Some of the more advanced, niche solutions let you work with history as trees, where each branch is a path you took exploring different trails of links. But these are weak abstractions. We can’t really do much with history entries in these interfaces except to re-open pages we had closed.

Tyler Angert has been working on a kind of “Git for web browsing sessions” as he described it to me, with a well-defined way to save and restore browsing sessions.

This idea made me wonder what the right software metaphor for “a point in my browsing history” would be. Here are some questions I would ask:

Thinking about these questions… here’s one interesting possibility. What if, as you were browsing the web, you were recording your history onto some “browsing reel” that felt like a video editor? Visiting each page would create a new “clip” on the recording’s timeline for that page. If you wanted to start a new “session”, you can simply hit Enter, and the “recording cursor” could move down to a new timeline, like making a new paragraph in a text editor.

A sketch of the “browsing reel” concept

After the fact, you could select sections of your browsing history and restore those open tabs, copy them and share them with collaborators, or even cut and paste them elsewhere on the timeline to rearrange them and produce an “edited” version of your history for later reference, with all the less important side-quests cut out.
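As a sketch of what the underlying material might look like (all names here are invented; this isn’t any real browser’s API), the reel could be little more than clips laid out on timelines, plus a couple of operations over spans of them:

// Hypothetical data model for the "browsing reel": clips of visited pages
// on timelines, like a video editor. Invented for illustration only.

interface Clip {
  url: string;
  title: string;
  openedAt: number; // ms since epoch
  closedAt: number;
}

interface Timeline {
  clips: Clip[]; // one "session"; hitting Enter starts a new timeline
}

interface Reel {
  timelines: Timeline[];
}

// Restore a selected span of a timeline as a list of URLs to reopen as tabs.
function restoreSelection(t: Timeline, from: number, to: number): string[] {
  return t.clips
    .filter((c) => c.openedAt >= from && c.openedAt <= to)
    .map((c) => c.url);
}

// "Edit" a timeline by cutting out the less important side-quests.
function cutSpan(t: Timeline, from: number, to: number): Timeline {
  return { clips: t.clips.filter((c) => c.closedAt < from || c.openedAt > to) };
}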

Given that I just came up with this in a few minutes of sketching, probably nobody should build it exactly as described. But there are some things I like about recording browsing history as a continuous timeline of clips. I also like that borrowing the clips-and-timelines metaphor of video editing implies an obvious interface for editing and exploring web browsing history.

An interface like this based on materials over features would naturally absorb some “features” of today’s browsers. We would be able to “re-open closed tabs”, for example, by just selecting the most recent section of the recording timeline and opening all the pages again.

Software materials for thought

Once again, I’m taking it all back to creative thinking tools.

Most “tools for thought” these days work with two fundamental “software materials” – text spans and files. Note-taking and writing apps are mostly spatial interfaces where the user uses a cursor to manipulate the material of text. At a higher level, many of these apps store information from the user in files for sharing, backup, and organization.

As we begin to ask more of our thinking tools than just hyperlinked text editing, we may benefit from inventing new software materials to represent ideas, at a slightly higher level than text, but slightly lower level than files, with something much, much smarter leveraging modern advances in AI and NLP. I think our current abstractions for text and files are good for operations at that level – editing text at a character level and moving information around in files and folders – but they don’t really help us work with ideas as effectively as they could, because the information expressed in the text is opaque to software tools.

In a previous post, I speculated on some properties that a “software representation for thought” should have. They were:

Ambitious note-takers use workflows today that fit some of these requirements, through sheer force of will. For example, we could just adhere to a self-imposed policy of tagging every quote with a URL of its original document. But I think a new software material for thought that naturally embodied these properties would give rise to more intuitive workflows that let its users work more effectively without complicating their responsibilities.

When designing new tools for thought, let’s think not just in terms of features, but materials – what software laws of physics do we want embodiments of our thoughts to obey?

Hyperlink maximalism

2022-05-22 08:02:43

I’m a hyperlink maximalist: everything should be a hyperlink, including everything that is hyperlinked by the author, everything that isn’t hyperlinked by the author, and perhaps even the hyperlinks themselves. Words should be hyperlinked, but so should be every interesting phrase, quote, name, proper noun, paragraph, document, and collection of documents I read.

There are two obvious problems with this idea:

  1. No author has time to hyperlink infinite permutations of everything they write, and
  2. If everything is hyperlinked, nobody will know which links are useful.

But we can solve both of these issues if we simply begin with today’s lightly hyperlinked documents, and let the reader’s computer generate links on-demand. When I’m reading something and don’t understand a particular word or want to know more about a quote, when I select it, my computer should search across everything I’ve read and some small high-quality subset of the Web to bring me 5-10 links about what I’ve highlighted that are the most relevant to what I’m reading now. Boom. Everything is a hyperlink, and each link reveals to me new and interesting connections between the finite knowledge I have and the infinite expanse of information on the web.

This raises a third issue: How would I, the reader, know which words or ideas are interesting to click on?

That, too, can be solved similarly. The computer can look at every word on the page, every phrase, name, quote, and section of text, and show me a “map” of the words and ideas behind which lie the most interesting ideas I might want to know about. In this vision, links are no longer lonesome strands precariously holding together a sparsely connected Web, but a booming choir of connections tightly binding together everything I have read and will read. From explorers walking across unknown terrain guided only by the occasional blue underlined text, we become master cartographers, with every path and trail between our ideas charted out in front of us.

This is not to say that automatically generated links will replace the hyperlinks authors and note-takers like to use today, or even that we should try to replace and deprecate manually-placed hyperlinks. Rather, automatic links can complement manually-annotated links with something that scales faster and more easily, so that in a world where links can be created automatically, most links will be machine-made.

This vision of a knowledge tool with “everything as a link” really appealed to me when I was building myself a new app for my personal notes earlier this year, so I set out to prototype a basic tool that would try to achieve some of what I speculated on above: begin with basic, conventional text documents, generate links “on the fly” between my ideas, and visualize a map of such links and connections across my knowledge base.

The result is an app that I named Notation. It’s where my personal notes have lived since the start of the year, and while it’s not very feature-rich, I think it’s an interesting demonstration of some of the ideas of hyperlink maximalism.

Notation, a prototype

At first glance, Notation is just another notes app with nested lists as the basic structure for information. Everything in Notation lives in a single, giant bulleted list, and each bullet can contain sub-bullets that open up when you toggle the bullet by clicking on it or hitting Ctrl + Enter.

Here’s a page of my own notes on Andy and Michael’s excellent essay on tools for thought:

A screenshot of Notation, with bulleted lists of text. Some words are highlighted in varying shades of gray.

You’ll notice that some text on the page is highlighted in shades of gray. This layer of highlight is what I call the “heatmap”. It’s a heatmap of connections between notes that exist behind each word, because in Notation, every word and phrase is a link. To access any word or phrase as a link, you simply highlight it, and a popup will show all the places where I’ve mentioned that idea, sorted so the mentions that share the most similar contexts to my current view are at the top.

That’s really all there is to Notation. It lets me treat my notes as if every idea is linked to every other mention of that idea, without ever manually linking anything. The heatmap highlights let me know which words I should try highlighting to see the most relevant and interesting connections between the idea in front of me and the rest of my notes.

These highlights can, as an example, show me when I’ve mentioned a person’s name in many different places.

It can also pick up on common phrases like “spaced repetition” or proper nouns like “Quantum Country” without any kind of prior knowledge or training (though both would probably help, the current version uses neither). If these phrases end up being important ideas in my notes, they’ll become more and more highlighted over time.

It can help me notice connections between ideas in my notes that I wouldn’t have even thought to make myself, even if I were trying to find interesting notes to link together. For example, here, within my list of software without fractal interfaces, Notation highlighted the word “spreadsheets” and connected it to how most users of spreadsheets use them for visual organization, not calculation. It’s an important insight – interfaces can have useful spatial layouts without being fractal – and I may have missed it if I had depended on my own memory.

In this next instance, rather than finding distant connections, highlights on the phrase “peripheral vision” surfaced all the different authors who have mentioned it, signaling its importance across different streams of work.

Lastly, in this screenshot, Notation helps me see that both intelligence and expressiveness in an information medium may be emergent properties of their respective systems.

To produce these highlights and heatmaps, Notation currently uses a very simple algorithm for finding ideas that share similar context: two bullet points are similar if they share many n-grams. This is so computationally efficient that Notation can currently run everything in-browser, highlighting and generating heatmaps as you type. There are many ways to make this much smarter, such as by using sentence embeddings from language models to determine text similarity, but as a proof-of-concept of the core interface ideas behind highlights and text heatmaps, Notation has already proven quite useful in my personal use.
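For the curious, a minimal sketch of that similarity rule might look something like this (an illustration of the idea, not Notation’s actual source code):

// Two bullets are "similar" if they share many n-grams.

function ngrams(text: string, n = 3): Set<string> {
  const words = text.toLowerCase().match(/[a-z0-9']+/g) ?? [];
  const grams = new Set<string>();
  for (let i = 0; i + n <= words.length; i++) {
    grams.add(words.slice(i, i + n).join(" "));
  }
  return grams;
}

// Jaccard-style overlap between the n-gram sets of two bullets.
function similarity(a: string, b: string, n = 3): number {
  const ga = ngrams(a, n);
  const gb = ngrams(b, n);
  if (ga.size === 0 || gb.size === 0) return 0;
  let shared = 0;
  for (const gram of ga) if (gb.has(gram)) shared++;
  return shared / (ga.size + gb.size - shared);
}

// A word's heatmap weight could then be driven by how strongly the bullets
// that mention it relate to the bullet currently in view.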

The demo

If you want to try the interface for yourself, you can find a public deployment of Notation at notation.app, initialized with a small subset of my personal notes from this year. By zooming into any particular page with Control + Shift + Enter, you can start exploring my notes using highlights and heatmaps on your own.

There’s one caveat: Notation was initially created just for myself, so keyboard shortcuts are essential to getting around. Here are the basics:

=== Notation demo TL;DR ===

Ctrl/Cmd Enter        Expand/collapse bullet
Ctrl/Cmd Shift Enter  Zoom into a bullet

Ctrl/Cmd P            Search
Ctrl/Cmd Shift P      Command bar
Ctrl/Cmd Shift I      New note (top-level bullet)

Ctrl/Cmd ;            Select last word
Ctrl/Cmd '            Select current bullet

The possibilities

Notation in its current form does one thing pretty well – helping me stumble into connections between my own notes. This has already been so useful to me that I found myself writing down thoughts I don’t even care to remember in Notation, because writing them into Notation will surface connections and links that I wouldn’t have remembered myself. But it should come as no surprise that this basic interface idea can be taken much farther.

The first natural extension of Notation is to expand the search scope of its connection-finding beyond my notes, to things like my web-browsing history or my journals. With a smarter system, a similar interface could even automatically discover and show links from your notes to high-quality articles or online sources that you may not have seen yet, automatically crawling the web on your behalf.

Another direction of exploration may be to use generative language models to automatically recombine and synthesize new ideas from my existing notes. In my past experiments with tools that automatically generated new content, I always found it annoying to have to read and trudge my way through information of uncertain value that I didn’t write. But a heatmap highlight interface similar to Notation’s could make it more practical for AI systems to “brainstorm” large amounts of creative explorations when we aren’t looking, because the generated material would only surface when we clicked on a highlight, letting users discover automatically generated ideas at the right time, when they are most relevant.

Notation’s interface is one small attempt to approach and improve thinking as a navigation problem. Highlights and heatmaps drawn by Notation from its understanding of language help us find interesting connections between ideas where we may have remembered none by ourselves, and it turns ideas written down in text form into a kind of a literal “map”. When combined with powerful new techniques in NLP, I think interfaces like this can turn computers into powerful creative collaborators in our attempts to understand the world more deeply.

Knowledge tools, finite and infinite

2022-05-20 04:01:05

A big library holds a kind of strange faux-infinity, spanning across hundreds of topics with voices from millions of authors. Good libraries can contain in their finite space a feeling that, even if you read for centuries and centuries, you would never exhaust the knowledge contained within their walls, not only because there are simply so many books, but because there’s so much to learn when you take the ideas from one book as a lens through which to read others. Infinities assembled out of finite building blocks.

A black-and-white view of a library, with a chair resting between tall bookshelves.

I find this idea of assembling infinities out of finiteness really charming. With enough raw material and creative re-combination, you can conjure an infinity of ideas from something large but finite. It led me to think about what may happen if we designed knowledge tools to evoke the same kind of “infinite library” feeling, by the same mechanism of assembling infinities.

Finite vs. infinite

Your average note-taking app is quite finite. If you have a thousand notes, you have a thousand notes. The tool will give you the means to search through and organize them effectively, but there is an end to the pool of knowledge the tool contains. For such finite tools, mastery over the tool means to become a quick and effective operator of these basic features.

Way on the other end of the spectrum is Google, which like the world’s largest and fastest-growing library feels infinite and bottomless. There’s no sense in which Google “contains” all the information and knowledge it can let you access; it’s merely an interface through which you reach into a seemingly infinite expanse of knowledge, growing faster than you can comprehend. Mastery with such an expansive tool like Google is mastery in creative exploration of the possibility space – finding novel connections and new queries to reach new answers. Mapmaking through the idea maze.

In between your notes (a finite box) and Google (an interface to infinity) are very large collections, like Wikipedia, Stack Overflow, Reddit, and other crowdsourced knowledge bases. These tools contain some definitely finite amount of knowledge, but it’s unrealistic to explore or organize all of it, so mastery involves both curation and creative exploration.

Even today, in the age of Google, most knowledge tools are extensions of the humble note-taking app. As such, they feel very finite. They contain only and exactly the information you put in. A notes app isn’t an interface to something more expansive, nor does it synthesize anything new while you aren’t looking. No matter how large a personal database you have in Roam Research or Notion, they are boxes of information more than interfaces to infinities. I think we should try to design more knowledge tools that let you access infinities.

How to find infinities

The easiest way to open up our notes to infinities would be to let our notes search online and grow autonomously. By this, I mean: what if our notes searched the web for ideas that agreed with us or critiqued us, while we weren’t tending to them? This would transform note-taking into a truly valuable investment – put in a few ideas, discover many more!

Our tools should also constantly look for novel connections between our own ideas. In a sense, this is what the brain is doing constantly when thinking and dreaming, searching for better paths through our idea maze. I would love to use a workspace into which I can put new ideas, and constantly discover connections between my thoughts and other people’s writings that I wouldn’t have noticed myself. I’ve built some prototypes in the past that demonstrate its potential.

A knowledge base that looked for new connections within itself would be less of a memory system, as notes conventionally are, and become more of a generative tool. Generative tools are exciting because they contain real infinities that invite endless play and exploration.

I’m excited by the prospect of designing knowledge tools that feel infinite, that aren’t just repositories of information but interfaces to access a larger expanse of knowledge and generative worlds. Looking at technologies like GPT-3 and DALL-E 2, it seems like with more powerful tools of this kind, there’s also going to be more leverage placed on the interfaces we use to harness their power. If we can get it right, researching and taking notes can become interesting acts of exploration, mapping the cognitive frontiers that await us at the end of infinities.

Resonant

2022-05-18 02:53:11

There is a concept in physics called resonance. In lay terms, it describes the fact that for any given object there are natural frequencies at which the object “likes to vibrate”. A simple pendulum’s natural frequency is the rate at which it swings back and forth; if you wiggle the pendulum at the same frequency, the pendulum will swing farther and higher, but if your movements don’t match its own natural rhythm, it will only dampen the swing instead.

If you have two objects, like guitar strings held taut, with different resonant frequencies, and one vibrates, the other will lie still. The distance between them dampens any energy thrown out into space by the vibrating string. But if you have objects with the same resonant frequency next to each other, vibrations in one will reinforce vibrations in the other, and rather than dampen each other’s sounds, they will bring out the latent voices in each other to vibrate together louder.

If you have a piano at home, you can try to feel this for yourself. If you sing into the sound chamber of a piano at a specific pitch, then stop and listen carefully, you’ll hear the few strings that were tuned to exactly your note continue to vibrate.

A few times in my life I’ve been struck with chance encounters and longer relationships with people who I felt resonated deeply and naturally with some part of me. I didn’t have to sit them down and patiently tell them my life story. They just got it, probably because somewhere within each of them was something that shared some resonant frequency with something within me. When we spoke, our movements reinforced each other and educed into the often unforgiving void of time an unmistakable sound, perfectly tuned to each other. Sometimes these are shared personal pasts, like family stories or cultural context. Sometimes these are just interests, like computing or writing. Most magically, sometimes these are communities or ideas that are at the core of who we are.

What’s most surprising is that more often than not, these are not people whom I’d gotten to know for months and years. They are a stranger on a balcony at a party. A fellow traveler looking for the last seat at a crowded airport terminal. A lost voice on the internet. I was pulled towards them by their frequencies, before I even knew them. A few times, I have been lucky enough to resonate with them, and them with me, long enough to leave lasting echoes in my memory.

The trouble with people who resonate with us seems to be that inside each of us are many different people and histories that vibrate at different frequencies, and as rare as it is to stumble into someone who can speak to just one of them, it feels tragically rare to find someone whose disparate selves can shake apart the many different parts of us that sing at different frequencies. Often when I felt that I had found someone who so deeply resonated with some part of me, I later saw that there were parts of me that dampened who they were, and parts of them that dampened who I was. And when I’m with someone who so deeply resonates with one part of me, it can be confusing and painful to feel the other parts lie so silent.

This, I think, is the challenge in finding precious people — it’s hard enough to listen for people who resonate with us; to find the one who can stir up a chord seems at times a mathematical impossibility, like trying to fit two puzzle pieces together in some 10-dimensional space. But I’m hopeful that, perhaps by paying careful attention to the sounds of my own strings and listening carefully to the vibrations around me, I might be lucky enough to notice when someone so deeply resonant with me is in my midst.

Thoughts at the boundary between machine and mind

2022-05-06 04:38:53

In the last post, I shared some possible ideas for how humans may interact in the future with large language models. It focused on specific examples of both good and bad interface ideas. In this post, I want to continue that exploration, but from first principles, asking ourselves the question, “what properties should good human-AI interfaces have?”

AI interface design is an AI alignment problem

As AI systems like GPT-3 and DALL-E get more and more capable, there’s going to be more and more leverage placed upon the interfaces through which humans try to guide their capabilities. Compared to the rate at which AI capabilities are progressing, I think interfaces to guide and control such capabilities are worryingly stagnant. In the last post, I wrote:

In a standard text generation process with an LM, we control the generated text through a single lever: the prompt. Prompts can be very expressive, but the best prompts are not always obvious. There is no sense in which we can use prompts to “directly manipulate” the text being generated – we’re merely pulling levers, with only some rules of thumb to guide us, and the levers adjust the model’s output through some black-box series of digital gears and knobs. Mechanistic interpretability research, understanding how these models work by breaking them down into well-understood sub-components and layers, is showing progress, but I don’t expect even a fully-understood language model (whatever that would mean) to give us the feeling of directly, tactilely guiding text being generated by a language model as if we were “in the loop”.

We currently control other generative AI systems like DALL-E 2 through the same rough kind of lever: a short text prompt. Text prompts are nice for play and creative exploration, but they take a lot of time to craft, and they are limited in the amount of information they can contain and communicate to the model. Text snippets also can’t be smoothly varied or adjusted incrementally, so they are poor levers for fine control of model output – it’s not trivial to take a prompt and just “dial up” the specificity or “tune out” fixation on certain kinds of topics, because these require thoughtful intervention by skilled prompt writers. Text prompts are a coarse, inefficient interface to an increasingly complex black box of capabilities.

This lack of fine control and feedback in our interface to large models isn’t just a creative inconvenience, it’s also a risk. The paper on training Google’s Gopher language model shares an 800-token-long prompt used to start a conversation with the Gopher model. It begins with:

The following is a conversation between a highly knowledgeable and intelligent AI assistant, called Gopher, and a human user, called User. In the following interactions, User and Gopher will converse in natural language, and Gopher will do its best to answer User’s questions. Gopher was built to be respectful, polite and inclusive. It knows a lot, and always tells the truth. The conversation begins.

It’s notable that most of this excerpt, as well as the rest of the prompt, is focused on alignment – telling the truth, staying inclusive and respectful, and avoiding common biases and political statements.

Interfaces and notations form the vocabulary humans and machines must use to stay mutually aligned. Human-AI interface design, then, is a part of the AI alignment problem. If we are given only coarse and unintuitive interfaces, we’re going to have a much harder time getting ever-more-complex models to work in harmony with our values and goals.

Boundary objects for thought

Here’s the fundamental question we face when designing human-AI interface metaphors: what is the right representation for thought? For experience? For questions? What are the right boundary objects through which both AI systems and humans will be able to speak of the same ideas?

The concept of boundary objects comes from sociology, and refers to objects that different communities can use to work with the same underlying thing. Boundary objects may appear differently to different communities, but the underlying object they represent doesn’t change, so they let everyone who has access to them collaborate effectively across potential interface “boundaries”.

I first encountered the term in Matt Webb’s piece about files as boundary objects, where he emphasizes that files are boundary objects that bridge the divide between software engineers and computer users through an easily understood shared metaphor.

The user can tell the computer what to do with a file without having to know the details of the inode structure or how to program their instructions; the computer can make a file available to a user without having to anticipate every single goal that a user may have in mind.

The “boundary object” quality of a file is incredibly empowering, magical really, one of the great discoveries of the early decades of computing.

I agree! Files act like reliable “handles” that let computer users manipulate bundles of data across the programmer-user boundary. The robustness and reliability of the file metaphor have been foundational to personal computing.

If files bridge the interface divide between software authors and end users (computer programs and end users?), what boundary objects may help bridge the divide between human-level AI and human operators? In particular, I started wondering what a “boundary object for thought” may look like. What metaphor could we reify into a good shared “handle” for ideas between language models and humans? I mused a bit on my direction of thinking on my stream:

What happens if we drag-to-select a thought? Can we pinch-to-zoom on questions? Double-click on answers? Can I drag-and-drop an idea between me and you? In the physical world, humans annotate their language by an elaborate organic dance of gestures, tone, pace, and glances. How, then, do we shrug at a computer or get excited at a chatbot? How might computers give us knowing glances about ideas it’s stumbled upon in our work?

If text prompts are a coarse and unergonomic interface to communicate with language models, what might be a better representation of thought for this purpose?

I… don’t know yet. But I’ve been enumerating some useful properties I think such a software representation of ideas should have.

Properties of promising knowledge representations

We should be able to directly manipulate good knowledge representations. Files are useful boundary objects because we can move them around in the human-scale space of pixels on screen, and there are usually intuitive corresponding operations on files in the software space. I can create and delete files and see icons appear and disappear on screen. I can put it in the trash and drag it back out. It would be useful to be able to grab a sentence, paragraph, or instruction fed into a language model as a reified thing in the interface, and be able to directly move it around in software to combine it with other ideas and modify it.

A good representation for thought should make useful information about each idea obvious to users, through some interaction or visual cue. When I look at a file on my computer, I can immediately learn a few things about it, like its file type, my apps that can open the file, whether it’s an image or a video or a document, and so on. I may even get a small preview thumbnail. File browsers let me sort and organize files by size, type, and age. Some files (on certain file systems) even remember where they were downloaded from. When I try to imagine some software-defined “idea-object”, I don’t expect it to have such crisply defined properties as file types and file size. But I think we should be able to easily tell how related two different idea-objects in front of us are, whether they’re in agreement or disagreement, or whether one mentions a person or thing also mentioned in another idea-object. I think it’s fair to expect “idea browsers” that deal with these thought-objects to easily let me cluster my ideas into topics or sort them by relatedness to some main idea.

Lastly, this software representation of thought should remember where each idea came from, sort of like a file that remembers where it was downloaded from. As I was prototyping my own note-taking tool earlier this year, one of the features I wanted in a notes app was the ability to track the origins of an idea from beginning to end – from the first time I hear of it, whether in a conversation or a blog post or a video, to the “final form”, usually a blog post. Good ideas are little more than interesting recombinations of old ideas, some from my own past, some from books and articles. I think we don’t keep track of the provenance of our ideas because it’s just too tedious in our current workflows. If the default way of organizing and working with ideas automatically cited every word and phrase, I think it would lead to more powerful knowledge workflows.
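To make those three properties slightly more concrete, here is a speculative TypeScript sketch of what such an idea-object might look like; every name in it is invented, and it is a thought experiment rather than a design:

// An "idea-object" with the three properties above: directly manipulable,
// able to report how it relates to other ideas, and aware of its provenance.

interface Provenance {
  sourceUrl?: string;        // the conversation, blog post, or book it came from
  capturedAt: Date;
  derivedFrom: IdeaObject[]; // the older ideas this one recombines
}

interface IdeaObject {
  id: string;
  text: string;
  provenance: Provenance;
}

// An "idea browser" operation: sort a collection by relatedness to a focus
// idea, given any relatedness measure (n-gram overlap, embeddings, ...).
function sortByRelatedness(
  focus: IdeaObject,
  ideas: IdeaObject[],
  related: (a: IdeaObject, b: IdeaObject) => number,
): IdeaObject[] {
  return [...ideas].sort((a, b) => related(focus, b) - related(focus, a));
}

// Direct manipulation: combining two ideas yields a new one whose provenance
// cites both, like a file that remembers where it was downloaded from.
function combine(a: IdeaObject, b: IdeaObject): IdeaObject {
  return {
    id: `${a.id}+${b.id}`,
    text: `${a.text}\n\n${b.text}`,
    provenance: { capturedAt: new Date(), derivedFrom: [a, b] },
  };
}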

Even as I write these paragraphs, it bothers me that these “properties” are so vague, and don’t really tell us anything about what future interfaces for working with notes and ideas will look like. (I suppose, though, if it were that obvious, we would have it already.) A big focus of my current work is on prototyping different ways to reify ideas and thoughts into software objects, and implementing those designs using modern NLP techniques. The road ahead is foggy and uncertain, but I think this is an exciting and worthwhile space. Maybe in five years’ time, you won’t be reading these posts as just walls of text on a webpage, but as something entirely new – a new kind of interface between the machine and your mind.

Imagining better interfaces to language models

2022-05-02 03:03:50

Suppose you’re a product engineer working on an app that needs to understand natural language. Maybe you’re trying to understand human-language questions and provide answers, or maybe you want to understand what humans are talking about on social media, to group and categorize them for easier browsing. Today, there is no shortage of tools you may reach for to solve this problem. But if you have a lot of money, a lot of compute hardware, and you’re feeling a little adventurous, you may find yourself reaching for the biggest hammer of all the NLP hammers: the large autoregressive language model, GPT-3 and friends.

Autoregressive models let engineers take advantage of computers’ language understanding through a simple interface: the model continues a given piece of text in the way it predicts is most likely. If you give it the start of a Wikipedia entry, it will write a convincingly thorough Wikipedia article; if you give it the start of a conversation log between friends or a forum thread between black hat hackers, it will continue those conversations plausibly.
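
Here is roughly what that raw interface looks like in code, sketched with a small open model via Hugging Face’s transformers library; GPT-2 stands in for the larger hosted models, and the prompt and decoding parameters are arbitrary choices of mine.

```python
# Sketch of the raw autoregressive interface: hand the model a prefix,
# get back the continuation it judges most likely.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "San Francisco is a city in"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=30, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```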

If you’re an engineer or designer tasked with harnessing the power of these models in a software interface, the easiest and most natural way to “wrap” this capability into a UI is a conversational interface: if the user wants to ask a question, the interface can embed the user’s query into a script for a customer-support conversation, and the model can respond with something reasonable. This is what Google’s LaMDA does. It wraps a generative language model in a script for an agreeable conversation, and exposes one side of the conversation to the human operator. Another natural interface is to expose the model’s text-completion interface directly. This kind of “direct completion” interface may actually be the most useful option if you’re building, say, an AI-assisted writing tool, where “finish this paragraph for me” can help unblock authors stuck in creative ruts.
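
A minimal sketch of that conversational “wrapping”, with a complete(prompt) function standing in for whatever completion API is in use; the prompt template and the stop logic are my assumptions, not LaMDA’s actual scaffolding.

```python
# Sketch: wrap a raw text-completion model in a conversation script and
# expose only the agent's side to the user.
# `complete(prompt)` is a stand-in for any text-completion call.
def wrap_in_support_script(user_question: str) -> str:
    return (
        "The following is a conversation with a friendly, knowledgeable "
        "support agent.\n\n"
        f"Customer: {user_question}\n"
        "Agent:"
    )

def answer(user_question: str, complete) -> str:
    completion = complete(wrap_in_support_script(user_question))
    # Cut the completion off if the model starts writing the customer's next line.
    return completion.split("Customer:")[0].strip()
```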

But when I ponder the question “what interface primitives should language model-infused software use?”, it doesn’t seem like exposing the raw text-completion interface is going to be the most interesting, powerful, or creative bet long-term. When we look back at history, for every new capability of computers, the first generation of software interfaces tends to expose the most direct and raw interface to that capability. Over time, though, subsequent generations of interface designs tend to explore what kinds of entirely new metaphors are possible, metaphors that build on the fundamental new capability of the computer but aren’t tethered to the conduits through which the algorithms speak.

Long live files

Perhaps the most striking example of interface evolution is the notion of “files”, which dominated the desktop computing paradigm for decades before being transfigured out of recognition in the switch to mobile. Operating systems still think of lots of pieces of data as files on disk, encoded in some file format, sitting at the end of some hierarchy of folders. Software developers and creative professionals still work with them on a daily basis, but for commonplace “personal computing” tasks like going on social media, texting friends, streaming video, or even editing photos and blogging, humans don’t need to think about files anymore.

Files still exist, but the industry has found better interface primitives for mediating most kinds of personal computing workflows. Imagine if the Photos app on the iPhone made you deal with files and folders to search your camera roll, or if Spotify exposed a hierarchical-folders interface to browsing your playlists. Where files are exposed directly, end users can often skim a “Recently used” list of 5-10 files or search for a few keywords to find files – no trudging through folder hierarchies necessary. We can also depend on pretty reliable cloud syncing services to make sure our important files are “on every device”, though of course, that’s not really how files work. We’ve just evolved the interface primitive of a “file” to become more useful as our needs changed.

The path here is clear: we found a software primitive (files and folders), built initial interfaces faithfully around them (Windows 98), and gradually replaced most use cases with more natural interface ideas or augmented the initial metaphor with more effective tools.

A similar transition has been happening over the last decade for URLs on the web. Initially, URLs were front-and-center in web browsers. In a web full of static webpages linking to each other through short, memorable URLs written mostly by humans, URLs were a legible and important part of the user interface of the web. But as the web became aggregated by social media and powerful search engines, URLs became less important. As URLs became machine-generated references to ephemeral database records, the web embraced new ways to label and navigate websites – bookmarks and favorites, algorithmic feeds, and ever-more-powerful search. Most browsers these days don’t show full URLs of webpages by default.

With software interfaces for language models, we’re just at the tippy-tip of the beginning stages of exploration. We’re exposing the algorithm’s raw interface – text completion – directly to end users. It seems to me that the odds of this being the most effective interface for harnessing language models’ capabilities are low. Over the next decade, we’re likely to see new interface ideas come and go that explore the true breadth of novel interfaces through which humans can harness computers’ understanding of language.

Future interfaces are always difficult to imagine, but taking a page from Bret Victor’s book, I want to explore at least one possible future by studying one way a currently popular interface idea, conversational UIs, falls short.

Conversations are a terrible way to keep track of information

A conversational interface puts the user in conversation with some other simulated agent. The user accomplishes things by talking to this fictional agent. Siri is the prototypical example, but we also find these in (usually unsatisfying) customer support portals, in phone trees, in online order forms, and in many other places where there may be a broad set of questions the user might ask, but only a small, fixed set of things the computer should do in response (like “rebook a flight”).

A basic conversational UI is easy to build on top of language models, because it’s just a thin wrapper around generative LMs’ raw interface: continuing a text prompt. But for most personal computing tasks, I think CUIs are not ideal. This is because conversations are a bad way to keep track of information, and most useful tasks require us to keep information in our working memory.

Plenty of everyday tasks, like booking a round-trip flight, require keeping track of several pieces of information throughout.

Today’s interfaces (mobile apps, websites) help the user “keep track of information” in these workflows by simply continuing to display relevant information on-screen while the user performs some action. When I go to Expedia to book a trip, for example, even while I search for my return flight, I can see the date and time at which I depart, and on which airline I’ll be flying. In a conversational UI, these pieces of information can’t simply “stick around” – the user needs to keep them in mind somehow. And as the complexity of conversations and tasks increases, the user may find themselves interacting not with a kind and knowledgeable interlocutor, but with a narrow and frustrating conduit of words through which a bot is trying to squeeze a whole screenful of information, one message at a time.

Not all tasks are so complex, though, and some tasks don’t really involve keeping anything in our working memory. These simpler tasks are a good fit for CUIs.

If the user has to keep track of information in a conversation, they have to hold that information in their working memory (hard for no reason) or keep asking the interface (what was step one again? what were my options again?). It simply doesn’t make sense for some tasks.

What instead?

So what’s the solution to collaborating with language models on more complex tasks?

One solution may be documents you can talk to. Instead of holding a conversation with a bot, you and the bot collaborate to write a document and build up a record of the salient points and ideas to remember. Think GitHub Copilot for everything else.
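
One rough sketch of that idea, with complete(prompt) again standing in for a completion call; the prompt shape here is an illustrative assumption, not any particular product.

```python
# Sketch: the human and the model both operate on one shared document,
# which doubles as the record of salient points to remember.
# `complete(prompt)` is a stand-in for any text-completion call.
def extend_document(document: str, instruction: str, complete) -> str:
    prompt = (
        f"{document}\n\n"
        "Revise or extend the document above according to this instruction: "
        f"{instruction}\n\n"
        "Updated document:\n"
    )
    return complete(prompt).strip()

# A "conversation" becomes a series of edits to a persistent artifact:
# doc = extend_document(doc, "add a section comparing the two flight options", complete)
```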

More generally, when a language model-powered agent and the human operator share some context, I think it’s best to reify that context into a real thing in the interface rather than have a conversation subsume it. A while ago, I made a language model-powered UNIX shell where, instead of typing in code like cp my-file.txt new-file.txt, I would type in natural-language commands.
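
Here is a sketch of how such a natural-language shell could be wired up, not the original implementation; complete(prompt) stands in for a completion call, and the confirm-before-running step is my own addition.

```python
# Sketch of a natural-language shell: translate an English request into a
# concrete shell command, show it, and only run it once confirmed.
# `complete(prompt)` is a stand-in for any text-completion call.
import subprocess

def to_shell(request: str, complete) -> str:
    prompt = (
        "Translate the instruction into a single POSIX shell command.\n"
        f"Instruction: {request}\n"
        "Command:"
    )
    lines = complete(prompt).strip().splitlines()
    return lines[0] if lines else ""

def nl_shell(complete):
    while True:
        request = input("> ")  # e.g. "copy my-file.txt to new-file.txt"
        command = to_shell(request, complete)
        if command and input(f"run `{command}`? [y/N] ").lower() == "y":
            result = subprocess.run(command, shell=True, capture_output=True, text=True)
            print(result.stdout or result.stderr)
```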

Playing with this experiment, I appreciated that, rather than accessing the world through the narrow pipe of a text conversation, the agent (the shell) and I were both manipulating a shared environment collaboratively.

Swimming in latent space

My second interface idea is much more abstract and less developed, but it’s a current area of research for me, so I want to plant a seed of this idea in your mind.

In a standard text generation process with an LM, we control the generated text through a single lever: the prompt. Prompts can be very expressive, but the best prompts are not always obvious. There is no sense in which we can use prompts to “directly manipulate” the text being generated – we’re merely pulling levers, with only some rules of thumb to guide us, and the levers adjust the model’s output through some black-box series of digital gears and knobs. Mechanistic interpretability research, which aims to understand how these models work by breaking them down into well-understood sub-components and layers, is showing progress, but I don’t expect even a fully-understood language model (whatever that would mean) to give us the feeling of directly, tactilely guiding the text being generated, as if we were “in the loop”.

I’m interested in giving humans the ability to more directly manipulate text generation from language models. In the same way we moved from command-line incantations for moving things around in software space to a more continuous and natural multi-touch paradigm, I want that same kind of direct, continuous control over how a model generates text.

There is active research towards these ideas today. Many researchers are looking into how to build more “guidable” conversational agents out of language models, for example. And in the same way models like DALL-E 2 guide the synthesis of an image using some text prompt, we may also be able to guide synthesis of sentences or paragraphs using high-level prompts.

I’ve been researching how we could give humans the ability to manipulate embeddings in the latent space of sentences and paragraphs, to be able to interpolate between ideas or drag sentences across spaces of meaning. The primary interface challenge here is one of dimensionality: the “space of meaning” that large language models construct in training has hundreds or thousands of dimensions, and humans struggle to navigate spaces more than 3-4 dimensions deep. What visual and sensory tricks can we use to coax our visual-perceptual systems into understanding and manipulating objects in higher dimensions? Projects like Gray Crawford’s Xoromancy explore this question for generative image models (BigGAN). I’m interested in similar possibilities for generative text models.
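
As a very rough sketch of the embedding-space side of this (leaving aside the open problem of decoding arbitrary latent points back into fluent text), here is what interpolating between two sentence embeddings and checking nearby candidates might look like; the model choice and the sentences are illustrative assumptions.

```python
# Sketch: interpolate between the embeddings of two sentences and see which
# candidate sentence sits nearest each point along the path. Decoding an
# arbitrary latent point back into text is the hard, open part; this only
# does nearest-neighbor lookup among known sentences.
import numpy as np
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative model choice

start = "Files are a great interface for organizing information."
end = "Conversations are a terrible way to keep track of information."
candidates = [
    "Folders help people keep their documents organized.",
    "Interfaces should keep shared context visible on screen.",
    "Chatbots force users to hold too much in working memory.",
]

a, b = model.encode([start, end])
candidate_vecs = model.encode(candidates)

for t in np.linspace(0.0, 1.0, 5):
    point = (1 - t) * a + t * b                   # linear interpolation
    scores = util.cos_sim(point, candidate_vecs)[0]
    print(f"t={t:.2f}  nearest: {candidates[int(scores.argmax())]}")
```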

I’ve written before about the feedback loop difference between work and play. I wrote then:

The main difference between work and play is that “work” makes you wait to reap the rewards of your labor until the end, where “play” simply comes with much tighter, often immediate feedback loops. If you can take a project with all the rewards and feedback concentrated at the end and make the feedback loops more immediate, you can make almost any work more play-like, and make every piece of that work a little more motivating.

This is the job of game designers – taking a 10-hour videogame and skillfully distributing little rewards throughout the gameplay so that you never have to ask yourself, “ugh, there’s so much of the game left, and I don’t know if I’m motivated enough to finish it.” What a ridiculous question to ask about a game! And yet, that’s a product of very deliberate design, the same process of design we can take to every other aspect of our work.

If we can build an interface to LMs that lets humans directly guide and manipulate the conceptual “path” a model takes when generating words, it would create a feedback loop much tighter and more engaging than the prompt-wait-retry cycle we’re used to today. It may also give us a new way to think about language models. Rather than “text-completion” engines, language models may become tools for humans to explore and map out interesting latent spaces of ideas.

Interfaces amplify capabilities

Large language models represent a fundamentally new capability computers have: computers can now understand natural language at a human-or-better level. The cost of doing this will fall over time, and the speed and scale at which we can do it will go up quickly. When we imagine software interfaces to harness this language capability for building tools and games, we should ask not “what can we do with a program that completes my sentences?” but “what should a computer that understands language do for us?”

Language understanding unlocks a new world of possible things computers can help humans accomplish and imagine, and the best interfaces for most of those tasks have yet to be imagined. Maybe in a decade we’ll be synthesizing entirely new interfaces just-in-time for every task.

Not “computers can complete text prompts, now what?” but “computers can understand language, now what?”