2024-09-11 21:35:25
I’ve joined Thrive Capital as an EIR and advisor, working with the Thrive team to support our founders in understanding and deploying AI thoughtfully, while furthering my own research and explorations around AI interpretability, knowledge representations, and interface design.
August was my last month as a part of Notion’s AI engineering team.
It’s been a privilege at Notion to get to work with a world-class team at the frontier of applied LLM products. Notion was one of the first companies to preview an LLM product, even before ChatGPT. I’m grateful to have been a part of the many ways the team has grown in the last almost-two years, across product launches, generational leaps in models, and many orders of magnitude of scale. We learned alongside the rest of the industry about prompt programming, retrieval, agents, evals, and how people are using AI day-to-day in their schools, companies, meetings, and life.
Notion’s “AI team” is really a team-of-teams spanning product, design, engineering, partnerships, finance, marketing, sales, and growth across the company. With all the talented Notinos who have joined since my first days, I have no doubt that Notion’s AI team is poised to build and share a lot of beautiful, useful products in the coming months.
As Notion brings all the ideas we’ve explored (and more) to the world, I’ve been feeling an urge to (1) take a step back and understand how the broader world is adapting to this new technology, and (2) spend much more time on my research agendas around interpretability and interfaces.
After a brief break, I joined Thrive earlier this month.
As I’ve gotten to know the Thrive team over the last couple years, I’ve been consistently impressed with the thoughtfulness, depth of partnership with founders, and clarity of conviction behind investments across stages and industries. They’ve generously granted me an ambitious remit to pursue both of my goals as a part of the team, and based on my first week I’ve got a lot to be excited about on both fronts.
I hope to share what I learn and build along the way, as I always have.
2024-08-08 13:18:20
It’s not easy, but with a lifetime of dedication to the craft, many people become virtuoso guitar players. They learn the intricate nuances and details of the instrument and attain a level of mastery over it as if it were an extension of their mind. Some of the best instrumentalists even transcend traditional techniques of guitar performance and find their own ways of creating music with the instrument, like using it as a percussion voice. Alex Misko’s music is a beautiful example of virtuosity and novel techniques on the acoustic guitar.
No amount of such dedication can make you a virtuoso at Guitar Hero. Though also an instrument of sorts, Guitar Hero does not admit of virtuosity. At the cost of a lower barrier to entry, its ceiling of mastery and nuanced expression is capped. Its guardrails also prevent open-ended use. There is only one way to create music with Guitar Hero — exactly the way the authors of the video game intended.
I think great creative tools are more like the acoustic guitar than Guitar Hero: they reward deep mastery and leave room for open-ended use.
Creative tools like Logic, Photoshop, or even the venerable paintbrush can be mastered. In these creative tools, artists can deftly close the gap between an image in their mind and the work they produce while worrying less about the constraints that the tool imposes on their expression. And where there are constraints, they are free to think beyond the tool’s designed purpose.
Both capacity for virtuosity and open-endedness contribute to an artist’s ability to use a medium to communicate what can’t be communicated in any other way. The converse is also true; if a creative instrument has a low ceiling for mastery and can only ever be used in one intended way, the operator can only use it to say what’s already been said.
Every established artistic practice and creative medium, whether acoustic instruments, digital illustration, photography, or even programming, has standards of virtuoso-level mastery. The communities behind them all intimately know the huge gap between barely being able to use a medium and true mastery of it. Virtuosos attain their level of intimacy with their mediums through extensive experience and a substantial portfolio that often takes a lifetime to build up.
When new creative mediums appear, it’s never immediately obvious what virtuoso-level performance with that medium looks like. It takes time for virtuosity to take form, because several things have to happen concurrently to open up the gap between novices and virtuosos.
These changes happen in lockstep with each other: as tools improve, people must refine their sense for telling the bad from the good, and as the standard of mastery diverges from the standards of previous artistic mediums, a new community of practice forms, slowly, over time. As that community coalesces around the new medium, there is more space for its practitioners to develop their own sense of mastery and refine their toolset.
Electronics, software, and computing have all birthed their own communities of artistic practice through this process. I’m reminded of computational artists like Zach Lieberman. I have no doubt AI models will lead to another schism, another inception of a new legitimate creative community of practice with its own standard of virtuoso performance, cornucopia of tools, and unique set of values. AI models will become a creative medium as rich and culturally significant as animation and photography.
But we are clearly at the very beginning:
Popular culture still judges AI artwork by the standards of traditional digital art rather than a new set of values.
Creative tools today use AI to cheaply emulate or automate existing artistic techniques rather than embracing AI as its own kind of creative material. AI, like every other creative medium (photography, animated film, electronic music), has its own texture that I think practitioners will come to embrace. Artists like Helena Sarin have long demonstrated it. Today’s tools are built for the old world rather than the new.
Today’s tools for AI art also tend to have an extremely low ceiling for mastery. Many commercial tools meant to be accessible, like DALL-E, are so basic and guardrailed in their interfaces that it’s difficult to imagine anyone becoming a virtuoso (there’s a ceiling to how complex prompts can be), let alone finding a novel way to use the tool that its creator didn’t expect. This is why many artists I know are gravitating toward more complex, open-ended tools like ComfyUI, and perhaps soon Flora. Tools built first on prompting, like Midjourney, have also expanded their feature set over time to allow for other more nuanced forms of expression.
The AI creative tools of today are still mostly designed as augmentations for existing communities of practice like digital illustrators, photographers, or animators. There are pockets of nascent scenes emerging specifically around using AI as a creative medium, but in my view, we are barely on day one.
Over time, I think we will see creative tools built natively around AI separate themselves from tools for augmenting existing mediums in applications like Photoshop. We’ll witness virtuoso levels of performance for expressing new ideas through this new medium, as difficult as it is for us to imagine now what such mastery might look like. We’ll see artists use neural networks and data in ways they were never meant to be used. Through it all, our capacity for creation can only expand.
I feel lucky to be present for the birth of a new medium.
Thanks to Weber Wong and Avery Klemmer for helpful discussions that sparked many ideas in this post.
2024-08-03 14:29:24
When I discuss interfaces on this blog, I’m most often referring to software interfaces: intermediating mechanisms from our human intentions to computers and the knowledge within them. But the concept of a human interface extends far before and beyond software. I’ve been trying to build myself a coherent mental framework for how to think about human interfaces to knowledge and tools in general, even beyond computers.
This is the second of a pair of pieces on this topic. The other is Instrumental interfaces, engaged interfaces.
What makes a user interface good?
What are the qualities we want in something that mediates our relationship to our knowledge and tools?
When I visited Berlin earlier this year for a small conference on AI and interfaces, I spent my last free night in the city wandering and pondering whether there could be a general answer to this expansive question. My focus — both then and now — is on engaged interfaces, interfaces people use to deeply understand or explore some creative medium or knowledge domain, rather than to complete a specific well-defined task. (In this post, when I write interface, I specifically mean this type.) The question of what makes an interface compelling is particularly interesting for this type because, as I noted in the other post in this series, inventing good primitives for engaged interfaces demands broad, open-ended exploration. I was hopeful that foundational principles could guide our exploration process and make our search more efficient.
I returned home from that trip with a hazy sense of those principles, which have since become more crisp through many conversations, research, and experiments.
What makes a good human interface?
A good engaged interface lets us do two things. It lets us
see information clearly from the right perspectives, and
express our intent as naturally and precisely as we desire.
To see and to express. This is what all great engaged interfaces — creative and exploratory tools — are about.
A good engaged interface makes visible what is latent. In that way, it is like a great map. Good interfaces and maps enable us to more effectively explore some domain of information by visualizing and letting us see the right slices of a more complex, underlying reality.
Data visualizations and notations, the backbone of many kinds of graphical interfaces, are maps for seeing better. Primitives like charts, canvases, (reverse-)chronological timelines, and calendars are all based on taking some meaningful dimension of information, like time or importance, and mapping it onto some space.
If we take some liberties with the definition of a data visualization, we can consider interface patterns like the “timeline” in an audio or video editing app. In fact, the more capable a video editing tool, the greater the variety of maps it offers users, enabling them to see different dimensions of the underlying project. An experienced video editor doesn’t just work with video clips on a timeline, but also has a “scope” for visualizing the distribution of color in a frame, color histograms and curves for higher-level tuning, audio waveforms, and even complex filtered and categorized views for navigating their vast library of source footage. These are all maps for seeing information clearly from diverse perspectives.
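A color histogram, one of the “scopes” described above, is easy to compute from raw pixel data. Here is a minimal sketch in Python; the function name, bin count, and input format are my own illustrative choices, not any real editor’s API:

```python
def luminance_histogram(pixels, bins=8):
    """Bucket pixel luminances into a coarse histogram -- the raw data
    behind the luminance "scope" views in video editing tools.
    `pixels` is a list of (r, g, b) floats, each in [0, 1]."""
    counts = [0] * bins
    for r, g, b in pixels:
        # Rec. 709 luma coefficients, a common weighting for perceived brightness.
        y = 0.2126 * r + 0.7152 * g + 0.0722 * b
        # Clamp y == 1.0 into the top bin.
        counts[min(int(y * bins), bins - 1)] += 1
    return counts
```

An editor’s scope is just this list of counts drawn as bars, updated every frame; the map is a projection of millions of pixels onto a single axis of brightness.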
Straying even further, a table of contents is also a kind of data visualization, a map of a longer document that helps the reader see its structure at a glance. A zoomed-out thumbnail grid of a long paged document is yet another map in disguise, where the reader can see a different, more scannable perspective on the underlying information.
Even when there isn’t an explicit construction of space in the interface, there is often a hidden metaphor gesturing at one. When we open a folder in a file browser, for example, we imagine hierarchies of folders above and below to which we can navigate. In a web browser, we imagine pages of history coming before and after the current page. When editing a document, the undo/redo “stack” gestures at a hidden chronological list of edits. Sometimes, these hidden metaphors are worth reifying into concrete visuals, like a list of changes in a file history view or a file tree in the sidebar of a code editor. But over time these inherently cartographic metaphors get collapsed into our imagination as we become more adept at seeing them in our minds.
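The undo/redo “stack” is one of the easiest of these hidden metaphors to reify. A minimal sketch in Python, with class and method names that are purely illustrative:

```python
class EditHistory:
    """A reification of the undo/redo "stack": two lists of document
    states, one behind the cursor in time and one ahead of it."""

    def __init__(self, initial=""):
        self.current = initial
        self.undo_stack = []  # states we can go back to
        self.redo_stack = []  # states we can go forward to

    def edit(self, new_state):
        # A fresh edit invalidates the redo branch, as in most editors.
        self.undo_stack.append(self.current)
        self.redo_stack.clear()
        self.current = new_state

    def undo(self):
        if self.undo_stack:
            self.redo_stack.append(self.current)
            self.current = self.undo_stack.pop()
        return self.current

    def redo(self):
        if self.redo_stack:
            self.undo_stack.append(self.current)
            self.current = self.redo_stack.pop()
        return self.current
```

A file history view is this same structure made visible: the two hidden lists drawn as a single chronological column the user can scrub through.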
Once we’ve seen what is in front of us, we need to act on that understanding. Often that comes in the form of manipulating the thing being visualized — the thing we see in the interface. A good engaged interface also helps us here by transparently translating natural human interactions into precise intents in the domain of the tool.
Simple applications accomplish this by letting the user directly manipulate the element of interest. Consider the way map applications allow the user to explore places by dragging and zooming with natural gestures, or how the modern WIMP desktop interface lets users directly arrange windows that logically correspond to applications. When possible, directly manipulating the underlying information or objects of concern, the domain objects, minimizes cognitive load and learning curve.
Sometimes, tools can give users much more capability by inventing a new abstraction. Such an abstraction represents latent aspects of a domain object that couldn’t be individually manipulated before. In one type of implementation, a new abstraction shows individual attributes of some underlying object that can now be manipulated independently. We often see this in creative applications like Photoshop, Figma, or drag-and-drop website builders, where a sidebar or attribute panel shows independent attributes of a selected object. By interacting directly with a color picker, font selector, or layout menu in the panel — the surrogate objects — the user indirectly manipulates the actual object of concern. To make this kind of interaction more powerful, many of these tools also have a sophisticated notion of selection. “Layers” in image editing apps are a new abstraction that makes both selection and indirect attribute manipulation more useful.
A second type of surrogate object is focused not on showing individual attributes, but on revealing intermediate states that otherwise wouldn’t have been amenable to direct manipulation, because they weren’t concrete. Spreadsheet applications are full of UI abstractions that make intermediate states of calculation concrete. A typical spreadsheet will contain many cells that store some intermediate result, not to mention the concept of a formula itself, which is all about making the computation itself directly editable. Version control systems take previously inaccessible objects — past versions of a document, or a single change, a “diff” — and allow the user to directly manipulate them to undo or reorder edits.
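What spreadsheets make concrete can be sketched in a few lines. In this toy sheet, cells hold either literal values or formulas as plain objects, so every intermediate computation is itself an inspectable, editable thing; the names and API here are invented for illustration, not any real spreadsheet engine:

```python
class Sheet:
    """A toy spreadsheet: each cell is either a literal value or a
    formula (a callable over the sheet), so intermediate states of
    calculation are concrete objects a user could point at and edit."""

    def __init__(self):
        self.cells = {}  # cell name -> literal value or formula

    def set(self, name, value):
        self.cells[name] = value

    def get(self, name):
        v = self.cells[name]
        # A formula recomputes from its inputs on every read.
        return v(self) if callable(v) else v

sheet = Sheet()
sheet.set("A1", 3)
sheet.set("A2", 4)
# The intermediate computation lives in a cell of its own:
sheet.set("A3", lambda s: s.get("A1") + s.get("A2"))
```

Editing A1 changes what A3 reads out, which is exactly the manipulability the essay describes: the computation, not just its result, is a first-class object in the interface.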
All of the interfaces I mention above are examples of direct manipulation, a term dating back at least to 1983 for interfaces that offer a continuous representation of the objects of interest, physical actions in place of complex command syntax, and rapid, incremental, reversible operations whose effects are immediately visible.
This kind of interface lets us re-use our intuition for physical objects, movement, and space to see and express ideas in more abstract domains. An underrated benefit of direct manipulation is that it enables low-friction iteration and exploration of an idea space. Indeed, I think it’s fair to say that direct manipulation is itself merely a means to achieve this more fundamental goal: let the user easily iterate and explore possibilities, which leads to better decisions.
In the forty years since, direct manipulation has eaten away at nearly every corner of the landscape of knowledge tools. But despite its ubiquity, the most interesting and important part of creative knowledge work — the understanding, coming up with ideas, and exploring options part — still mostly takes place in our minds, with paper and screens serving as scratchpads and memory more than true thinking aids. There are very few direct manipulation interfaces to ideas and thoughts themselves, except in specific constrained domains like programming, finance, and statistics where mathematical statements can be neatly reified into UI elements.
Of course, we have information tools that use direct manipulation principles, like graphical word processors and mind mapping software. But even when using these tools, a user has to read and interpret information on screen, transform and manipulate them in the mind, and then relay their conclusions back into the computer. The intermediate states of thinking are completely latent. In the best thinking tools today, we still can’t play with thoughts, only words.
We are in the pre-direct manipulation, program-by-command-line age of thinking tools, where we cannot touch and shape our thoughts like clay, where our tools let us see and manipulate words on a page, but not the concepts and ideas behind them.
This realization underlies all of my technical research and interface explorations, though I’m certainly not early nor unique in pursuing this vision. To me, solving this problem means freeing our most nuanced and ineffable ideas from our individual heads. It would give us a way to translate those thoughts into something we can hold in our hands and manipulate in the same way we break down an algebra problem with pencil and paper or graphs on a grid.
What could we accomplish if, instead of learning to hold the ever more complex problems in our world within our minds, we could break down and collaborate on them with tools that let us see them in front of us in full fidelity and bring our full senses and dexterity to bear on understanding and exploring the possibilities?
2024-08-01 14:26:32
When I discuss interfaces on this blog, I’m most often referring to software interfaces: intermediating mechanisms from our human intentions to computers and the knowledge within them. But the concept of a human interface extends far before and beyond software. I’ve been trying to build myself a coherent mental framework for how to think about human interfaces to knowledge and tools in general, even beyond computers.
This is the first of a pair of pieces on this topic. The other is What makes a good human interface?.
Maps are my favorite kind of interface, so I want to begin with a brief story about a map I use every day.
New York, where I live, is a fast-moving river. Friends and neighbors move in just as quickly as they move out. In my three years living in the city, most of my friends and acquaintances have moved apartments every year. Too many to count have also moved in, and then away again, to some other city.
In these circles, one of my sources of childlike pride is the Manhattan subway map and schedule that’s now as clear in my memory as in the posters on station walls. I know which trains run where, at what times of day, and which stops different trains skip during rush hour. Sometimes, when luck cooperates, I can beat transit apps’ time estimates with a clever series of transfers and brisk walks.
Obviously, I didn’t start this way.
When I first moved here, I was glued to Google Maps, following its directions and timestamps religiously. I relied on turn-by-turn directions to get around, but I also checked the iconic New York subway maps to see how many stations were left or if I was passing any landmarks or neighborhoods I liked. Over time, I learned to navigate my routes from the hazy map taking shape in my head, and now I can find the shortest path between any two locations in Manhattan below 100th St from memory, any time of day. (Brooklyn and Queens, I’m still working on…)
These two kinds of navigation aids — turn-by-turn directions and the subway map — were valuable to me in different ways. Though both are maps of New York, I relied on the directions to reach a specific goal, namely getting to my destinations on time. The maps on the train, though, were more multipurpose. Sometimes I was looking for landmarks, other times simply getting oriented, and all along, I was also learning local geography by engaging with the map in a deeper way than the directions on my phone.
These two different uses of a map represent two different kinds of interfaces, one more focused on a specific goal, and the other more about the process of engaging with the interface.
On second thought, most interfaces have elements of both. So perhaps it’s better to say:
A human interface serves two different kinds of uses:
Instrumental use. An instrumental user is goal-oriented. The user simply wants to get some good-enough solution to a problem they have, and couldn’t care less how it’s done.
Here’s a good litmus test to find out whether an interface is instrumental: If the user could press a magic button and have their task at hand completed instantly to their requirements, would they want that? If so, you are likely looking at an instrumental interface.
A turn-by-turn nav, a food delivery app, and a job application form are all interfaces that are used almost exclusively in an instrumental way. Let’s call these instrumental interfaces.
Engaged use. Engaged users want to be intimately involved in the mechanics of an interface. They’re using the interface not just to check off a to-do item, but because they get some intrinsic value out of interacting with the interface, or they can only get what they want by deeply engaging with the interface’s domain of information.
A musical instrument, a board game, and a flash card set are all engaged interfaces, because they’re used almost exclusively for the intrinsic value of spending time using them. The user wants to feel the joy of performing music, not just listen to a track on a computer. They want to enjoy playing a board game, not just be handed a victory or loss. They want to learn information they’ve written in a flash card by repeatedly engaging with it, not simply read a textbook for facts they may forget.
Many interfaces, like the subway map from my story above, are sometimes instrumental and sometimes engaged.
Instrumental users have very different requirements, expectations, and goals from engaged users of an interface, and understanding the blend that applies to your particular context is a prerequisite to designing a good interface.
As I noted earlier, the ideal instrumental interface for any task or problem is a magic button that can (1) read the user’s mind perfectly to understand the desired task, and (2) perform it instantly and completely to the desired specifications.
In the absence of such a perfect button, you, the designer, must conceive of the closest possible approximation you can manage within the limits of technology. In a sense, building an instrumental tool is very straightforward: you can work with your users to find out as much as you can about their intent when using the tool, and then engineer a solution that accomplishes that goal in the cheapest, fastest, most reliable way possible. The interesting details are in the necessary tradeoffs between how well you understand the user’s intent and how cheaply, quickly, and reliably you can deliver the result.
An engaged interface has no such top-line metric to optimize. Each kind of engaged interface has a different way it can be improved. A video game, for example, can sometimes be better by being more realistic and easier to learn. But this isn’t always true. Sometimes, the fun of a game comes from the challenge of learning its mechanics, or strange, surrealist laws of physics in the game world. A digital illustration tool is usually better off giving users more precise controls, but there are creative tools that lead artists to discover surprising results by adding uncertainty or elements of surprise.
In the absence of a straightforward goal, building a good engaged interface requires exploration and play. To discover the ideas that make good maps, data visualizations, video games, musical instruments, and social experiences, we need to try new ideas and see people experience them firsthand. This is a stranger environment in which to do design work, but I find the surprising nature of this process motivating and rewarding.
As a designer and engineer, I used to have a kind of moral aversion to instrumental tools and interfaces. I was drawn to creative, deeply engaging tools that I felt were most meaningful to my personal life, and viewed open-endedness as a kind of virtue unto itself.
I don’t think this way anymore.
These days, I think both instrumental and engaged interfaces are worth working on, and bring value and meaning to their users.
I do believe that the culture of modern life makes the benefits of instrumental interfaces much more legible than engaged ones: marketing tactics tout how fast and affordable things are. They talk about discounts and deals and out-compete the market based on easily quantifiable factors. Especially in business products, product makers view their customers as cold, calculating agents of reason that only pay for hard numbers. But the reality is more nuanced, and even the coldest corporate organizations are made of people. Look at the dominance of ostensibly business tools like Notion or Slack. Those tools won not purely because they made employees more efficient workers, though their makers will lead with that argument. These tools won because they are beautiful and fun to use. In a tool that consumes hours of people’s days every week, beauty, taste, and fun matter, too.
Following any transformative leap in technology, it takes some time for popular design practice to catch up. This is especially the case for the design practice of engaged interfaces, because unlike instrumental interfaces, where the goal is always straightforward and the leverage is in the enabling technology, better engaged interfaces often come from surprising new ideas that can only be discovered through a more open-ended design exploration process.
There is always a delay between technological leaps and design explorations bearing fruit. I believe we’re going through just such a period right now. Most current work in “AI UI” is concerned with fulfilling the promise of faster, better, cheaper workflows with language models, used “out of the box” in conversational settings. This is because the implementation possibility is more obvious, and the goals are clear from the start. But there is still a second shoe to drop: interfaces that lean on foundation models to enable humans to search, explore, understand, and engage more deeply with media through completely new interaction mechanics we haven’t discovered yet. What direct manipulation is to the graphical user interface, we have yet to uncover for this new way to work with information.
2024-07-30 11:52:45
I have a few friends who are in the midst of navigating very hazy idea spaces, guided by strong intuition and taste, but early enough in the process that very little is visible through the fog. Whenever I’m in this situation I feel a strange internal conflict. One side of me feels conviction in a direction of exploration, while the other part of me feels the risk of potential dead ends, afraid to go out on a limb and say I should put my full effort into pursuing where my hunch leads.
In fact, I’m navigating a version of this conflict right now.
Last week, I was catching up with one of my friends going through something similar, and ended up describing to them a mental model of exploration that I’ve developed in the hopes that it was helpful. It’s a framework I think my past self would have found meaningful, so I’m also sharing it with you, dear reader.
Let’s begin by understanding that exploration is far from the only way to do meaningful creative work in the world. Following popular rhetoric, it was easy for my past self to fall victim to the idea that somehow exploratory, open-ended, research-y work was more virtuous and interesting than the work of incremental improvements. On the contrary, I now think it’s quite likely that most valuable things were brought into this world by a million little incremental improvements, rather than by a lightning strike of a discovery. Incremental work can be just as rewarding as exploration, and it comes with the comfort of working in a known domain.
But you’re not here to work on optimizing what’s known. You’re here because something about the process of exploration draws you in. Maybe it’s the breadth of ideas you encounter in exploration. Perhaps it’s about the surprising things you learn on the way. But you’ve decided you’re more interested in navigating and mapping out unknown mazes of ideas than building cathedrals on well-paved streets.
Here’s what I would tell my past self:
If you’re on this path of exploration, there’s probably some voice inside of you that’s worth listening to. This is your taste, your gut feeling, whatever you want to call it. It’s the part of your intuition that tells you, “this might not look that great right now, but on the other side of this uncertain maze of ideas is something worth the trouble.” It’s worth listening to your intuition because this is what’ll set your perspective apart from everyone else who’s also looking around for problems to solve. In the beginning, your intuition is all that you have as you begin your exploration.
This voice — your intuition — is strongest at the outset. At the start of your exploration, when you have very few hard facts about what will work out in the end, you can lean on your inner voice, and the intrinsic excitement you feel about the potential of your idea, to keep you going. Nearly all of your motivation comes from your intuition at the starting line.
But excitement from your intuition is volatile. As time passes, and as you run experiments that don’t succeed, that innate excitement you felt at the beginning will begin to dwindle. To keep your momentum, you need to replenish your motivation over time with evidence that backs your intuition and your vision.
There are many short-term faux-remedies to faltering intrinsic excitement — raising money, external validation, encouragement from friends and family — but all of these are like borrowing motivation on credit. Eventually, they’ll run out, and you’ll be left with a bigger gap to fill. The only way to sustain your motivation long term is to continually replenish it with real-world evidence that your vision is correct.
Evidence comes from testing clear, falsifiable claims about your vision against the real world.
For my personal explorations, here’s a claim that I can use to find evidence to support my vision:
By using interfaces that expose the inner workings of machine learning models, experts in technical and creative jobs can uncover insight that was otherwise inaccessible to them.
This claim can still be more precise, but the bones are there. I can polish this into a testable statement about the real world, build prototypes, put it in the hands of experts, and collect evidence supporting my vision that interpretable machine learning models can be a foundation for transformative new information interfaces.
Very frequently, your initial claim will turn out to be incorrect. This doesn’t mean your vision is doomed. On the contrary, this often gives you a good opportunity to understand where your intuition and the real world diverged, and come up with a more precise statement of what initially felt right in your gut. While collecting evidence that supports your vision motivates you towards your goal, finding evidence that contradicts your claim can bring much-needed clarity to your vision.
Once you’ve collected enough evidence to support your initial vision, you’ll often see that the answers you found through your evidence-gathering experiments gave rise to many more new bets you’re interested in taking. That same taste for good ideas you had in the beginning, now strengthened by repeated contact with reality, sees new opportunities for exploration.
So the cycle repeats. On the shoulders of your original vision, which is now a reality after all the evidence you’ve collected, you can make a new bet, starting with a renewed jump in motivation. You can test this new vision against reality, collect more evidence, and on and on.
All the skills involved in this cycle — clearly stating a vision, tastefully choosing an exploration direction, and collecting useful evidence from reality — will get sharper over time as you repeat this process. Every time you make a new bet on a new idea, you’ll be able to take bigger swings.
If you look at this chart a little differently, this story isn’t about making many disparate bets in a sequence. It’s about bringing the world towards your ever-more-refined statement of your vision. Every piece of evidence you collect to support your vision will be another stair step in this upward motion, and over time, the world behind your experimental evidence will come to resemble the world you originally envisioned.
In this process, you’ll periodically have to find new ideas that give you motivation to take on those new bets. These are what some might call “leaps of faith”, but I don’t like that expression. A “leap” makes it sound as if the motivation comes from nowhere, and as if you should blindly jump into a new idea without structure.
But there is structure. The “leap” is guided by your intuition and aimed at your vision, and as your internal motivation runs out, you’ll collect evidence from reality to close that gap.
Taking a step back, I think there’s another helpful perspective we can find. Chasing a vision is about a dance between two forces: the stories you believe and tell about a world you envision, and the world you build on top of reality to follow that story.
Your stories draw on your taste, your experience, and your intuition. They need to bottle up the special things you’ve seen in your life that others aren’t seeing, and compel you and others to work through the process of turning that motivation into evidence.
Your evidence, in turn, needs to continually close the gap between reality and your storytelling. Over time, as your stories and your reality evolve around each other, you’ll bridge the distance between your vision and the real world.
This is all well and good as a way of thinking about exploration at a distance, but when I’m buried in the thick of confusing experiments and dwindling motivation, it’s difficult to know exactly what I need to do when I wake up in the morning.
During those times, I focus on two things:
I find my life and work most invigorating when I can work next to people on exploratory paths. There is no special virtue in novelty or risk-taking for its own sake, but exploratory work often surfaces surprising new facts about how the world works, and when those surprises are well leveraged, exploration can yield transformative progress. At a personal level, I also simply have the most fun when I have the fortune of working alongside people doing exploratory work.
If you find yourself on such a path, as I do, I hope this way of thinking about your path ahead helps take some of the burden off your shoulders, and allows you to shape the world to your ever-sharper vision of what ought to be.
Thanks to my friends at UC Berkeley for discussions that ultimately sparked this blog post. You know who you are.
2024-07-29 03:49:35
This post is a read-through of a talk I gave at a demo night at South Park Commons in July 2024.
I’ve spent my career investigating how computers could help us not just store the outputs of our thinking, but actively aid in our thinking process. Recently, this has involved building on top of advancements in machine learning.
In Imagining better interfaces to language models, I compared text-based, conversational interaction paradigms powered by language models to command-line interfaces, and proposed a new direction of research that would be akin to a GUI for language models, wherein we could directly see and manipulate what language models were “thinking”.
In Prism, I adapted some recent breakthroughs in model interpretability research to show how we can “see” what a model sees in a piece of input. I then used interpretable components of generative models to decompose various forms of media into their constituent concepts, and edit these documents and media in semantic latent space by manipulating the concepts that underlie them.
Most recently, in Synthesizer for thought, I began exploring the rich space of interface possibilities opened up by techniques like feature decomposition and steering outlined in the Prism project. These techniques allow us to understand and create media in entirely new ways at semantic and stylistic levels of abstraction.
These explorations were based on the premise that a foundational understanding of pieces of media as mathematical objects opens up new kinds of science and creative tools, based on our ability to study and imagine these forms of media at a deeper, more complete level:
Interpretable language models can give us a similar foundational understanding of ideas and thoughts as mathematical objects. With Prism, I demonstrated how, by treating sentences and ideas as mathematical objects, we can similarly break them down and recompose them.
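To make this concrete, here is a minimal numerical sketch of what "breaking down and recomposing" an idea could look like. The concept names and directions are invented for illustration; a real system like Prism would derive them from a model's learned features, not a hand-built dictionary.

```python
import numpy as np

# Invented concept directions standing in for learned interpretable
# features; rows are orthonormal here only to keep the toy exact.
concepts = ["AI research", "industrial shifts", "creative collaboration"]
W = np.eye(3, 8)

def decompose(x):
    """Project an embedding onto concept directions, keeping only
    positive (active) components."""
    acts = np.maximum(W @ x, 0.0)
    return dict(zip(concepts, acts))

def recompose(acts):
    """Rebuild an embedding from concept activations — editing the
    activations first amounts to editing in semantic latent space."""
    return np.array([acts[c] for c in concepts]) @ W

# An "idea" built mostly from one concept, with a trace of another:
x = 2.0 * W[0] + 0.3 * W[2]
acts = decompose(x)
```

Scaling a concept's activation up or down in `acts` before calling `recompose` is the toy analogue of the semantic edits described above.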
Today, I want to share with you some of my early design explorations for what future creative thinking tools may look like, based on these techniques and ideas. This particular exploration involves what I’ve labelled a computational notebook for ideas. This computational notebook is designed based on one cornerstone principle: documents aren’t a collection of words, but a collection of concepts.
Intro – Here, I’ve been thinking about a problem very dear to my heart. I want to explore creating a community of researchers interested in the future of computing, AI, and interface explorations. I simply write that thought in my notebook.
Documents – When we think within our minds, a thought conjures up many threads of related ideas. In this notebook, writing down a thought brings up a gallery of related pieces of media, from images to PDFs to simple notes. These documents may be fetched from my personal library of collected documents, or from the wider Web.
But unlike many other tools with similar functionality, this notebook treats documents as sets of ideas. So in addition to seeing which media are similar, we can see which concepts they share. As we hover over different documents, we can see which part of our input recalled each document.
Concepts – We’ve pivoted around our idea space anchored on documents. We can instead pivot around concepts. To do this, we open our concepts sidebar to see a full list of features, or concepts, our input contains. This view is like a brain scan of a model as it reads our input, or a DNA reading of our thought.
As we hover over different concepts in our input, we can see which pieces of media in our library share that particular concept.
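Mechanically, this hover interaction only needs an index from documents to their active concepts. A minimal sketch, with invented filenames and concept labels:

```python
# Toy index: document -> set of concepts its decomposition activates.
library = {
    "prism_paper.pdf": {"interpretability", "latent space"},
    "notebook_sketch.png": {"interfaces", "latent space"},
    "talk_notes.md": {"interpretability", "interfaces"},
}

def documents_sharing(concept):
    """All documents in the library that activate the hovered concept."""
    return sorted(doc for doc, cs in library.items() if concept in cs)
```

Hovering a concept then reduces to a single lookup, e.g. `documents_sharing("interpretability")`.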
Composition – A key part of thinking is inventing new ideas by combining existing ones. In this story, I’m interested in both large industrial research bets and AI. By selecting these concepts at once, we can see which pieces of media express this new, higher-level concept.
Heatmap – We can get an even more detailed view of the relationships between these individual concept components using a heatmap. In this view, we assign different colors to disparate concepts and see the way they co-occur in our media library. Through this view, we can not only discover documents that contain these concepts together, but also find deeper relationships like, perhaps, that many papers on AI and automation later discuss the idea of industrial shifts.
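Under the hood, a co-occurrence view like this can be computed by counting concept pairs per document. A sketch with invented data:

```python
from collections import Counter
from itertools import combinations

# Invented per-document concept sets.
docs = {
    "a.pdf": {"AI", "automation", "industrial shifts"},
    "b.pdf": {"AI", "industrial shifts"},
    "c.pdf": {"AI", "creative collaboration"},
}

# Count how often each pair of concepts appears in the same document;
# these counts are exactly the cells of the heatmap.
cooccur = Counter()
for cs in docs.values():
    cooccur.update(combinations(sorted(cs), 2))
```

Sorting each set before pairing keeps `("AI", "industrial shifts")` and `("industrial shifts", "AI")` from being counted as different cells.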
Abstraction – If we find a composition of ideas we like, such as these three concepts about large-scale creative collaboration, we can select and group them into a new, custom concept, which I’ll call collective creative project ideas.
This workflow, in which detailed exploration of existing knowledge leads us to come up with a new idea, is a key part of the creative and scientific process that current AI-based tools can’t capture very well, because existing tools rarely let you see a body of knowledge through the lens of key ideas.
Visualization – Now that we have a new lens in the form of this custom concept, we can explore our entire media library through this lens using data graphics. We can see the evolution and prominence of our new concept over time, for example.
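The prominence-over-time chart is then ordinary descriptive statistics over the custom concept's activations, grouped by date. A sketch with made-up readings:

```python
from collections import defaultdict

# Invented (year, activation) readings for the custom concept
# across a document library.
readings = [(2021, 0.1), (2021, 0.3), (2022, 0.5), (2023, 0.9)]

by_year = defaultdict(list)
for year, act in readings:
    by_year[year].append(act)

# Mean activation per year — one point per year on the chart.
prominence = {y: sum(a) / len(a) for y, a in sorted(by_year.items())}
```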
Surprise – We can compare the relationship and co-evolution of this idea with another one in our dataset, perhaps this concept related to interviews with creative leaders like film directors. This is a relationship I personally discovered in my real concept dataset while preparing this talk, between ideas I didn’t expect to be connected.
The ability for knowledge tools to surprise us with new insight is fundamental to the process of discovery, and too often ignored in AI-based knowledge tools, which largely rely on humans asking precise, well-formed questions about data.
Based on this clue, we may choose to examine this relationship more deeply, with a scatter plot. Notice that by treating documents as collections of ideas rather than words, we can benefit from the well-established field of data graphics to study an entirely new universe of documents and unstructured ideas.
Conclusion – So, there we have it. A tool that lets us discover new concepts and relationships between ideas, and use them to see our knowledge and our world through a new lens.
Good human interfaces let us see the world from new perspectives.
I really view language models as a new kind of scientific and creative instrument, like a microscope for a mathematical space of ideas. And as our understanding of this mathematical space and our instrument improves, I think we’ll see rapid progress in our ability to craft new ideas and imagine new worlds, just as we’ve seen for color and music.