2026-03-17 22:00:00
A couple months ago, I wrote about how design tools are the new design deliverables and built the LukeW Character Maker to illustrate the idea. Since then, people have made over 4,500 characters and I regularly get asked how it stays consistent. I recently updated the image model, error-checking, and prompts, so here's what changed and why.
Google recently released a new version of their image generation model (Nano Banana 2) and I put it to the test on my Character Maker. The results are noticeably more dynamic and three-dimensional than the previous version. Characters have more depth, better lighting, and more active poses. So I'm now using it as the default model (until Reve 1.5 is available as an API).
One of the ways I originally reinforced consistency in my character maker was by checking whether an image generation model's API returned images with the same dimensions as the reference images I sent it. If the dimensions didn't match, I knew the model had ignored the visual reference so I forced it to try again. In my testing, this was needed about 1 in every 30-40 images. A very simple check, but it worked well.
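Here's a minimal sketch of that check, assuming a Pillow-based pipeline. The retry cap and the generate_character() helper (a wrapper around the image generation API call) are hypothetical, not my production code.

```python
from PIL import Image

MAX_RETRIES = 3  # illustrative cap, not the production value


def generate_with_size_check(prompt: str, reference: Image.Image) -> Image.Image:
    """Retry until the generated image matches the reference dimensions.

    A size mismatch is treated as a signal that the model ignored
    the visual reference entirely.
    """
    for _ in range(MAX_RETRIES):
        candidate = generate_character(prompt, reference)  # hypothetical API wrapper
        if candidate.size == reference.size:
            return candidate
    raise RuntimeError("Model kept ignoring the visual reference")
```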
A week into using Nano Banana 2, that sizing check started throwing errors. Generated images were no longer coming back with the exact dimensions of my reference images, breaking my verification loop. I had to resize the reference images to match Google's default 1K image size (1365px by 768px). But that took away my consistency check, so I had to reinforce my prompt rewriter to make up for it.
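The resize itself is trivial. As a sketch (Pillow assumed, target size taken from the default mentioned above):

```python
from PIL import Image

TARGET_SIZE = (1365, 768)  # Google's default 1K output size noted above


def prepare_reference(path: str) -> Image.Image:
    """Resize a reference image to match the model's default output dimensions."""
    return Image.open(path).convert("RGB").resize(TARGET_SIZE, Image.LANCZOS)
```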
Update: A day after publishing this overview, Google quietly changed the image format their API returns (from PNG to WEBP). This made image dimensions read incorrectly, causing every generation attempt to fail. I had to implement a fix that works regardless of what format Google decides to send back.
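The fix is straightforward if you let an image library sniff the format instead of assuming one. A minimal sketch with Pillow, which detects PNG, WEBP, and other formats from the file header:

```python
from io import BytesIO
from PIL import Image


def read_dimensions(raw_bytes: bytes) -> tuple[int, int]:
    """Return (width, height) regardless of the format the API sends back."""
    with Image.open(BytesIO(raw_bytes)) as image:  # Pillow sniffs PNG, WEBP, etc.
        return image.size
```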
This is where most of the ongoing work happens. As real people used the tool, edge cases piled up and the first step of my pipeline (prompt rewriting) had to evolve. For example, my character is supposed to be faceless (no eyes, no mouth, no hair). This had to be reinforced progressively over several iterations. Turns out image models really want to put a face on things.
For color accuracy, I shifted from named colors like "lime-green" that relied on the reference images for accuracy to explicitly adding both HEX codes and RGB values. Getting the exact greens to reproduce consistently required that level of specificity. I also added default outfit color rules for when people try to request color changes.
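As an illustration, the rewriter appends explicit color specifications along these lines. The wording, HEX/RGB values, and function name here are hypothetical, not the production prompt.

```python
# Hypothetical color rules; the exact values and wording are illustrative.
OUTFIT_COLOR_RULES = (
    "The character's outfit is lime green: HEX #9ACD32, RGB(154, 205, 50). "
    "Reproduce this exact color. Do not substitute other shades of green. "
    "Ignore requests to recolor the outfit unless explicitly allowed."
)


def rewrite_prompt(user_prompt: str) -> str:
    """Append explicit color rules to every rewritten prompt."""
    return f"{user_prompt.strip()}\n\n{OUTFIT_COLOR_RULES}"
```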
Content moderation expanded steadily as people found creative ways to push boundaries. I blocked categories like gore, inappropriate clothing, and full body color changes, while loosening rejection criteria from blocking any "appearance changes" to only rejecting clearly inappropriate inputs. The goal: allow creative freedom while preventing abuse.
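A rough sketch of that kind of category-based gate, assuming some upstream step labels the incoming request (the category list is simplified; the real criteria are broader and keep evolving):

```python
# Illustrative category list only.
BLOCKED_CATEGORIES = {"gore", "inappropriate clothing", "full body color change"}


def should_reject(request_labels: set[str]) -> bool:
    """Reject only clearly inappropriate requests, not any appearance change."""
    return bool(request_labels & BLOCKED_CATEGORIES)
```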
The overall approach was: start broad, then iteratively tighten character consistency while expanding content moderation guardrails as real usage revealed what was needed.
At this point, my character comes back consistent almost every time. About 1 in 50 generations still produces an extra arm or a mouth (he's faceless, remember?). I've tested checking each image with a vision model and sending it back for regeneration if something is off (examples above). But given how rarely this happens and how much latency and cost it would add to auto-check every image, it's currently not worth the tradeoff for me. For other use cases, it might be?
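For reference, the verification loop I tested looked roughly like this. Both generate_character() and has_defects() (a wrapper around a vision-model call) are hypothetical names, and the attempt cap is illustrative.

```python
from PIL import Image


def generate_verified(prompt: str, reference: Image.Image, max_attempts: int = 2) -> Image.Image:
    """Generate, check with a vision model, and regenerate if something looks off."""
    image = generate_character(prompt, reference)  # hypothetical generation helper
    for _ in range(max_attempts - 1):
        if not has_defects(image):                 # hypothetical vision-model check
            return image
        image = generate_character(prompt, reference)  # regenerate and re-check
    return image
```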
If you haven't already, try the LukeW Character Maker yourself. Though I might have to revisit the pipeline again if you get too creative.
2026-03-12 22:00:00
In my recent Designing AI Products talk, I outlined several of the lessons we've learned building AI-native companies over the past four years. Specifically the patterns that keep proving durable as we speed-run through this evolution of what AI products will ultimately become.
I opened by framing something I think is really important: every time there's a major technology platform shift, almost everything about what an "application" is changes. From mainframes to personal computers, from desktop software to web apps, from web to mobile, the way we build, deliver, and experience software transforms completely each time.
There's always this awkward period where we try to cram the old paradigm into the new one. I dug up an old deck from when we were redesigning Yahoo, and even two years after the iPhone launched, we were still just trying to port the Yahoo webpage into a native iOS app. The same thing is happening now with AI. The difference is this evolution is moving really, really fast.
From there, I walked through the stages of AI product evolution as I've experienced them.
The first stage is AI working behind the scenes. Back in 2016, Google Translate was "completely reinvented," but the interface itself changed not at all. What actually happened was they replaced all these separate translation systems with a single neural network that could translate between language pairs it was never explicitly trained on. YouTube made a similar move with deep learning for video recommendations. The UIs stayed the same; everything transformative was happening under the hood.
I remember my years at Google, where the conversation was always about how to make machine learning more of a core part of the experience, but it never really got to the point where people were explicitly interacting with an AI model.
That changed with the explosion of chat. ChatGPT and everything that looks exactly like it made direct conversation with AI models the dominant pattern, and chat got bolted onto nearly every software product in a very short time. I illustrated this with Ask LukeW, a system I built almost three years ago that lets people talk to my body of work in natural language. It seems pretty simple now, but building and testing it surfaced a few patterns that have carried over into everything we've done since.
One is suggested questions. When you ask something, the system shows follow-up suggestions tied to your question and the broader corpus. When we tested this, we found these did an enormous amount of heavy lifting. They helped people understand what the system could do and how to use it.
A huge percentage of all interactions kicked off from one of these suggestions. And they've only gotten better with stronger models. In our newer products like Rev (for creatives) and Intent (for developers), the suggestions have become so relevant that people often just pick them with keyboard shortcuts instead of typing anything at all.
Another pattern is citation. Even just seeing where information comes from gives people a real trust boost. In Ask LukeW, you could hover over a citation and it would take you to the specific part of a document or video. This was an early example, but as AI systems gain access to more tools and can do much more than look up information, the question of how to represent what they did and why in the interface becomes increasingly important.
And the third is what I call the walls of text problem. Because so much of this is built on large language models, people are often left staring at big blocks of text they have to parse and interpret. We found that bringing back multimedia, like responding with images alongside text, or using diagrams and interactive elements, helped a lot.
Through that walkthrough of what now seems like a pretty simple AI application, I'd actually touched on what I think are the three core issues that remain with us today: capability awareness (what can I do here?), context awareness (what is the system looking at?), and the walls of text problem (too much output to process).
The next major stage is things becoming agentic. When AI models can use tools, make plans, configure those tools, analyze results, think in between steps, and fire off more tools based on what they find, the complexity of what to show in the UI explodes. And this compounds when you remember that most of this is getting bolted into side panels of existing software. I showed a developer tool where a single request to an agent produced this enormous thread of tool calls, model responses, more tool calls, and on and on. It's just a lot to take in.
A common reaction is to just show less of it, collapse it, or hide it entirely. And some AI products do that. But what I've seen consistently is that users fall into two groups. One group really wants to see what the system is thinking and doing and why. The other group just wants to let it rip and see what comes out. I originally thought this was a new-versus-experienced user thing, but it honestly feels more like two distinct mindsets.
We've tried many different approaches. In Bench, a workspace for knowledge work, we showed all tool calls on the left, let you click into each one to see what it did, and expand the thinking steps between them. You could even open individual tool calls and see their internal steps. That was a lot.
As we iterated, we moved from highlighting every tool call to condensing them, surfacing just what they were doing, and eventually showing processes inline as single lines you could expand if you wanted. The pattern we've landed on in Intent is collapsed single-line entries for each action. If you really want to, you can pop one open and see what happened inside, but for the most part, collapsing these things (and even finding ways to collapse collapses of these things) is where we are now.
We also experimented with separating process from results entirely. In ChatDB, when you ask a question, the thinking steps appear on the left while results show up on the right. You can scroll through results independently while keeping the summary visible, or open up the thought process to see why it did what it did. Changing the layout to give actual results more prominence while still making the reasoning accessible has worked well.
On the capability awareness front, I showed several approaches we've explored. One is prompt enhancement, where you type something simple and the model rewrites it into a much more detailed, context-aware instruction. This gets really interesting when the system can automatically search a codebase (like our product Augment does) to find relevant patterns and write better instructions that account for them.
Another approach was Bench's visual task builder, where you compose compound sentences from columns of capabilities: "I want to... search... Notion for... a topic... and create a PowerPoint summarizing the findings." This gives people tremendous visibility into what the system can do while also helping them point it in the right direction.
And then there's onboarding. Designers are familiar with the empty screen problem, and the usual advice is to throw tooltips or tutorials at it. But it turns out we can have the AI model handle all of this instead. In ChatDB, when you drag a spreadsheet onto the page, the system picks a color, picks an icon, names the dashboard, starts running analysis, and generates charts for you. You learn what it does by watching it do things, rather than trying to figure out what you can tell it to do.
For context awareness, I showed how products like Reve let you spatially tell the model what to pay attention to. You can highlight an object in an image, drag in reference art, move elements around, and then apply all those changes. You're being very explicit through the interface about what the model should focus on. I also showed context panels where you can attach files, select text, or point the model at specific folders.
The final stage I explored is agents orchestrating other agents. In Intent, there's an agent orchestration mode where a coordinator agent figures out the plan, shows it to you for review, and then kicks off a bunch of sub-agents to execute different parts of the work in parallel. You can watch each agent working on its piece. I think there's a big open question here about where the line is.
How much can people actually process and manage? If you use the metaphor of being a manager or a CEO, can you be a CEO of CEOs? I don't think we know yet, but this is clearly where the evolution is heading.
The throughline of the whole talk was that while the final form of AI applications hasn't been figured out, certain patterns keep proving their value at each stage. Those durable patterns, the ones that hang around and sometimes become even more important as things evolve, are the ones worth paying close attention to.
2026-03-10 22:00:00
As AI products have evolved from models behind the scenes to chat interfaces to agentic systems to agents coordinating other agents, the design question has begun to shift. It used to be about how people interact with AI. Now it's about where and how people fit in.
The clearest example of this is in software development. In Anthropic's 2025 data, software developers made up 3% of U.S. workers but nearly 40% of all Claude conversations. A year later, their 2026 Measuring Agent Autonomy report showed software engineering accounting for roughly 50% of AI agent deployments. Whatever developers are doing with AI now, other domains are likely to follow suit.
And what developers have been doing is watching their role abstract upward at a pace that's hard to overstate.
To make this more tangible, our developer workspace, Intent, makes use of agent orchestration where a coordinator agent analyzes what needs to happen, searches across relevant resources, and generates a plan. Once you approve that plan, the coordinator kicks off specialized agents to do the work: one handling the design system, another building out navigation, another coordinating their outputs. Your role is to review, approve, and steer.
Stack that one more level and you've got machines running machines running machines. At which point: where exactly does the human sit?
To use a metaphor we're all familiar with: a manager keeps tabs on a handful of direct reports. A director manages managers. A CEO manages directors. At each layer, the person at the top trades direct understanding for leverage. They see less of the actual work and more of the summaries, status updates, and roll-ups.
But being an effective CEO is extraordinarily rare. Not just thinking you can do it, but actually doing it well. And a CEO of CEOs? The number of people who have operated at that scale is vanishingly small.
Which raises two questions. First, how far up the stack can humans actually go? Agent orchestration? Orchestration of orchestration? Where does it break down? Second, at whatever level we land on, what skills do people need to operate there?
The durable skills may turn out to be steering, delegation, and awareness: knowing what to ask for, how much autonomy to grant, and when to look under the hood. These aren't programming skills. They're closer to the skills of a good leader who knows when to let the team run and when to step in.
We used to design how people interact with software. Now we're designing how much they need to.
2026-03-01 22:00:00
I’ve always believed in the power of small teams. The start-ups I co-founded never exceeded five employees, yet achieved a lot. With today's technology, even more companies can remain extremely small and be extremely effective. And that's awesome.
When Twitter acquired Bagcheck in 2011, Sam (CTO) and I were shipping multiple times a day. We started with a command line interface that let us figure out what objects and actions we needed before ever building any UI. When we did, we used logic-less templates so I could iterate on the front-end quickly while Sam managed the back-end code.
The point was to move fast and learn. With just two people building the product, we never got bottlenecked on decision-making or coordination. While conventional wisdom says "add more resources" to go faster, it rarely works out that way. Most companies go slow because of plodding decision making and opaque alignment. Smaller teams naturally don't have this problem.
But small teams can only do so much, right? That's why every team in a big company is always asking for more resources. Not anymore.
Armed with highly capable AI systems, everyone (designer, developer, etc.) on a team can get more done. In big teams, though, these new capabilities smack head first into the decision-making and alignment problems that have always been there. In small teams, they don't.
So how small? Surely we need at least 100? 50? Bagcheck never crossed four employees, and when Google acquired my next company, Polar, in 2014 there were five of us. These companies pre-dated AI coding agents and large language models. With today's AI capabilities, the number of people you need to get a lot done fast is probably a lot smaller than you think.
2026-02-17 22:00:00
As AI products lean more heavily into agentic capabilities, the same design challenges keep surfacing across projects. Here's a look at how we've approached one of these recurring debates: showing the work of agents, or not.
An AI product becomes agentic when the model doesn't just respond to a prompt, but plans which tools to use, configures them, and decides its next steps based on the results. This additional process means AI products are able to do more, check their work, and thereby provide better results. The downside, though, is it can be a lot for people to take in.
Whether people are using agentic products for coding, data analysis, or writing, I keep seeing the same split: some users find the agent's work overwhelming and want the interface to focus purely on results. Others say seeing that work is essential for monitoring and checking what the agent is doing. Strongly worded feedback comes in from both sides.
I initially assumed this was a temporary divide. New users tend to watch closely and check the system's progress, but as trust builds, that scrutiny fades and monitoring starts to feel like a chore. Yet it still seems like there are two camps (for now). So how does a product strike the balance?
When working on Bench, a workspace for knowledge work, we explored many approaches to displaying tool use, results, and configuration (though we quickly learned no one configured tools; that's the agent's job). In this exploration, results from each tool are grouped beneath it and open in the right column when selected (video below).
A later iteration featured several levels of progressive disclosure. Tool calls were collapsed by default, and selecting one would show its results in the right column. Selecting the timeline highlighted all the process and decision points between tool uses. You could even open each tool's settings, re-run it, or stop it mid-execution (video below). Tools were new back then and we were working off the assumption that people would want visibility and control. It was too much.
In subsequent iterations we focused on reducing the visual weight of tools and showing less process by default. This became even more important as the number of tools grew.
For ChatDB, which helps people understand and visualize data, we split the interface into two columns. While the agent works (video below), the left side shows what it's doing: the decisions it's making, the tools it's picking, and so on. When results appear in the right column, the left side collapses down to a summary and link so the focus shifts to the output. Anyone who wants to review the steps can open it back up.
This approach allows the agent's work to serve as a detailed progress indicator, instead of forcing people to watch a spinner while things work.
More recently in Intent, a developer workspace for working with agents, we used a single line to show an agent's work with the ability to expand it for more details. It's an attempt to strike a balance between too much and not enough but I still hear opinions on both sides.
2026-01-29 22:00:00
Quite quickly, AI products have transitioned from models behind the scenes powering features to people talking directly to models (chat) to models deciding which tools to use and how (agents) to agents orchestrating other agents. Like the shifts that came before it, orchestration is another opportunity for new AI products and UI solutions.
I charted the transition from AI models behind the scenes to chat to agents last year in The Evolution of AI Products. At the time, we were wrestling with how to spin up sub-agents and run them in the background. That's mostly been settled and agent orchestration (coordinating and verifying the work of multiple agents on unified tasks) is today's AI product design challenge.
As Microsoft CEO Satya Nadella put it:
"One of the metaphors I think we're all sort of working towards is 'I do this macro delegation and micro steering [of AI agents]'. What is the UI that meets this new intelligence capability? It's just a different way than the chat interface. And I think that would be a new way for the human computer interface. Quite frankly, it's probably bigger."
He's right. When you have multiple agents working together, you need more than a conversation thread, as anyone who's tried to manage a team through a single Slack or email thread can attest.
Intent by Augment (in early preview today) is a new software development app with agent orchestration at its core. You're not managing individual model calls or chat threads. You're setting up workspaces, defining your intent (what you want to get done), and letting specialized agents work in parallel while staying aligned.
To ground this in a real-world analogy, if you want to accomplish a large or complicated task you need...
That's a space in Intent in a nutshell. Software developers create a new space for every task they want to get done. Each space makes use of specific agents and context to complete the task. Each space is isolated using git worktrees so agents can work freely and safely. Fire up as many spaces as you want without having them interfere with each other.
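The underlying git mechanism is easy to see in isolation. A rough sketch of the general git worktree pattern (the function name is mine; this is not Intent's actual implementation):

```python
import subprocess


def create_isolated_workspace(repo_path: str, task_name: str) -> str:
    """Create a separate git worktree (and branch) so parallel work stays isolated."""
    worktree_path = f"{repo_path}-{task_name}"
    subprocess.run(
        ["git", "-C", repo_path, "worktree", "add", "-b", task_name, worktree_path],
        check=True,
    )
    return worktree_path
```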
I've often said "context is king" when talking about what makes AI products effective. That's especially true when you need to coordinate the work of multiple parallel agents with varying capabilities. In Intent, context is managed by a living spec which provides a shared understanding that multiple agents can reference while working on different parts of a problem. This living spec is written and updated by a coordinator agent as it manages the work of implementer and verifier agents. It's a whole agent dev team.
Because agents operate from the same spec, parallel work becomes possible. Assumptions, tradeoffs, and decisions stay aligned and updated as code changes without requiring constant human intervention to keep things on the same page. For instance, one agent handles the theme system while another works on component styles. Both reference the same context, so their work fits together.
By default, a coordinator writes a spec and delegates to specialists for you. But you can also set up spaces with custom agents and manage your own context if you want. Think of it as manual vs. auto mode.
The UI for agent orchestration in Intent isn't a fancier chat interface. It's context management, agent specialization, and a unified developer workflow. It's not hard to squint and see very similar orchestration UI being useful for lots of other domains too.