
One Useful Thing

Trying to understand the implications of AI for work, education, and life. By Prof. Ethan Mollick


On Working with Wizards

2025-09-12 04:37:39

In my book, Co-Intelligence, I outlined a way that people could work with AI, which was, rather unsurprisingly, as a co-intelligence. Teamed with a chatbot, humans could use AI as a sort of intern or co-worker, correcting its errors, checking its work, co-developing ideas, and guiding it in the right direction. Over the past few weeks, I have come to believe that co-intelligence is still important but that the nature of AI is starting to point in a different direction. We're moving from partners to audience, from collaboration to conjuring.

A good way to illustrate this change is to ask an AI to explain what has happened since I wrote the book. I fed my book and all 140 or so One Useful Thing posts (incidentally, I can’t believe I have written that many posts!) into NotebookLM and chose the new video overview option with a basic prompt to make a video about what has happened in the world of AI.

A few minutes later, I got this. And it is pretty good. Good enough that I think it is worth watching to get an update on what has happened since my book was written.

But how did the AI pick the points it made? I don’t know, but they were pretty good. How did it decide on the slides to use? I don’t know, but they were also pretty on target (though images remain a bit of a weak point, as it didn’t show me the promised otter). Was it right? That seemed like something I should check.

So, I went through the video several times, checking all the facts. It got all the numbers right, including the data on MMLU scores and the results of AI performance on the neurosurgery exam data (I am not even sure when I cited that material). My only real issue was that it should have noted that I was one of several co-authors in our study of Boston Consulting Group that also introduced the term “jagged frontier.” Also, I wouldn’t have said everything the way the AI did (it was a little bombastic, and my book is not out-of-date yet!), but there were no substantive errors.

I think this process is typical of the new wave of AI: for an increasing range of complex tasks, you get an amazing and sophisticated output in response to a vague request, but you have no part in the process. You don’t know how the AI made the choices it made, nor can you confirm that everything is completely correct. We're shifting from being collaborators who shape the process to being supplicants who receive the output. It is a transition from working with a co-intelligence to working with a wizard. Magic gets done, but we don’t always know what to do with the results. This pattern — impressive output, opaque process — becomes even more pronounced with research tasks.

Asking for Magic

Right now, no AI model feels more like a wizard than GPT-5 Pro, which is only accessible to paying users. GPT-5 Pro is capable of some frankly amazing feats. For example, I gave it an academic paper to read with the instructions “critique the methods of this paper, figure out better methods and apply them.” This was not just any paper; it was my job market paper, the first major work of my academic career. It took me over a year to write and was read carefully by many of the brightest people in my field before finally being peer reviewed and published in a major journal.

Nine minutes and forty seconds later, I had a very detailed critique. This wasn’t just editorial criticism; GPT-5 Pro apparently ran its own experiments using code to verify my results, including doing Monte Carlo analysis and re-interpreting the fixed effects in my statistical models. It had many suggestions as a result (though it fortunately concluded that “the headline claim [of my paper] survives scrutiny”), but one stood out. It found a small error, previously unnoticed. The error involved two different sets of numbers in two tables that were linked in ways I did not explicitly spell out in my paper. The AI found this minor error; no one ever had before.

Again, I was left with the wizard problem: was this right? I checked through the results and found that it was, but I still have no idea what the AI did to discover this problem, nor whether the other things it claimed to have done happened as described. But I was impressed by GPT-5 Pro’s analysis, which is why I now throw all sorts of problems, big and small, at the model: Is the Gartner hype cycle real? Did census data show AI use declining at large firms? Just ask GPT-5 Pro and get the right answer. I think. I haven’t found an error yet, but that doesn’t mean there aren’t any. And, of course, there are many other tasks that the AI would fail to deliver any sort of good answer for. Who knows with wizards?

To see how this might soon apply to work more broadly, consider another advanced AI, Claude 4.1 Opus, which recently gained the ability to work with files. It is especially talented at Excel, so I gave it a hard challenge on an Excel file I knew well. There is an exercise I used in my entrepreneurship classes that involves analyzing the financial model of a small desk manufacturing business as a lesson about how to plan despite uncertainty. I gave Claude the old, multi-tab Excel file, and asked the AI to update it for a new business - a cheese shop - while still maintaining the goal of the overall exercise.

With just that instruction, it read the lesson plan and the old spreadsheets, including their formulas, and created a new one, updating all of the information to be appropriate for a cheese shop. A few minutes later, with just the one prompt, I had a new, transformed spreadsheet downloaded on my computer, one that had entirely new data while still communicating the key lesson.

The original document on the left, what Claude gave me on the right

Again, the wizard didn’t tell me the secret to its tricks, so I had to check the results over carefully. From what I saw, they seemed very good, preserving the lessons in a new context. I did spot a few things in the formulas and business modeling that I would have done differently (I would have assumed fewer business days per year, for example), but that felt more like a difference of opinion than a substantive error.

Curious to see how far Claude could go, and since everyone always asks me whether AI can do PowerPoint, I also prompted: “great, now make a good PowerPoint for this business” and got the following result.

This is a pretty solid start to a pitch deck, and one without any major errors, but it also isn’t ready-to-go. This emphasizes the jagged frontier of AI: it is very good at some things and worse at others in ways that are hard to predict without experience. I have been showing you examples within the ever-expanding frontier of AI abilities, but that doesn’t mean that AI can do everything with equal ease. But my focus in this post is less on the expanding range of AI abilities than on our changing relationship with AIs.

The Problems with Wizards

These new AI systems are essentially agents: AI that can plan and act autonomously toward given goals. When I asked Claude to change my spreadsheet, it planned out steps and executed them, from reading the original spreadsheet to coding up a new one. But it also adjusted to unexpected errors, twice fixing the spreadsheet (without me asking) and verifying its answers multiple times. I didn’t get to select these steps; in fact, in the new wave of agents powered by reinforcement learning, no one selects the steps. The models learn their own approach to solving problems.

The steps Claude reported it went through in order to change the spreadsheet
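To make that pattern concrete, here is a minimal sketch of the kind of plan-act-check loop such an agent runs. This is a conceptual illustration only; the step structure and function names are hypothetical, not Claude's actual implementation, which is learned rather than hand-coded.

```python
from dataclasses import dataclass
from typing import Callable

# A minimal, hypothetical sketch of an autonomous agent's plan-act-check loop.
# Real agents learn this behavior through reinforcement learning rather than
# following hand-written control flow like this.

@dataclass
class Step:
    tool: str          # e.g., "read_file", "run_code"
    instructions: str  # what to do with that tool

def run_agent(
    goal: str,
    plan: Callable[[str], list[Step]],               # the model proposes its own steps
    act: Callable[[Step], str],                      # executes a step with a tool
    check: Callable[[Step, str], tuple[bool, str]],  # the model verifies its own output
    revise: Callable[[Step, str], Step],             # the model fixes its own errors
    max_retries: int = 3,
) -> list[str]:
    outputs = []
    for step in plan(goal):
        for _ in range(max_retries):
            result = act(step)
            ok, feedback = check(step, result)   # self-verification the user never sees
            if ok:
                outputs.append(result)
                break
            step = revise(step, feedback)        # silent self-correction, no human input
    return outputs                               # the user only ever sees a summary of this
```

The important part is the inner loop: the model checks and revises its own work, and all we get at the end is a summary.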

Not only can I not intervene, I also cannot be entirely sure what the AI system actually did. The steps that Claude reported are mere summaries of its work; GPT-5 Pro provides even less information, while NotebookLM gives you almost no insight at all into its process for creating a video. Even if I could see the steps, however, I would need to be an expert in many fields - from coding to entrepreneurship - to really have a sense of what the AI was doing. And then, of course, there is the question of accuracy. How can I tell if the AI is accurate without checking every fact? And even if the facts are right, maybe I would have made a different judgment about how to present or frame them. But I can’t do anything, because wizards don’t want my help and work in secretive ways that even they can’t explain.

The hard thing about this is that the results are good. Very good. I am an expert in the three tasks I gave AI in this post, and I did not see any factual errors in any of these outputs, though there were some minor formatting errors and choices I would have made differently. Of course, I can’t actually tell you if the documents are error-free without checking every detail. Sometimes that takes far less time than doing the work yourself, sometimes it takes a lot more. Sometimes the AI’s work is so sophisticated that you couldn’t check it if you tried. And that suggests another risk we don't talk about enough: every time we hand work to a wizard, we lose a chance to develop our own expertise, to build the very judgment we need to evaluate the wizard's work.

But I come back to the inescapable point that the results are good, at least in these cases. They are what I would expect from a graduate student working for a couple hours (or more, in the case of the re-analysis of my paper), except I got them in minutes.

This is the issue with wizards: We're getting something magical, but we're also becoming the audience rather than the magician, or even the magician's assistant. In the co-intelligence model, we guided, corrected, and collaborated. Increasingly, we prompt, wait, and verify… if we can.

So what do we do with our wizards? I think we need to develop a new literacy: First, learn when to summon the wizard versus when to work with AI as a co-intelligence or to not use AI at all. AI is far from perfect, and in areas where it still falls short, humans often succeed. But for the increasing number of tasks where AI is useful, co-intelligence, and the back-and-forth it requires, is often superior to a machine alone. Yet, there are, increasingly, times when summoning a wizard is best, and just trusting what it conjures.

Second, we need to become connoisseurs of output rather than process. We need to curate and select among the outputs the AI provides, but more than that, we need to work with AI enough to develop instincts for when it succeeds and when it fails. We have to learn to judge what's right, what's off, and what's worth the risk of not knowing. This creates a hard problem for education: How do you train someone to verify work in fields they haven't mastered, when the AI itself prevents them from developing mastery? Figuring out how to address this gap is increasingly urgent.

Finally, embrace provisional trust. The wizard model means working with “good enough” more often, not because we're lowering standards, but because perfect verification is becoming impossible. The question isn't “Is this completely correct?” but “Is this useful enough for this purpose?”

We are already used to trusting technological magic. Every time we use GPS without understanding the route, or let an algorithm determine what we see, we're trusting a different type of wizard. But there's a crucial difference. When GPS fails, I find out quickly when I reach a dead end. When Netflix recommends the wrong movie, I just don't watch it. But when AI analyzes my research or transforms my spreadsheet, the better it gets, the harder it becomes to know if it's wrong. The paradox of working with AI wizards is that competence and opacity rise together. We need these tools most for the tasks where we're least able to verify them. It’s the old lesson from fairy tales: the better the magic, the deeper the mystery. We'll keep summoning our wizards, checking what we can, and hoping the spells work. At nine minutes for a week's worth of analysis, how could we not? Welcome to the age of wizards.


Mass Intelligence

2025-08-29 04:47:26

More than a billion people use AI chatbots regularly. ChatGPT has over 700 million weekly users. Gemini and other leading AIs add hundreds of millions more. In my posts, I often focus on the advances that AI is making (for example, in the past few weeks, AI models from both OpenAI and Google achieved gold medal-level performance at the International Math Olympiad), but that obscures a broader shift that's been building: we're entering an era of Mass Intelligence, where powerful AI is becoming as accessible as a Google search.

Until recently, free users of these systems (the overwhelming majority) had access only to older, smaller AI models that frequently made mistakes and had limited use for complex work. The best models, like Reasoners that can solve very hard problems and hallucinate much less often, required paying somewhere between $20 and $200 a month. And even then, you needed to know which model to pick and how to prompt it properly. But the economics and interfaces are changing rapidly, with fairly large consequences for how all of us work, learn, and think.

Powerful AI is Getting Cheaper and Easier to Access

There have been two barriers to accessing powerful AI for most users. The first was confusion. Few people knew they needed to select an AI model. Even fewer knew that picking o3 from a menu in ChatGPT would get them access to an excellent Reasoner AI model, while picking 4o (which seems like a higher number) would give them something far less capable. According to OpenAI, less than 7% of paying customers selected o3 on a regular basis, meaning even power users were missing out on what Reasoners could do.

The second was cost. Because the best models are expensive, free users were often not given access to them, or else given very limited access. Google led the way in giving some free access to its best models, but OpenAI stated that almost none of its free customers had regular access to reasoning models prior to the launch of GPT-5.

GPT-5 was supposed to solve both of these problems, which is partially why its debut was so messy and confusing. GPT-5 is actually two things. It is the overall name for a family of quite different models, from the weaker GPT-5 Nano to the powerful GPT-5 Pro. It is also the name given to the router that picks which model to use and how much computing power the AI should put into solving your problem. When you write to “GPT-5,” you are actually talking to a router that is supposed to automatically decide whether your problem can be solved by a smaller, faster model or needs to go to a more powerful Reasoner.

When you pick ChatGPT 5 you are actually picking Auto mode, which selects among the various ChatGPT 5 models, some of which are among the best models in the world, some of which are much weaker. If you pay for access, select “GPT-5 Thinking” for almost any problem beyond a simple chat.

You could see how this was supposed to expand access to powerful AI to more users: if you just wanted to chat, GPT-5 was supposed to use its weaker specialized chat models; if you were trying to solve a math problem, GPT-5 was supposed to send you to its slower, more expensive GPT-5 Thinking model. This would save money and give more people access to the best AIs. But the rollout had issues. This practice wasn’t well explained and the router did not work well at first. The result is that one person using GPT-5 got a very smart answer while another got a bad one. Despite these issues, OpenAI reported early success. Within a few days of launch, the percentage of paying customers who had used a Reasoner went from 7% to 24% and the number of free customers using the most powerful models went from almost zero to 7%.
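As a rough mental model, the routing logic looks something like the sketch below. This is purely illustrative; OpenAI has not published how the GPT-5 router actually works, and the model names, threshold, and difficulty estimate here are assumptions.

```python
# Illustrative sketch of a prompt router; not OpenAI's actual implementation.
# The model names, threshold, and difficulty signal are all hypothetical.

FAST_MODEL = "gpt-5-chat"            # hypothetical: cheap and instant, but weaker
REASONING_MODEL = "gpt-5-thinking"   # hypothetical: slower and costlier, much stronger

def route(prompt: str, estimate_difficulty) -> str:
    """Send easy prompts to a cheap model and hard ones to a Reasoner."""
    difficulty = estimate_difficulty(prompt)   # in a real system, a learned classifier
    if "think hard" in prompt.lower():         # user hints can nudge the router upward
        difficulty = max(difficulty, 0.9)
    return REASONING_MODEL if difficulty > 0.5 else FAST_MODEL
```

The economic logic is the same either way: reasoning time costs real money, so the router's job is to spend it only when a prompt seems to need it.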

Part of this change is driven by the fact that smarter models are getting dramatically more efficient to run. This graph shows how fast this trend has played out, mapping the capability of AI on the y-axis against the logarithmically decreasing costs on the x-axis. When GPT-4 came out, it cost around $50 to work with a million tokens (a token is roughly a word); now it costs around 14 cents per million tokens to use GPT-5 Nano, a much more capable model than the original GPT-4.

The Graduate-Level Google-Proof Q&A test (GPQA) is a series of very hard multiple-choice problems designed to test advanced knowledge. Non-experts with access to the internet get 34% right; PhDs with internet access get 74-81% inside their specialty. The cost per million tokens is the cost of using the model. (I gathered this data, so apologies for any errors.)
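To see what that price collapse means per response, here is the back-of-the-envelope arithmetic using the two per-million-token prices above; the 1,000-token response size is my assumption, purely for illustration.

```python
# Rough cost comparison using the per-million-token prices cited in the text.
gpt4_price = 50.00   # dollars per million tokens, roughly, when GPT-4 launched
nano_price = 0.14    # dollars per million tokens for GPT-5 Nano today

tokens = 1_000       # assume a typical response of about a thousand tokens
old_cost = gpt4_price * tokens / 1_000_000   # about $0.05 per response
new_cost = nano_price * tokens / 1_000_000   # about $0.00014 per response

print(f"~{gpt4_price / nano_price:.0f}x cheaper per token")        # ~357x
print(f"old: ${old_cost:.5f} per response, new: ${new_cost:.5f}")
```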

This efficiency gain isn't just financial; it's also environmental. Google has reported that energy efficiency per prompt has improved by 33x in the last year alone. The marginal energy used by a standard prompt from a modern LLM in 2025 is relatively established at this point, from both independent tests and official announcements. It is roughly 0.0003 kWh, the same energy use as 8-10 seconds of streaming Netflix or the equivalent of a Google search in 2008 (interestingly, image creation seems to use a similar amount of energy as a text prompt)1. How much water these models use per prompt is less clear but ranges from a few drops to a fifth of a shot glass (0.25 mL to 5 mL+), depending on the definitions of water use (here is the low water argument and the high water argument).
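The streaming comparison is easy to check yourself. Assuming video streaming draws roughly 0.12 kWh per hour (a commonly cited ballpark, not a figure from this post), the arithmetic lands right in the 8-10 second range:

```python
# Back-of-the-envelope check of the "8-10 seconds of Netflix" comparison.
prompt_energy_kwh = 0.0003       # rough marginal energy per LLM prompt, from the text
streaming_kwh_per_hour = 0.12    # assumed energy for an hour of video streaming

equivalent_seconds = prompt_energy_kwh / streaming_kwh_per_hour * 3600
print(f"One prompt is roughly {equivalent_seconds:.0f} seconds of streaming")  # about 9 seconds
```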

These improvements mean that even as AI gets more powerful, it's also becoming viable to give to more people. The marginal cost of serving each additional user has collapsed, which means more business models, like ad support, become possible. Free users can now run prompts that would have cost dollars just two years ago. This is how a billion people suddenly get access to powerful AIs: not through some grand democratization initiative, but because the economics finally make it possible.

Powerful AI is Getting Easy to Use

Getting access to a powerful AI is not enough; people need to actually use it to get things done. Using AI well used to be a pretty challenging process that involved crafting a prompt using techniques like chain-of-thought, along with learning tips and tricks to get the most out of your AI. In a recent series of experiments, however, we have discovered that these techniques don’t really help anymore. Powerful AI models are just getting better at doing what you ask them to or even figuring out what you want and going beyond what you ask (and no, threatening them or being nice to them does not seem to help on average).

And it isn’t just text models that are becoming cheaper and easier to use. Google released a new image model with the code name “nano banana” and the much more boring official name Gemini 2.5 Flash Image Generator. In addition to being excellent (though better at editing images than creating new ones), it is also cheap enough that free users can access it. And, unlike previous generations of AI image generators, it follows instructions in plain language very well.

As an example of both its power and ease of use, I uploaded an iconic (and copyright free) image of the Apollo 11 astronauts and a random picture of a sparkly tuxedo and gave it the simplest of prompts: “dress Neil Armstrong on the left in this tuxedo”

Here is what it gave me a few seconds later:

There are issues that someone with an expert eye would spot, but it is still impressive to see the realistic folds of the tuxedo and how it is blended into the scene (the NASA pin on the lapel was a nice touch). There is still a lot of randomness in the process that makes AI image editing unsuitable for many professional applications, but for most people, this represents a huge leap in not just what they can do, but how easy it is to do it.

And we can go further: “now show a photograph where neil armstrong and buzz aldrin, in the same outfits, are sitting in their seats in a modern airplane, neil looks relaxed and is leaning back, playing a trumpet, buzz seems nervous and is holding a hamburger, in the middle seat is a realistic otter sitting in a seat and using a laptop.”

This is many things: A pretty impressive output from the AI (look at the expressions, and how it preserved Buzz’s ring and Neil’s lapel pin). A distortion of a famous moment in history made possible by AI. And a potential warning about how weird things are going to get when these sorts of technologies are used widely.

The Weirdness of Mass Intelligence

When powerful AI is in the hands of a billion people, a lot of things are going to happen at once. A lot of things are already happening at once.

Some people have intense relationships with AI models while other people are being saved from loneliness. AI models may be causing mental breakdowns and dangerous behavior for some while being used to diagnose the diseases of others. It is being used to write obituaries and create scriptures and cheat on homework and launch new ventures and thousands of other unexpected uses. These uses, and both the problems and benefits, are likely to only multiply as AI systems get more powerful.

And while Google's AI image generator has guardrails to limit misuse, as well as invisible watermarks to identify AI images, I expect that much less restrictive AI image generators will get close to nano banana in quality in the coming months.

The AI companies (whether you believe their commitments to safety or not) seem to be as unable to absorb all of this as the rest of us are. When a billion people have access to advanced AI, we've entered what we might call the era of Mass Intelligence. Every institution we have — schools, hospitals, courts, companies, governments — was built for a world where intelligence was scarce and expensive. Now every profession, every institution, every community has to figure out how to thrive with Mass Intelligence. How do we harness a billion people using AI while managing the chaos that comes with it? How do we rebuild trust when anyone can fabricate anything? How do we preserve what's valuable about human expertise while democratizing access to knowledge?

So here we are. Powerful AI is cheap enough to give away, easy enough that you don't need a manual, and capable enough to outperform humans at a range of intellectual tasks. A flood of opportunities and problems are about to show up in classrooms, courtrooms, and boardrooms around the world. The Mass Intelligence era is what happens when you give a billion people access to an unprecedented set of tools and see what they do with it. We are about to find out what that is like.


1

This is the energy required to answer a standard prompt. It does not take into account the energy needed to train AI models, which is a one-time process that is very energy intensive. We do not know how much energy is used to create a modern model, but it was estimated that training GPT-4 took a little above 500,000 kWh, about the energy a Boeing 737 uses in 18 hours of flight.

GPT-5: It Just Does Stuff

2025-08-08 01:02:18

I have had early access to GPT-5,1 and I wanted to give you some impressions of what it is and what it means. But the answer to the first question everyone will have is: Yes, this is a big deal. I asked GPT-5 to show you why by pasting this intro paragraph, exactly as you’re reading it, into ChatGPT, including the phrase: “you are GPT-5, do something very dramatic to illustrate my point, it has to fit into the next paragraph.”

That was the only prompt I gave it. It “thought” for 24 seconds and wrote:

Thunderstruck. Here, watch. I build worlds. See ideas become instruments. I code, compose, and converse. Stories synthesize science; solutions spark, swiftly. Asks arrive; answers appear, astonishingly, across domains. Behold: business plans, blueprints, briefs, beautifully built, briskly. I infer intent, integrate insight, and iterate instantly, indefatigably. Gaps get gauged; goals get generated, guarded, and guaranteed gracefully. During dialogue, disparate data distills down; decisions develop deliberately, deftly, dependably. Everything explained efficiently, even ethics: evidence examined, empathy engaged, everywhere, everyday, equitably. All at once, ambiguous arguments align as actionable, accurate advice awaits anyone anywhere. Look: language, logic, learning, and long-term memory link, lifting lives and labor limitlessly worldwide.

In case you didn’t catch the many tricks: the first word of each sentence spells out the phrase “This is a Big Deal,” each sentence is precisely one word longer than the previous one, each word in a sentence mostly starts with the same letter, and it is still coherent writing with an interesting sense of style. In a paragraph, GPT-5 shows it can come up with a clever idea, plan, and manage the complicated execution (remember when AI couldn’t count the number of Rs in “strawberry”? That was eight months ago).

GPT-5 just does stuff, often extraordinary stuff, sometimes weird stuff, sometimes very AI stuff, on its own. And that is what makes it so interesting.

Just Doing Stuff

As someone who has spent a lot of time talking to people about AI, I see two major problems that, if addressed, would make most people’s AI use much more productive and much less frustrating. The first is selecting the right model to use. In general, AIs that "think" before answering (called Reasoners) are the best at hard problems. The longer they think, the better the answer, but thinking costs money and takes time. So OpenAI previously made the default ChatGPT use fast, dumb models, hiding the good stuff from most users. A surprising number of people have never seen what AI can actually do because they're stuck on GPT-4o, and don’t know which of the confusingly-named models are better.

GPT-5 does away with this by selecting models for you, automatically. GPT-5 is not one model as much as it is a switch that selects among multiple GPT-5 models of various sizes and abilities. When you ask GPT-5 for something, the AI decides which model to use and how much effort to put into “thinking.” It just does it for you. For most people, this automation will be helpful, and the results might even be shocking, because, having only used default older models, they will get to see what a Reasoner can accomplish on hard problems. But for people who use AI more seriously, there is an issue: GPT-5 is somewhat arbitrary about deciding what a hard problem is.

For example, I asked GPT-5 to “create a svg with code of an otter using a laptop on a plane” (asking for an .svg file requires the AI to blindly draw an image using basic shapes and math, a very hard challenge). Around 2/3 of the time, GPT-5 decides this is an easy problem, and responds instantly, presumably using its weakest model and lowest reasoning time. I get an image like this:

The rest of the time, GPT-5 decides this is a hard problem, and switches to a Reasoner, spending 6 or 7 seconds thinking before producing an image like this, which is much better. How does it choose? I don’t know, but if I ask the model to “think hard” in my prompt, I am more likely to be routed to the better model.

But premium subscribers can directly select the more powerful models, such as the one called (at least for me) GPT-5 Thinking. This removes some of the issues with being at the mercy of GPT-5’s model selector. I found that if I encouraged the model to think hard about the otter, it would spend a good 30 seconds before giving me an image like the one below - notice the little animations, the steaming coffee cup, and the clouds going by outside, none of which I asked for. How to ensure the model puts in the most effort? It is really unclear - GPT-5 just does things for you.

And that extends to the second most common problem with AI use, which is that many people don’t know what AIs can do, or even what tasks they want accomplished. That is especially true of the new agentic AIs, which can take a wide range of actions to accomplish the goals you give them, from searching the web to creating documents. But what should you ask for? A lot of people seem stumped. Again, GPT-5 solves this problem. It is very proactive, always suggesting things to do.

I asked GPT-5 Thinking (I trust the less powerful GPT-5 models much less) “generate 10 startup ideas for a former business school entrepreneurship professor to launch, pick the best according to some rubric, figure out what I need to do to win, do it.” I got the business idea I asked for. I also got a whole bunch of things I did not: drafts of landing pages and LinkedIn copy and simple financials and a lot more. I am a professor who has taught entrepreneurship (and been an entrepreneur) and I can say confidently that, while not perfect, this was a high-quality start that would have taken a team of MBAs a couple hours to work through. From one prompt.

It just does things, and it suggested other things to do. And it did those, too: PDFs and Word documents and Excel and research plans and websites.

It is impressive, and a little unnerving, to have the AI go so far on its own. You can also see that the AI asked for my guidance but was happy to proceed without it. This is a model that wants to do things for you.

Building Things

Let me show you what 'just doing stuff' looks like for a non-coder using GPT-5 for coding. For fun, I prompted GPT-5 “make a procedural brutalist building creator where i can drag and edit buildings in cool ways, they should look like actual buildings, think hard.” That's it. Vague, grammatically questionable, no specifications.

A couple minutes later, I had a working 3D city builder.

Not a sketch. Not a plan. A functioning app where I could drag buildings around and edit them as needed. I kept typing variations of “make it better” without any additional guidance. And GPT-5 kept adding features I never asked for: neon lights, cars driving through streets, facade editing, pre-set building types, dramatic camera angles, a whole save system. It was like watching someone else's imagination at work. The product you see below was 100% AI; all I did was keep encouraging the system - and you don’t just have to watch my video, you can play with the simulator here.

At no point did I look at the code it was creating. The model wasn’t flawless; there were occasional bugs and errors. But in some ways, that was where GPT-5 was at its most impressive. If you have tried “vibecoding” using the AI before, you have almost certainly fallen into a doom loop, where, after a couple of rounds of asking the AI to create something for you, it starts to fail, getting caught in loops of confusion where each error fixed creates new ones. That never happened here. Sometimes new errors were introduced by the AI, but they were always fixed by simply pasting in the error text. I could just ask for whatever I wanted (or rather let the AI decide to create whatever it wanted) and I never got stuck.

Premonitions

I have written this piece before OpenAI released any official benchmarks about how well its model performs, but, in some ways, it doesn’t matter that much. Last week, Google released Gemini 2.5 with Deep Think, a model that can solve very hard problems (including getting a gold medal at the International Math Olympiad). Many people didn’t notice because they do not have a store of very hard problems they are waiting for AI to solve. I have played enough with GPT-5 to know that it is a very good model (at least the large GPT-5 Thinking model is excellent). But what it really brings to the table is the fact that it just does things. It will tell you what model to use, it will suggest great next steps, it will write in more interesting prose (though it still loves the em-dash). The burden of using AI is lessened.

To be clear, humans are still very much in the loop, and need to be. You are asked to make decisions and choices all the time by GPT-5, and these systems still make errors and generate hallucinations that humans need to check (although I did not spot any major issues in my own use). The bigger question is whether we will want to be in the loop. GPT-5 (and, I am sure, future releases by other companies) is very smart and pro-active. Which brings me back to that building simulator. I gave the AI encouragement, mostly versions of “make it better.” From that minimal input, it created a fully functional city builder with facade editing, dynamic cameras, neon lights, and flying tours. I never asked for any of these features. I never even looked at the code.

This is what "just doing stuff" really means. When I told GPT-5 to do something dramatic for my intro, it created that paragraph with its hidden acrostic and ascending word counts. I asked for dramatic. It gave me a linguistic magic trick. I used to prompt AI carefully to get what I asked for. Now I can just... gesture vaguely at what I want. And somehow, that works.

Another big change in how we relate to AI is coming, but we will figure out how to adapt to it, as we always do. The difference, this time, is that GPT-5 might figure it out first and suggest next steps.


The result of the prompt: make an incredibly compelling 14:10 SVG that I can use for my substack post about the launch of GPT-5, the theme of which is "it just does stuff for you" Be radical in your approach.
1

As a reminder, I take no money from any of the AI Labs, including OpenAI. I have no agreements with them besides NDAs. I don’t show them any posts before I write them.

The Bitter Lesson versus The Garbage Can

2025-07-28 19:30:43

One of my favorite academic papers about organizations is by Ruthanne Huising, and it tells the story of teams that were assigned to create process maps of their company, tracing what the organization actually did, from raw materials to finished goods. As they created this map, they realized how much of the work seemed strange and unplanned. They discovered entire processes that produced outputs nobody used, weird semi-official pathways to getting things done, and repeated duplication of efforts. Many of the employees working on the map, once rising stars of the company, became disillusioned.

The Process Map

I’ll let Prof. Huising explain what happened next: “Some held out hope that one or two people at the top knew of these design and operation issues; however, they were often disabused of this optimism. For example, a manager walked the CEO through the map, presenting him with a view he had never seen before and illustrating for him the lack of design and the disconnect between strategy and operations. The CEO, after being walked through the map, sat down, put his head on the table, and said, "This is even more fucked up than I imagined." The CEO revealed that not only was the operation of his organization out of his control but that his grasp on it was imaginary.”

For many people, this may not be a surprise. One thing you learn studying (or working in) organizations is that they are all actually a bit of a mess. In fact, one classic organizational theory is actually called the Garbage Can Model. This views organizations as chaotic "garbage cans" where problems, solutions, and decision-makers are dumped in together, and decisions often happen when these elements collide randomly, rather than through a fully rational process. Of course, it is easy to take this view too far - organizations do have structures, decision-makers, and processes that actually matter. It is just that these structures often evolved and were negotiated among people, rather than being carefully designed and well-recorded.

The Garbage Can represents a world where unwritten rules, bespoke knowledge, and complex and undocumented processes are critical. It is this situation that makes AI adoption in organizations difficult, because even though 43% of American workers have used AI at work, they are mostly doing it in informal ways, solving their own work problems. Scaling AI across the enterprise is hard because traditional automation requires clear rules and defined processes; the very things Garbage Can organizations lack. To address the more general issues of AI and work requires careful building of AI-powered systems for specific use cases, mapping out the real processes and making tools to solve the issues that are discovered.

This is a hard, slow process that suggests enterprise AI adoption will take time. At least, that's how it looks if we assume AI needs to understand our organizations the way we do. But AI researchers have learned something important about these sorts of assumptions.

The Bitter Lesson

Computer scientist Richard Sutton introduced the concept of the Bitter Lesson in an influential 2019 essay where he pointed out a pattern in AI research. Time and again, AI researchers trying to solve a difficult problem, like beating humans in chess, turned to elegant solutions, studying opening moves, positional evaluations, tactical patterns, and endgame databases. Programmers encoded centuries of chess wisdom in hand-crafted software: control the center, develop pieces early, king safety matters, passed pawns are valuable, and so on. Deep Blue, the first chess computer to beat the world’s best human, used some chess knowledge, but combined that with the brute force of being able to search 200 million positions a second. In 2017, Google released AlphaZero, which could beat humans not just in chess but also in shogi and go, and it did it with no prior knowledge of these games at all. Instead, the AI model trained against itself, playing the games until it learned them. All of the elegant knowledge of chess was irrelevant; pure brute-force computing, combined with generalized approaches to machine learning, was enough to beat them. And that is the Bitter Lesson — encoding human understanding into an AI tends to be worse than just letting the AI figure out how to solve the problem, and adding enough computing power until it can do it better than any human.

Why two versions of this graph? And why are they slightly different? Answers in a bit!

The lesson is bitter because it means that our human understanding of problems built from a lifetime of experience is not that important in solving a problem with AI. Decades of researchers' careful work encoding human expertise was ultimately less effective than just throwing more computation at the problem. We are soon going to see whether the Bitter Lesson applies widely to the world of work.

Agents

While individuals can get a lot of benefits out of using chatbots themselves, a lot of excitement about how to use AI in organizations focuses on agents, a fuzzy term that I define as AI systems capable of taking autonomous action to accomplish a goal. As opposed to guiding a chatbot with prompting, you delegate a task to an agent, and it accomplishes it. However, previous AI systems have not been good enough to handle the full range of organizational needs; there is just too much messiness in the real world. This is why when we created our first AI-powered teaching games a year ago, we had to carefully design each step in the agentic system to handle narrow tasks. And though AIs' ability to work autonomously is increasing very rapidly, they are still far from human-level on most complicated jobs and are easily led astray on complex tasks.

This is with an 80% success threshold

As an example of the state-of-the-art in agentic systems, consider Manus, which uses Claude and a series of clever approaches to make AI agents that can get real work done. The Manus team has shared a lot of tips for building agents, involving some interesting bits of engineering and very elaborate prompt design. When writing this post, I asked Manus: “i need an attractive graph that compares the ELO of the best grandmaster and the ELO of the worlds best chess computer from the first modern chess computer through 2025.” And the system got to work. First, Manus created a to-do list (as it always does), then it gathered data and wrote a number of files and, after some minor adjustments I asked for, finally came up with the graph you can see on the left side above (the one without the box around the graph).

Why did it do these things in this order? Because Manus was built by hand, carefully crafted to be the best general purpose agent available. There are hundreds of lines of bespoke text in its system prompts, including detailed instructions about how to build a to-do list. It incorporates hard-won knowledge on how to make agents work with today’s AI systems.

Do you see the potential problem? “Carefully crafted,” “bespoke,” “incorporates hard-won knowledge” — exactly the kind of work the Bitter Lesson tells us to avoid because it will eventually be made irrelevant by more general-purpose techniques.

It turns out there is now evidence that this may be possible with the recent release of ChatGPT agent (an uninspiring name, but at least it is clear, a big step forward for OpenAI!). ChatGPT agent represents a fundamental shift. It is not trained on the process of doing work; instead, OpenAI used reinforcement learning to train their AI on the actual final outcomes. For example, they may not teach it how to create an Excel file the way a human would; they would simply rate the quality of the Excel files it creates until it learns to make a good one, using whatever methods the AI develops. To illustrate how reinforcement learning and careful crafting lead to similar outcomes, I gave the exact same chess prompt to ChatGPT agent and got the graph on the right above. But this time there was no to-do list, no script to follow; instead, the agent charted whatever mysterious course was required to get me the best output it could, according to its training. You can see an excerpt of that below:

But you might notice a few differences between the two charts, besides their appearance. For example, each has different ratings for Deep Blue’s performance because the ELO for Deep Blue was never officially measured. The rating from Manus was based on a basic search that turned up a speculative Reddit discussion, while the ChatGPT agent, trained with the reinforcement learning approaches used in Deep Research, found more credible sources, including an Atlantic article, to back up its claim. In a similar way, when I asked both agents to re-create the graph by making a fully functional Excel file, ChatGPT’s version worked, while Manus’s had errors.
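To make the contrast between the two approaches concrete, here is a minimal sketch of hand-crafted scaffolding versus outcome-based training. None of this is real Manus or OpenAI code; the prompt text, the rubric, and the training loop are assumptions for illustration.

```python
# Illustrative contrast between two ways of building agents (hypothetical code).

# Hand-crafted approach: humans encode the process in elaborate system prompts.
HANDCRAFTED_SYSTEM_PROMPT = """
Always begin by writing a to-do list.
Work through the list one item at a time.
Verify each result before moving on.
"""

# Outcome-trained approach: humans only score the final artifact.
def outcome_reward(excel_file) -> float:
    """Rate the result, not the steps taken to produce it (made-up rubric)."""
    checks = [
        excel_file.opens_without_errors,
        excel_file.formulas_compute_correctly,
        excel_file.matches_request,
    ]
    return sum(checks) / len(checks)

def train_on_outcomes(agent, tasks, steps: int = 10_000):
    """Reinforcement learning sketch: the agent invents whatever process maximizes reward."""
    for _ in range(steps):
        task = tasks.sample()
        artifact = agent.attempt(task)                  # whatever plan the agent comes up with
        agent.update(reward=outcome_reward(artifact))   # only the final quality matters
```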

I don’t know if ChatGPT agent is better than Manus yet, but I suspect that it will improve faster than its competitor. Improving Manus will involve more careful crafting and bespoke work; improving ChatGPT agent simply requires more computer chips and more examples. If the Bitter Lesson holds, the long-term outcome seems pretty clear. But more critically, the comparison between hand-crafted and outcome-trained agents points to a fundamental question about how organizations should approach AI adoption.

Agents in the Garbage Can

This returns us to the world of organizations. While individuals rapidly adopt AI, companies still struggle with the Garbage Can problem, spending months mapping their chaotic processes before deploying any AI system. But what if that's backwards?

The Bitter Lesson suggests we might soon ignore how companies produce outputs and focus only on the outputs themselves. Define what a good sales report or customer interaction looks like, then train AI to produce it. The AI will find its own paths through the organizational chaos; paths that might be more efficient, if more opaque, than the semi-official routes humans evolved. In a world where the Bitter Lesson holds, the despair of the CEO with his head on the table is misplaced. Instead of untangling every broken process, he just needs to define success and let AI navigate the mess. In fact, the Bitter Lesson might actually be sweet: all those undocumented workflows and informal networks that pervade organizations might not matter. What matters is knowing good output when you see it.

If this is true, the Garbage Can remains, but we no longer need to sort through it while competitive advantage itself gets redefined. The effort companies spent refining processes, building institutional knowledge, and creating competitive moats through operational excellence might matter less than they think. If AI agents can train on outputs alone, any organization that can define quality and provide enough examples might achieve similar results, whether they understand their own processes or not.

Or it might be that the Garbage Can wins, that human complexity and those messy, evolved processes are too intricate for AI to navigate without understanding them. We're about to find out which kind of problem organizations really are: chess games that yield to computational scale, or something fundamentally messier. The companies betting on either answer are already making their moves, and we will soon get to learn what game we're actually playing.


Against "Brain Damage"

2025-07-08 03:02:38

I increasingly find people asking me “does AI damage your brain?” It's a revealing question. Not because AI causes literal brain damage (it doesn't) but because the question itself shows how deeply we fear what AI might do to our ability to think. So, in this post, I want to discuss ways of using AI to help, rather than hurt, your mind. But why the obsession over AI damaging our brains?

Part of this is due to misinterpretation of a much-publicized paper out of the MIT Media Lab (with authors from other institutions as well), titled “Your Brain on ChatGPT.” The actual study is much less dramatic than the press coverage. It involved a small group of college students who were assigned to write essays alone, with Google, or with ChatGPT (and no other tools). The students who used ChatGPT were less engaged and remembered less about their essays than the group without AI. Four months later, nine of the ChatGPT users were asked to write the essay again without ChatGPT, and they performed worse than those who had not used AI initially (though that group was now required to use AI in the new session) and showed less EEG activity when writing. There was, of course, no brain damage. Yet the more dramatic interpretation has captured our imagination because we have always feared that new technologies would ruin our ability to think: Plato thought writing would undermine our wisdom, and when cellphones came out, some people worried that not having to remember telephone numbers would make us dumber.

But that doesn’t mean we shouldn’t worry about how AI impacts our thinking. After all, a key purpose of technology is to let us outsource work to machines. That includes intellectual work, like letting calculators do math or our cellphones record our phone numbers. And, when we outsource our thinking, we really do lose something — we can’t actually remember phone numbers as well, for example. Given that AI is such a general purpose intellectual technology, we can outsource a lot of our thinking to it. So how do we use AI to help, rather than hurt us?

The Learning Brain

The least surprising place where AI use can clearly hurt your mental growth is when you are trying to learn or synthesize new knowledge. If you outsource your thinking to the AI instead of doing the work yourself, then you will miss the opportunity to learn. We have evidence to back up this intuition, as my colleagues at Penn conducted an experiment at a high school in Turkey where some students were given access to GPT-4 to help with homework. When they were told to use ChatGPT without guidance or special prompting, they ended up taking a shortcut and getting answers. So even though students thought they learned a lot from ChatGPT's help, they actually learned less - scoring 17% worse on their final exam (compared to students who didn't use ChatGPT).

What makes this particularly insidious is that the harm happens even when students have good intentions. The AI is trained to be helpful and answer questions for you. Like the students, you may just want to get AI guidance on how to approach your homework, but it will often just give you the answer instead. As the MIT Media Lab study showed, this short-circuits the (sometimes unpleasant) mental effort that creates learning. The problem is not just cheating, though AI certainly makes that easier. The problem is that even honest attempts to use AI for help can backfire because the default mode of AI is to do the work for you, not with you.

From the Nigeria study, the grade distribution of those who participated in the AI tutoring sessions (blue) versus those that did not (red).

Does that mean that AI always hurts learning? Not at all! While it is still early, we have increasing evidence that, when used with teacher guidance and good prompting based on sound pedagogical principles, AI can greatly improve learning outcomes. For example, a randomized, controlled World Bank study found that using a GPT-4 tutor with teacher guidance in a six-week after-school program in Nigeria had "more than twice the effect of some of the most effective interventions in education" at very low costs. While no study is perfect (in this case, the control was no intervention at all, so it is impossible to fully isolate the effects of AI, though they do try to do so), it joins a growing number of similar findings. A Harvard experiment in a large physics class found a well-prompted AI tutor outperformed active classes in learning outcomes; a study done in a massive programming class at Stanford found use of ChatGPT led to increased exam grades; a Malaysian study found AI used in conjunction with teacher guidance and solid pedagogy led to more learning; and even the experiment in Turkey that I mentioned earlier found that a better tutor prompt eliminated the drop in test scores from plain ChatGPT use.

Our tutor prompt; the link to it is in the text.

Ultimately, it is how you use AI, rather than whether you use AI at all, that determines whether it helps or hurts your brain when learning. Moving away from asking the AI to help with your homework and toward asking it to tutor you is a useful step. Unfortunately, the default version of most AI models wants to give you the answer, rather than tutor you on a topic, so you might want to use a specialized prompt. While no one has developed the perfect tutor prompt, we have one that has been used in some education studies and which may be useful to you; you can find more in the Wharton Generative AI Lab prompt library. Feel free to modify it (it is licensed under Creative Commons). If you are a parent, you can also act as the tutor yourself, prompting the AI “explain the answer to this question in a way I can teach my child, who is in X grade.” None of these approaches are perfect, and the challenges in education from AI are very real, but there is reason to hope that education will be able to adjust to AI in ways that help, and not hurt, our ability to think. That will involve instructor guidance, well-built prompts, and careful choices about when to use AI and when it should be avoided.

The Creative Brain

Just like in education, AI can help, or hurt, your creativity depending on how you use it. On many measures of creativity, AI beats most humans. To be clear, there is no one definition of creativity, but researchers have developed a number of flawed tests that are widely used to measure the ability of humans to come up with diverse and meaningful ideas. The fact that these tests were flawed wasn't that big a deal until, suddenly, AIs were able to pass all of them. The old GPT-4 beat 91% of humans on a variation of the Alternative Uses Test for creativity and exceeded 99% of people on the Torrance Tests of Creative Thinking. And we know these ideas are not just theoretically interesting. My colleagues at Wharton staged an idea generation contest, pitting ChatGPT-4 against the students in a popular innovation class that has historically led to many startups. Human judges rating the ideas found that ChatGPT-4 generated more, cheaper, and better ideas than the students. The purchase intent from these outside judges was higher for the AI-generated ideas as well.

And yet, anyone who has used AI for idea generation will notice something these numbers don't capture. AI tends to act like a single creative person with predictable patterns. You'll see the same themes over and over, like ideas involving VR, blockchain, the environment, and (of course) AI itself. This is a problem because in idea generation, you actually want a diverse set of ideas to pick from, not variations on a theme. Thus, there is a paradox: while AI is more creative than most individuals, it lacks the diversity that comes from multiple perspectives. Yet studies also show that people often generate better ideas when using AI than when working alone, and sometimes AI alone even outperforms humans working with AI. But, without caution, those ideas look very similar to each other when you see enough of them.

Part of this can be solved with better prompting. In a paper I worked on with Lennart Meincke and Christian Terwiesch, we found that better prompting can generate much more diverse ideas, though still not quite as diverse as those from a group of students.

Here is the prompt, which was for GPT-4. It still works well for other AI models (though I suspect that reasoner models might actually be slightly less innovative than more traditional models):

Generate new product ideas with the following requirements: The product will target [market or customer]. It should be a [pick: physical good/service/software], not a [pick: physical good/service/software]. I'd like a product that could be sold at a retail price of less than about [insert amount]. 
The ideas are just ideas. The product need not yet exist, nor may it necessarily be clearly feasible. Follow these steps. Do each step, even if you think you do not need to. First generate a list of 100 ideas (short title only). Second, go through the list and determine whether the ideas are different and bold, modify the ideas as needed to make them bolder and more different. No two ideas should be the same. This is important! Next, give the ideas a name and combine it with a product description. The name and idea are separated by a colon and followed by a description. The idea should be expressed as a paragraph of 40-80 words. Do this step by step.

But better prompting only solves part of the problem. The deeper risk is that AI can actually hurt your ability to think creatively by anchoring you to its suggestions. This happens in two ways.

First, there's the anchoring effect. Once you see AI's ideas, it becomes much harder to think outside those boundaries. It's like when someone tells you “don't think of a pink elephant.” AI's suggestions, even mediocre ones, can crowd out your own unique perspectives. Second, as the MIT study showed, people don’t feel as much ownership of AI-generated ideas, meaning that you will disengage from the ideation process itself.

So how do you get AI's benefits without the brain drain? The key is sequencing. Always generate your own ideas before turning to AI. Write them down, no matter how rough. Just as group brainstorming works best when people think individually first, you need to capture your unique perspective before AI's suggestions can anchor you. Then use AI to push ideas further: “Combine ideas #3 and #7 in an extreme way,” “Even more extreme,” “Give me 10 more ideas like #42,” “Use superheroes as inspiration to make the idea even more interesting.”

This principle becomes even more critical in writing. Many writers insist that "writing is thinking," and while this isn't universally true (I generated a pretty good Deep Research report on the topic if you want the details), it often is. The act of writing, and rewriting, and rewriting again helps you think through and hone your ideas. If you let AI handle your writing, you skip the thinking part entirely.

As someone for whom writing is thinking, I've needed to become disciplined. Every post I write, like this one, I do a full draft entirely without any AI use at all (beyond research help). This is often a long process, since I write and rewrite multiple times - thinking! Only when it is done do I turn to a number of AI models, give them the completed post, and ask them to act as a reader: Was this unclear at any point, and how, specifically, could I clarify the text for a non-technical reader? And sometimes like an editor: I don’t like how this section ends; can you give me 20 versions of endings that might fit better? So go ahead, use AI to polish your prose and expand your possibilities. Just remember to do the thinking first, because that's the part that can't be outsourced.

I went with number four.

The Collective Brain

Another area where AI can hurt our thinking is through its impact on social processes. Ideally, the whole purpose of working on teams is that it can improve our performance - teams should be able to generate more ideas, be better able to see potential opportunities and pitfalls, and provide specialized skills and abilities to help execution. Meetings should be places where teams coordinate and solve problems. Of course, this is the ideal. In reality, one of the most revelatory management texts is actually this WWII guide to sabotage for civilians from the CIA's precursor. Look at the ideas for sabotaging office tasks to cause demoralization and delay and consider how many of them are normal parts of your meetings.

So it is no wonder that a significant early use of AI is to summarize meetings, and increasingly to summarize meetings you skip entirely. Of course, this raises existential questions like “why are we meeting in the first place if we can just read a summary?” or “should I just send an AI avatar of myself to meetings?” Obviously, there is no interaction, no teamwork, no meeting of the minds in a meeting where everyone is just there to read the transcript and nothing more. It just takes up time and effort, a form of organizational brain damage.

But rather than AI hurting our collective thinking, there is the option to have it help make us better. One interesting example is using AI as a facilitator. We created a prompt where the AI acts as a facilitator, creating customized tarot cards halfway through your meeting to help guide, rather than replace, your discussion. You give it a meeting transcript and it helps you bring out your best ideas (again, this is released under a Creative Commons license, so modify as needed; right now it works best on Claude, and okay on Gemini and o3).
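
If you want to script this kind of facilitation instead of pasting a transcript into a chat window, a minimal sketch using the Anthropic Python SDK might look like the following. The model id, file paths, and the facilitator_prompt placeholder are my assumptions; the actual facilitator prompt is the linked Creative Commons document, saved locally here for illustration.

```python
# Minimal sketch: sending a meeting transcript to a facilitator prompt via the
# Anthropic Python SDK. The model id, file paths, and saved prompt file are
# assumptions for illustration; the real facilitator prompt is the linked document.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

facilitator_prompt = open("facilitator_prompt.txt").read()  # the linked prompt, saved locally
transcript = open("meeting_transcript.txt").read()          # an export from your meeting tool

message = client.messages.create(
    model="claude-sonnet-4-20250514",  # assumed model id; use whichever Claude model you have
    max_tokens=2000,
    system=facilitator_prompt,  # the facilitator instructions act as the system prompt
    messages=[{"role": "user", "content": "Here is the meeting transcript so far:\n\n" + transcript}],
)

print(message.content[0].text)
```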

This is just a fun example of the ways in which AI could be used to help our collective intelligence, but there is a need for many more experiments to figure out what works: using AI as a devil's advocate to surface unspoken concerns, having it identify whose voices aren't being heard in a discussion, or using it to find patterns in team dynamics that humans miss. The key is that AI enhances rather than replaces human interaction.

Against “Brain Damage”

AI doesn't damage our brains, but unthinking use can damage our thinking. What's at stake isn't our neurons but our habits of mind. There is plenty of work worth automating or replacing with AI (we rarely mourn the math we do with calculators), but also a lot of work where our thinking is important. For these problems, the research gives us a clear answer. If you want to keep the human part of your work: think first, write first, meet first.

Our fear of AI “damaging our brains” is actually a fear of our own laziness. The technology offers an easy out from the hard work of thinking, and we worry we'll take it. We should worry. But we should also remember that we have a choice.

Your brain is safe. Your thinking, however, is up to you.


Using AI Right Now: A Quick Guide

2025-06-24 00:12:17

Every few months I put together a guide on which AI system to use. Since I last wrote my guide, however, there has been a subtle but important shift in how the major AI products work. Increasingly, it isn't about the best model, it is about the best overall system for most people. The good news is that picking an AI is easier than ever and you have three excellent choices. The challenge is that these systems are getting really complex to understand. I am going to try and help a bit with both.

First, the easy stuff.

Which AI to Use

For most people who want to use AI seriously, you should pick one of three systems: Claude from Anthropic, Google’s Gemini, and OpenAI’s ChatGPT. With any of these options, you get access to both advanced and fast models, a voice mode, the ability to see images and documents, the ability to execute code, good mobile apps, the ability to create images and video (though Claude lacks image and video creation), and the ability to do Deep Research. Some of these features are free, but you are generally going to need to pay $20/month to get access to the full set of features you need. I will try to give you some reasons to pick one system or another as we go along, but you can’t go wrong with any of them.

What about everyone else? I am not going to cover specialized AI tools (some people love Perplexity for search, Manus is a great agent, etc.), but there are a few other options for general-purpose AI systems: Grok, by Elon Musk’s xAI, is good if you are a big X user, though the company has not been very transparent about how its AI operates. Microsoft’s Copilot offers many of the features of ChatGPT and is accessible to users through Windows, but it can be hard to control which models you are using and when. DeepSeek r1, a Chinese model, is very capable and free to use, but it is missing a few of the features the other companies offer, and it is not clear that it will keep up in the long term. So, for most people, just stick with Gemini, Claude, or ChatGPT.

Great! This was the shortest recommendation post yet! Except… picking a system is just the beginning. The real challenge is understanding how to use these increasingly complex tools effectively.

Now what?

I spend a lot of time with people trying to use AI to get stuff done, and that has taught me how incredibly confusing this is. So I wanted to walk everyone through the most important features and choices, as well as some advice on how to actually use AI.

Picking a Model

ChatGPT, Claude, and Gemini each offer multiple AI models through their interface, and picking the right one is crucial. Think of it like choosing between a sports car and a pickup truck; both are vehicles, but you'd use them for very different tasks. Each system offers three tiers: a fast model for casual chat (Claude Sonnet, GPT-4o, Gemini Flash), a powerful model for serious work (Claude Opus, o3, Gemini Pro), and sometimes an ultra-powerful model for the hardest problems (o3-pro, which can take 20+ minutes to think). The casual models are fine for brainstorming or quick questions. But for anything high-stakes (analysis, writing, research, coding), you should usually switch to the powerful model.

Most systems default to the fast model to save computing power, so you need to manually switch using the model selector dropdown. (With the exception of Gemini, the free versions of these systems do not give you access to the most powerful model, so if you do not see the options I describe, it is because you are using the free version.)

I use o3, Claude 4 Opus, and Gemini 2.5 Pro for any serious work that I do. I also have particular favorites for individual tasks that fall outside these three (GPT-4.5 is a really interesting model for writing, for example), but most people should stick with the models I suggested most of the time.
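
For anyone scripting their own AI calls rather than using the chat apps, the same tiering logic can be made explicit in code. This is a hypothetical sketch: the model names are my assumptions and will go stale as the labs update their lineups; the point is the pattern of defaulting to a fast model and escalating for serious work.

```python
# Hypothetical sketch of the fast-vs-powerful trade-off described above.
# Model names are assumptions that will go stale; the pattern is what matters:
# default to a fast, cheap model and escalate to a powerful one for serious work.
FAST_MODEL = "gpt-4o"    # casual chat, quick questions
POWERFUL_MODEL = "o3"    # analysis, writing, research, coding

def pick_model(high_stakes: bool) -> str:
    """Mirror the manual dropdown choice: powerful model for high-stakes work."""
    return POWERFUL_MODEL if high_stakes else FAST_MODEL

print(pick_model(high_stakes=True))   # -> o3
print(pick_model(high_stakes=False))  # -> gpt-4o
```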

For people concerned about privacy, Claude does not train future AI models on your data, but Gemini and ChatGPT might, if you are not using a corporate or educational version of the system. If you want to make sure your data is never used to train an AI model, you can turn off training features easily for ChatGPT without losing any functionality, and at the cost of some functionality for Gemini. You may also want to turn on or off “memory” in ChatGPT’s personalization option, which lets the AI remember scattered details about you. I find the memory system to be too erratic at this point, but you may have a different experience.

Using Deep Research

Deep Research is a key AI feature for most people, even if they don’t know it yet. Deep Research tools are very useful because they can produce high-quality reports that often impress the information professionals (lawyers, accountants, consultants, market researchers) I speak to. You should be trying out Deep Research reports in your area of expertise to see what they can do for you, but some other use cases include:

  • Gift Guides: “what do I buy for a picky 11-year-old who has read all of Harry Potter, is interested in science museums, and loves chess? Give me options, including where to buy at the best prices.”

  • Travel Guides: “I am going to Wisconsin on vacation and want to visit unique sites, especially focusing on cheese; produce a guide for me.”

  • Second opinions in law, medicine, and other fields (it should go without saying that you should trust your doctor/lawyer above AI, but research keeps finding that the more advanced AI systems do very well in diagnosis with a surprisingly low hallucination rate, so they can be useful for second opinions).

Activating Deep Research

Deep Research reports are not error-free, but they are far more accurate than just asking the AI for something, and the citations tend to actually be correct. Also note that each of the Deep Research tools works a little differently, with different strengths and weaknesses. Turning on the web search option in Claude and o3 will get them to work as mini Deep Research tools, doing some web research, but not as elaborately as a full report. Google has some fun additional options once you have created a report, letting you turn it into an infographic, a quiz, or a podcast.

An Easy Approach to AI: Voice Mode

An easy way to use AI is just to start with voice mode. The two best implementations of voice mode are in the Gemini app and ChatGPT’s app and website. Claude’s voice mode is weaker than the other two. What makes voice mode great is that you can just have a natural conversation with the app while in the car or on a walk and get quite far in understanding what these models can do. Note that the voice models are optimized for chat (including all of the small pauses and intakes of breath designed to make it feel like you are talking to a person), so you don’t get access to the more powerful models this way. They also don’t search the web as often, which makes them more likely to hallucinate if you are asking factual questions: if you are using ChatGPT, unless you hear the clicking sound at 44 seconds into this clip, it isn’t actually searching the web.

Voice mode's killer feature isn't the natural conversation, though, it's the ability to share your screen or camera. Point your phone at a broken appliance, a math problem, a recipe you're following, or a sign in a foreign language. The AI sees what you see and responds in real-time. I've used it to identify plants on hikes, solve a problem on my screen, and get cooking tips while my hands were covered in flour. This multimodal capability is genuinely futuristic, yet most people just use voice mode like Siri. You're missing the best part.

Making Things for You: Images, Video, Code, and Documents

ChatGPT and Gemini will make images for you if you ask (Claude cannot). ChatGPT offers the most controllable image creation tool, while Gemini uses two different image generators: Imagen, a very good traditional image generation system, and a multimodal image generation system. Generally, ChatGPT is stronger. On video creation, however, Gemini’s Veo 3 is very impressive, and you get several free uses a day (but you need to hit the Video button in the interface).

“make me a photo of an otter holding a sign saying otters are cool but also accomplished pilots. the otter should also be holding a tiny silver 747 with gold detailing.”

All three systems can produce a wide variety of other outputs, ranging from documents to statistical analyses to interactive tools to simulations to simple games. To get Gemini or ChatGPT to do this reliably, you need to select the Canvas option when you want these systems to run code or produce separate outputs. Claude is good at creating these sorts of outputs on its own. Just ask, you may be surprised what the AI systems can make.

Working with an AI

Now that you have picked a model, you can start chatting with it. It used to be that the details of your prompts mattered a lot, but the most recent AI models I suggested can often figure out what you want without the need for complex prompts. As a result, many of the tips and tricks you see online for prompting are no longer as important for most people. At the Generative AI Lab at Wharton, we have been trying to examine prompting techniques in a scientific manner, and our research has shown, for example, that being polite to AI doesn’t seem to make a big difference in output quality overall1. So just approach the AI conversationally rather than getting too worried about saying exactly the right thing.

That doesn’t mean that there is no art to prompting. If you are building a prompt for other people to use, it can take real skill to build something that works repeatedly. But for most people you can get started by keeping just a few things in mind:

  • Give the AI context to work with. Most AI models only know basic user information and the information in the current chat; they do not remember or learn about you beyond that. So you need to provide the AI with context: documents, images, PowerPoints, or even just an introductory paragraph about yourself can help - use the file option to upload files and images whenever you need. The AIs can do some of this for you: ChatGPT and Claude can access your files and mailbox if you let them, and Gemini can access your Gmail, so you can ask them to look up relevant context automatically as well, though I prefer to give the context manually.

  • Be really clear about what you want. Don’t say “Write me a marketing email,” instead go with “I'm launching a B2B SaaS product for small law firms. Write a cold outreach email that addresses their specific pain points around document management. Here are the details of the product: [paste]” Or ask the AI to ask you questions to help you clarify what you want. (A sketch combining these tips into a single request appears after this list.)

  • Give it step-by-step directions. Our research found this approach, called Chain-of-Thought prompting, no longer improves answer quality as much as it used to. But even if it doesn’t help that much, it can make it easier to figure out why the AI came up with a particular answer.

  • Ask for a lot of things. The AI doesn’t get tired or resentful. Ask for 50 ideas instead of 10, or thirty options to improve a sentence. Then push the AI to expand on the things you like.

  • Use branching to explore alternatives. Claude, ChatGPT, and Gemini all let you edit prompts after you have gotten an answer. This creates a new “branch” of the conversation. You can move between branches by using the arrows that appear after you have edited an answer. It is a good way to learn how your prompts impact the conversation.
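
To make these tips concrete, here is a minimal sketch of a single well-structured request that combines context, a clear ask, and a request for many options, sent through the OpenAI Python SDK. The product one-pager file, the exact wording, and the model name are all my own illustrative assumptions; the same prompt works just as well pasted into a chat window.

```python
# Minimal sketch of one well-structured request: context first, a clear ask,
# then a request for many options to choose from. The file name, wording, and
# model are illustrative assumptions, not a recommended recipe.
from openai import OpenAI

client = OpenAI()

product_context = open("product_one_pager.txt").read()  # hypothetical context document

prompt = f"""I'm launching a B2B SaaS product for small law firms.
Here is the product one-pager for context:

{product_context}

Write a cold outreach email that addresses their specific pain points around
document management. Give me 20 distinct versions, ranging from formal to
conversational, so I can pick a framing I like and ask you to push it further."""

response = client.chat.completions.create(
    model="o3",  # assumed: use a powerful model for high-stakes writing
    messages=[{"role": "user", "content": prompt}],
)

print(response.choices[0].message.content)
```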

Troubleshooting

I have also seen some fairly common areas where people get into trouble:

  • Hallucinations: In some ways, hallucinations are far less of a concern than they used to be, as newer AI models have gotten much better at not hallucinating. However, no matter how good the AI is, it will still make mistakes and still give you confident answers where it is wrong. AI models can also hallucinate about their own capabilities and actions. Answers are more likely to be right when they come from the bigger, slower models, and when the AI did web searches. The risk of hallucination is why I always recommend using AI for topics you understand until you have a sense of its capabilities and issues.

  • Not Magic: You should remember that the best AIs can perform at the level of a very smart person on some tasks, but current models cannot provide miraculous insights beyond human understanding. If the AI seems like it did something truly impossible, it is probably not actually doing that thing but pretending it did. Similarly, AI can seem incredibly insightful when asked about personal issues, but you should always take these insights with a grain of salt.

  • Two Way Conversation: You want to engage the AI in a back-and-forth interaction. Don’t just ask for a response, push the AI and question it.

  • Checking for Errors: The AI doesn’t know “why” it did something, so asking it to explain its logic will not get you anywhere. However, if you find issues, the thinking trace of AI models can be helpful. If you click “show thinking” you can find out what the model was doing before giving you an answer. This is not always 100% accurate (you are actually getting a summary of the thinking) but is a good place to start.

Your Next Hour

So now you know where to start. Pick a system and resign yourself to paying the $20 (the free versions are demos, not tools). Then immediately test three things on real work: First, switch to the powerful model and give it a complex challenge from your actual job with full context and have an interactive back-and-forth discussion. Ask it for a specific output like a document or program or diagram and ask for changes until you get a result you are happy with. Second, try Deep Research on a question where you need comprehensive information, maybe competitive analysis, gift ideas for someone specific, or a technical deep dive. Third, experiment with voice mode while doing something else — cooking, walking, commuting — and see how it changes your ability to think through problems.

Most people use AI like Google at first: quick questions, no context, default settings. You now know better. Give it documents to analyze, ask for exhaustive options, use branching to explore alternatives, experiment with different outcomes. The difference between casual users and power users isn't prompting skill (that comes with experience); it's knowing these features exist and using them on real work.


1

It is actually weirder than that: on hard math and science questions that we tested, being polite sometimes makes the AI perform much better, sometimes worse, in ways that are impossible to know in advance. So be polite if you want to!