2026-03-21 20:31:41
Here’s a thing I keep coming back to. Within a few years, the average company is going to have dramatically more AI agents running than human employees. Agents handling customer inquiries, running sales outreach, monitoring assets, running pricing experiments, flagging exceptions, managing vendors, and so on.
When that happens, running a business starts to look like a videogame. Hundreds of autonomous entities operating across a complex environment. Agents will be working inside work devices. They’ll be talking to customers, they’ll be available 24/7. They’ll spawn new agents and combine old ones. They’ll have email addresses and Slack accounts. They’ll be colleagues.
But how do you play this videogame? Hundreds of windows and tabs, one for each digital employee, or for each department? Autonomous can’t mean no oversight. Humans are autonomous and we get oversight. When thousands of agents are making thousands of decisions a day, you can’t manage the old way, by check-ins and check-outs and quarterly reviews. You have to find a new way: manage by exception - scanning for anomalies, reviewing what broke, simulating what to do next. As my friend James Cham said, work used to be a first-person shooter, where you’re directing every movement and every shot, which is what we do today, and it might become more like Starcraft, where you move people and agents around to achieve your objectives.
And to do that requires a model of the business underneath. This mostly exists today in various people’s heads but is rarely explicit. We can’t even stay on top of our emails, much less a thousand or a hundred thousand workers. A big benefit of digital labour, though, is that you can have a precise state of the business at every point in time.
We have solved this problem before. When we needed to figure out how to train autonomous cars, for instance, we needed an environment that’s realistic and the ability to run “what ifs” in a controllable simulation. Waymo and Tesla built these as world models. The equivalent for business already exists in the heads of management at every company. Every CEO is constantly running “what happens if I do x” in their head. They just can’t operationalise it because there’s no ‘environment’ that reflects their business to run it on! World models already exist anywhere the environment is expensive, instrumented, and operationally constrained - factories, grids, airspace, battlefields, fabs, networks, wells, and warehouses.
What’s needed in the enterprise world is such a world model - an engine that knows the rules, tracks the state, understands and predicts consequences.
The environment would connect to the systems a company already runs, the information it gathers, and the agents it uses, and build a live operational model of the business. Scale it across companies and you have the training data to build a compelling environment and an even better world model!
There is no way to get to a world of AI agents as employees without something like this.
We can’t build this abstractly in a box. The real economy is complicated. We have franchise systems - hundreds of locations running the same playbook with local variations. Multi-site healthcare - clinics, urgent care chains, dental groups, all drowning in disconnected EHRs and billing systems. Professional services networks - law firms, accounting firms, consulting shops with multiple offices that can’t see across their own operations. Real estate portfolios. Logistics networks.
Forget the architecture for a moment. Let’s take an example from one vertical - say, a real estate company.
They have, say, 15 holdings across the southeast. Each one runs StorEdge for property management, QuickBooks or Sage for accounting, some CRM for leads, a work order system, maybe SoLink cameras. Multiple customer-service tools and a phone line. None of these systems talk to each other. The district managers have spreadsheets, updated manually. The information needed to make decisions is cacophonous! They have dealt with this by adding a few AI agents for marketing copy and CRM updates. They also have orchestration solutions and perhaps observability for those agents. The executives get monthly reports as a PDF.
Now, when all of these are either run by agents or you have agents helping, what you’d really want is not to see the tool-call traces of each one, but to get a synthesised picture of how the company is doing. What’s the ROI of taking certain actions? How will the outcomes of a decision flow through the company? What are the key things to focus on right now? What actions need to be taken for the best results, and which results even matter? Even when you’re just responding to the market or the competition, each decision is a choice among counterfactuals.
An enterprise world model would connect to all of it to try to answer what happens next if you act.
Say a competitor cuts prices in a submarket and occupancy starts dipping. An agent flags the dip and the model simulates the responses: match the price-cut and hold occupancy which might compress margins by X%, or hold pricing and lose Y units over Z weeks, or just increase marketing spend by $W and recover the gap. It can show the likely P&L impact of each path and ROI.
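To make this concrete, here’s a toy sketch of that simulation. Every number below is invented for illustration; a real world model would calibrate the elasticities, erosion rates, and costs from the company’s own outcome data.

```python
# Hypothetical submarket; all constants below are invented for illustration.
UNITS = 400            # rentable units in the submarket
RENT = 120.0           # current monthly rent per unit
OCCUPANCY = 0.92       # current occupancy rate
HORIZON_MONTHS = 3     # how far ahead we simulate

def monthly_revenue(rent: float, occupancy: float) -> float:
    return UNITS * occupancy * rent

def simulate(action: str) -> float:
    """Projected revenue over the horizon for one candidate response."""
    if action == "match_cut":
        # Match the competitor's 10% cut: occupancy holds, margin compresses.
        return HORIZON_MONTHS * monthly_revenue(RENT * 0.90, OCCUPANCY)
    if action == "hold_price":
        # Hold pricing: assume occupancy erodes ~2 points per month.
        return sum(monthly_revenue(RENT, OCCUPANCY - 0.02 * m)
                   for m in range(1, HORIZON_MONTHS + 1))
    if action == "boost_marketing":
        # Spend $5k/month on marketing; assume it halves the erosion.
        return sum(monthly_revenue(RENT, OCCUPANCY - 0.01 * m) - 5_000
                   for m in range(1, HORIZON_MONTHS + 1))
    raise ValueError(f"unknown action: {action}")

# Rank the candidate responses by projected revenue over the horizon.
actions = ["match_cut", "hold_price", "boost_marketing"]
best = max(actions, key=simulate)
```

The value of the real thing is in the calibration: the made-up erosion and recovery assumptions above are exactly what the world model would learn from action-outcome data.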
Or, a district manager asks about a $60k roof repair. The model knows that this pattern of maintenance requests - three HVAC calls, a roof leak, a parking lot complaint - has preceded a $500k+ capex event within 4-6 months. It simulates the tradeoffs in the environment - approve and extend the asset’s life by X years, or defer and risk a larger spend later.
Or, a property is converting leads badly. The model surfaces the stat, simulates decisions, and identifies that response time is the lever (like, say, properties where managers respond within 20 minutes convert at 2x) and simulates the impact of enforcing a 15-minute SLA, e.g., projected conversion lift, staffing costs, or net revenue effects.
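This last example is essentially an expected-value calculation. With made-up numbers (the lead volume, conversion rates, revenue per conversion, and staffing cost here are all hypothetical), it might look like:

```python
# All numbers below are hypothetical, for illustration only.
LEADS_PER_MONTH = 200
BASE_CONVERSION = 0.05        # conversion rate with slow responses
FAST_CONVERSION = 0.10        # the observed ~2x rate with fast responses
REVENUE_PER_CONVERSION = 1_500.0
SLA_STAFFING_COST = 8_000.0   # assumed extra monthly cost to meet the SLA

def net_monthly_impact() -> float:
    """Projected net revenue change from enforcing a 15-minute SLA."""
    extra_conversions = LEADS_PER_MONTH * (FAST_CONVERSION - BASE_CONVERSION)
    return extra_conversions * REVENUE_PER_CONVERSION - SLA_STAFFING_COST
```

The arithmetic is trivial; the hard part, and the world model’s job, is knowing that response time is the lever and what the true conversion lift is.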
Each of these is an action-outcome pair. The point is to learn which interventions produce which consequences, and that learning compounds over hundreds of companies, building the operational equivalent of Waymo’s world model on top of a realistic simulation of every business: a simulation you can query with “what if?” before you commit to the road.
Think about what a COO’s day looks like once this is running. The agents already made thousands of decisions overnight. Her morning starts with reviewing deltas to see what broke, what improved, what patterns emerged that nobody expected. The model scores outcomes against baselines continuously. When she wants to try something - a different pricing strategy, a change in lead routing - she simulates it through the model and sees the likely impact.
The loop runs continuously. Management becomes all about triage and simulation.
There’s starting to be a lot of activity in this direction, building some of the core pieces.
Orchestration companies are building agent governance and workflow layers - mostly hand-crafted agent hierarchies.
Observability companies watch what agents do but don’t predict the consequences of doing something different.
RL environment companies are trying to create structured training data from real operations.
Enterprise platforms like Palantir serve the Fortune 500 with bespoke implementations.
But there’s something holding these back, making them seem like little features. The key distinction is this: a world model predicts what will happen if you intervene. Which means all of these - orchestration, agent management, data integration, RL environments, continuous evaluations - are pieces of the same thing. They’re features of the enterprise world model. None of them, on their own, can answer the question: “if I do X, what happens to the business?” And that’s what we’ll need.
There’s a constant question that echoes Solow, about where the impact of AI is on productivity or the broader economy. To not fall prey to that paradox we will need to do to the rest of the world what we’ve done to code - create an environment where we can see and test the impact of every decision and simulate the effects of an action. To do this, we’ll have to convert messy, unstructured business operations into an environment with defined action spaces and evaluation criteria, and capture outcome data. And we’ll have to do this across thousands of businesses. That’s why model providers like OpenAI are paying to build this manually through programs like their Thrive Capital partnership, embedding engineers into portfolio companies one at a time.
An operating partner who walks into a company and sees how it works - that’s what’s next to be built in software. If we want to build a one-person unicorn, that’s what’s needed. To automate the economy, we have to give AI what every human manager already has: a world model in their head.
2026-03-14 19:31:41
I remember the first one-person billion-dollar company. It wasn’t mine, I wasn’t working yet and was only an observer, and a distant one at that. But it felt exhilarating. A breath of fresh possibility, like any of us could do anything. A milestone in what humanity is capable of.
It lasted for a month.
The founder did very well for himself obviously, but within a matter of weeks someone else beat the record. The one-person unicorn became the four-minute mile of company building, another rubicon crossed. Once the world knew it was possible it became inevitable. Because a world where one person can create a unicorn is also a world where another person can create one too. Maybe a day later, maybe a week, but pretty soon, and inevitably. And we saw the inevitable happen, four more in the next few weeks, till it became somewhat normal.
Entrepreneurship had already become a game in the 2010s, as the SaaS boom made building big companies in short periods of time possible. The result was an incredible boom in companies, many of them competitive with each other, with extreme dispersion in outcomes.
And now, when building became even easier, the equivalent of telling someone else to build things, it predictably got crazier. Not quite as easy as “make me a unicorn” but closer to it than anything we’d had. Can you imagine if it was that easy? Everyone and their grandma would do it.
As the amount of effort needed to show a minimum of traction kept shrinking, something had to shift to move us to the new equilibrium. If all people needed to do was be faster than others at asking a question, that’s a speed race to the bottom.
Once upon a time it was actually executing that was the bottleneck; soon it was project-managing the thing you were executing. Then it was choosing the directions and making editorial choices about what to create or run experiments on. By this layer of abstraction it was less about what could be made and more about what needed to be made. Because everyone could get AI to make almost anything, it felt like, but no one knew for sure what everyone else wanted.
My career started in earnest a bit after this. We all had eight monitors with running information streams from all over the company, and outside it. I was called an analyst, because even though analyses had become cheap, accuracy hadn’t. Someone had to monitor the drones.
This was fine, actually. It’s not really what I thought I’d be doing, but it required me to think super fast and make a lot of decisions and keep on top of them, and try to automate some parts of those too. I liked it, sometimes it was even fun, though a lot of it was quite rote. I worked on the shipping industry side, accidentally if I’m being honest, that’s what was allocated to me, but it turned out this was a pretty good window into the world. I had to keep on top of things, from whether a tanker broke an engine part to crude oil prices to atmospheric conditions in some strait.
Quite a lot of it was also dealing with competitors. I mean the normal stuff the AI could do, but the fun part was to confuse their AIs. Ships seemingly going the wrong way, or water displacement made to look fake, all sorts of tricks, some legal and some not. We all had the same machines but adversarial games are more fun, you know?
The rest of it, to look at the machines themselves and react quickly when necessary, that was okay. A hard job, much harder to pay continuous attention than to actively do things, at least for me. A lot of it was also reactive, and not just because of the adversarial problems. Like, even though the ability to analyse and communicate anything became instantaneous, it hadn’t necessarily helped in making the right decisions all that often yet. What it did mean is that if you were making a mistake, you got to make it faster now. There was no escape from Hayek. Every part of every company became more efficient in doing things even as knowing if you were being efficient in the right direction remained a mystery.
It felt like playing a videogame, highly stressful. You were always on call, always trying to figure out what broke and fix it, or find ways to game around what someone else might do. It was hard.
One thing that eventually helped me, a year or two in, was that corporate secrets stopped existing. Or at least they didn’t survive for long. Anything anyone did could be reverse engineered pretty quickly. As soon as things turned more adversarial this was probably inevitable. Who knows, maybe it happened at the same rate as AI in the 2020s, or software in the late 2010s, or the entirety of Chinese manufacturing knowledge before that, but it didn’t feel like it. Living through it felt like sailing the high seas, pirates and privateers on all sides.
Despite the respite brought by the new world, I ended up quitting that job about a year after this. Just being alert for hours on end every day was hard, and no amount of lying around or drinking cured it. It had also meant that long, undivided time to think and come up with ideas on your own was a dying art, and I had dreams of contributing to the world this way.
I understand why it was hard though. It’s hard to spend a decade coming up with a new idea for a car when you could just steal your competitor’s ideas that already worked. Why take the risk? The world became much less divergent. Sure, people did try to do unique things, but like the Hollywood of yore, everyone just copied from everyone else while occasionally a great piece of cinema broke out from nowhere.
I did feel, though, that the size of individual companies was shrinking on average while the top exploded. When I was looking for jobs I kept seeing this. The one I ended up working for was tiny, maybe about 30 people. It was either this or just go the independent contractor route. The Coasean bargain that made some companies larger broke apart; there ended up being a much larger number of individual contractors and smaller companies than were feasible before. Even I thought about joining their ranks, which would’ve been a bit more work.
Identifying and capturing those people became the most important piece of leverage. Some of the largest companies ended up being conglomerates made up of these people, who individually wanted to go and help them figure out answers to problems they could not answer otherwise. There were other options too, I suppose, like the original AI lab model, which by now had disappeared; those labs had many fewer employees than the old companies of their size, but did run a large network of arrangements that would make the economically dependent population number in the hundreds of thousands.
As the AIs got better, firms soon started calling for a new type of role, an “analyst”. They would get brought in to do a particular task once, whatever it took. I started out doing this for worldwide logistics networks. Deciding when AIs had started going in loops against the others they negotiated with for prices and goods or routing decisions. Which factories needed to be built, and which types of models to analyse for those. Which pieces of data we were collecting were actually trustworthy, and when the world had changed enough that our very model had to shift.
We all had something installed that could read and analyse everything that was done on the machine, to help us do the job better. But pretty soon, at the end of it, the AI had just learnt from what I’d done, every part of it, and was able to do it from then on.
Every job was the last job. What was done once got done for all time. The progress bar would go from 0 to 100 as you did it, and once done it remained done.
I remember getting paid for one of these jobs, about shipping logistics; it took a week and I made as much as I’d made in a year before. The value was high, and I was too stupid to think about “terminal value”.
These gigs themselves also got better as the AIs got better. It was much less stressful than frenetically monitoring everything yourself like before. Mostly supervising other AIs, sometimes other people, sometimes other people supervising AIs. I hear of the days when people used to have the same job for 40 years and it sounds like a fairy tale, because people today have jobs for 4 months. If they’re lucky, with that they get to own a piece of the machines.
Some friends who were smarter started to ask, what even is a “job”? And I too worried, thinking of all my projects: would this disincentivise deep thinking? In the end it did, a bit, but the market corrected as time went on, because capital had to find ways to protect ideas, especially since many of them could now be reverse engineered. A lot more secrecy, which for a short time could be monetised, because soon after, you knew, it would be known. I wasn’t at the cutting edge of anything enough that I could ask for a billion dollars and quiet time, but some were, and they prospered. Even the whiff of a good idea was enough.
This was the hardest part, because until this point all jobs people did throughout their lives relied on the jobs themselves being somewhat predictable day to day. Nobody except maybe some CEOs during a particularly tumultuous phase had to do completely different things hour after hour, day after day. What it meant to get paid for a few minutes of your time, a form of knowledge transfer, instead of getting paid for your actual labour, was enormously complicated, and societally destabilising.
Nobody quite figured it out, but much of it ended up similar to contract work, where the work was timebound and sporadic and you got paid a premium for it. These companies aren’t really companies; they mainly “collect” many of us to save us the trouble of searching. A thin wrapper between my agents and those that want my efforts.
The goal of doing all this, of your career, what constituted actual success, was to own capital. Most of my work has been in turning my personal labour into capital. And it was still good to own capital. It always is. You could deploy it and see people line up to take it and build things that would change the world in months. After all, building physical things remained a problem. Logistics remained a problem.
Anyway, I don’t know if this is worth it anymore, to be honest. What’s the point of having cash that you could give to an entrepreneur to build something, when others with capital could do the same, and make damn sure that neither of you would make much money without getting lucky? If the true skill of my labour is not differentiated enough, then what’s the point of just pouring in more? Won’t everything just become highly competitive but undifferentiated, like in the commodity markets?
Those markets, despite the product being literal commodities and the process being the only differentiator, mostly survived because different places have different regulatory structures and codified preferences. Which in turn determined who ends up being the marginal producer whose output can then be refined or transported or used. And so on and on.
The only choice was the robots, which were plentiful, everywhere. Robots gave leverage: a person could teach one how to do certain pieces of work and then supervise it thereafter. This held just as true for those who manufactured the robots as for those who used them. The idiot index might have been a useful target to aim at after all. And with robots it’s no longer the case that you need hundreds of thousands of people in these industries. Energy and land remained bottlenecks, because you could always use more and they could always be cheaper, but the world didn’t oblige the way it had with everything else.
Don’t get me wrong, there was innovation to speed these up, but ultimately the decisions of what to invest in, what to create and what to make faster all turned out to be market problems as opposed to analysis problems. And market problems are wicked; you cannot solve them just by running fast. It requires actually traversing the demandscape and banging your ideas against the real world. There are no shortcuts.
Even for those who had abundant capital, figuring out what portfolio of bets makes the most sense remained difficult because the response required information gathered from all over the world. And compared to how long a ship takes to build and sail, the decision on what type of ship to build didn’t take that long at all, even though it mattered the most. Those who claimed to have some insight in how to help folks figure this out lived quite well.
When I look around these days though, food is cheap, goods are cheap, learning is cheap, health is cheap, and if you want something more, the amount of labour you need to cover those basics is minuscule. It all seems pretty nice. The biggest surprise from the heady days when the future was utopia might be that the pace of scientific discovery changed, but not by much. I’m no scientist so I couldn’t tell you why this was the case, but it’s true. We did get better food and medicines, but string theory remains a theory. There are flying cars, but nobody’s riding rockets to the moon. I think maybe the discoveries just weren’t bottlenecked by our inability to do analyses in the first place? We could run a ton more tests now, but there are only so many problems we could brute-force our way through. And once the low-hanging fruit got picked over in the early 2030s, we sort of all got stuck again. Like how fundamental physics was in the late 20th century, I’ve heard: stuck needing new ways of conceptualising the world.
Attention is still capped because there are only so many humans. There are only so many hours in the day. One person’s gain is another’s loss. If you’re reading an essay it means you’re not reading another essay. Zero sum. The most drastic change was what happened when the only signalling that was costly was individual presence, since everything else could be faked.
For most of us who are at least somewhat young, in the last few years the world took a turn and became a lot more analog. Many of us don’t remember a life unmediated by the digital realm, but that was changing. When nothing you see or hear can be easily trusted, what remains are small enclaves functioning like private clubs. If you couldn’t be trusted you couldn’t enter. But even there, the rules had to become draconian, because our daemons, our digital twins, our agents, could penetrate them if we had permission. Hence, physical presence.
This physical network also meant agglomeration, which is why I moved cities. Not for commerce, or work, but for my social life. I mean, it was either that or live a nomadic existence, traveling the world and seeing others wherever they are.
That’s mostly what I do now, while doing the occasional decision support job in areas I had learnt quite a bit about over the years. I have to keep spending some time every day making sure I keep up with the latest, but it’s fine. The jobs are sporadic, but it pays a good living even though you always feel like the other shoe’s just about to drop. The remaining time I have, which is most of it, I spend making entertainment for others in ways that are, for now, hard to imitate. There are physical plays people put on now that I go to sometimes, participate in sometimes. It feels good. This is life.
2026-03-03 04:26:12
Last week was a bit crazy. In many ways, but specifically with AI. For those who were blissfully unaware, the Department of War picked a fight with Anthropic over the ways they were allowed to use the model. The fight, as is often the case with this administration, got nasty. Anthropic said no, we won’t budge; DoW got angry and threatened to cut them off and declare them a supply chain risk. A few hours later, OpenAI said they had managed to get another deal, apparently a better one, and one such that any other AI lab can avail itself of the same terms.
So naturally everyone is angry. Anthropic is angry because they were declared an SCR. DoW is angry because someone tried to force their hand. OpenAI is angry because everyone seems to call them opportunistic ghouls, more or less. The media, both independent and institutional, loves it because they get to play their favourite game of good guy-bad guy.
I really didn’t want to write about this. But it is important, contractual disputes are actually interesting, and sometimes that deserves an explanation.
The facts are roughly as follows. Anthropic had an agreement via Palantir to work with the DoW. They’ve been doing it since mid-2024. They made a different, supposedly unsafe version of Claude to do this. Somehow over the last week, they got into a tiff with the DoW, supposedly over some red lines they had (no mass surveillance and no autonomous weapons), or rather over who gets to say what those lines are and when they’re crossed. OpenAI signed a contract which had those same red lines and an enforcement mechanism.
Now, the claims are roughly as follows, noting that nobody knows if they’re true. Anthropic asked questions about the Maduro raid where Claude was used, and the DoW got upset. DoW asked a hypothetical about how to do autonomous missile defense using Claude, and got a non-answer: that they’d need to talk to the CEO and they’d ‘work it out’. Anthropic asked for their red lines to be enforced by letting them act as the approving party (you’d ask them if you had a question). DoW wanted language referring to “all lawful use”, basically saying if what they’re doing is legal you can’t tell them what to do, especially during operations, i.e., you can’t tell them to stop doing something in the middle of an op. OpenAI said sure, we agree to all lawful use, but note these specific laws and regulations, and we will control the deployment of our models, using our people, since we know what they can and cannot do, and help you guys out.
Every point above is a claim, and we have no real proof. People are desperately attempting hermeneutics on the OpenAI position and blog posts, but honestly it feels kind of silly since we simply don’t have the data to conclude they did a bad thing. Or, particularly silly, that they defected in a prisoner’s dilemma. What we do have are concerns. Concerns like:
Didn’t OpenAI just accede to “all lawful use” and therefore allow mass surveillance on Americans?
How can you let a private company tell the DoW, you should ask us if you’re violating any of our red lines during an operation?
Why did OpenAI sign an agreement so fast anyway, surely they just said yes when Anthropic said no?
What do those red lines even mean?
Also, Anthropic and OpenAI seemed to have the same ones, how can that be?
Can’t the government or the DoW just make up its own laws as it does anyway? Who can stop them?
How can you guarantee this means the DoW won’t cross any red lines?
What do technical safeguards mean, how are they enforceable?
Etc…
Many valid questions, but I refer you to the OpenAI blog, Dario’s written statement, and Sam’s AMA for various points of view on them. They do cycle between thinking of the government as Leviathan, an entity you cannot negotiate with, only appease, and thinking of the government as Loki, a trickster you need to subdue or overpower.
My interest though is broader than who said what to whom, or who’s virtuous and who’s not, as I think yours should be. It’s not to relitigate the facts, but to think about the following:
What are the right safeguards to put in place when a piece of technology is deployed as a tool by the DoW?
How do we enforce any of it?
Let’s think about this for a moment. Imagine you are an AI lab dealing with the government. They want to buy your AI, and you want to sell it. How would you safeguard it?
You know that plenty of things are legal, but not “good”. So what’s the choice here? You could of course just try not to deal with them at all. But once you decide to do it, you need contractual provisions you think they would adhere to, execution guardrails you can have some control over, or both.
You also know that plenty of things are legal, but impossible. You cannot build a stairway to the moon regardless of the fact that it’s legal. “I want GPT to build my defense strategy in Iran” would be such a thing: you can ask it, but you won’t get good answers. Both AI labs want to be able to say that.
So, you have to write some provisions into the agreements. Of course, the DoW can buy anything it likes, and you can add constraints on the stuff you’re selling, but they have to be clear. This is true of all contracts, but with defense it’s even more important. For the same reason that it’s important in a hospital. To take a silly example, most models will rightfully have safeguards against violence or nudity, but imagine we also need them to help treat burn victims. It can’t be a blanket no; you need to figure out some way to separate what’s allowed from what’s not, ideally before it gets deployed, so that you’re not doing this live when someone’s in the OR.
Which is to say that whatever they’re using, the lines have to be clear. Either some things are allowed, or they’re not. As little ambiguity as possible. The DoW would also want the power to determine courses of action, and can’t leave operational control in the hands of another. This is the now-infamous scenario that someone apparently painted in discussions with Dario: if a missile was heading towards the US, would they be OK using Claude to defend against it?
Apparently Dario said they’d work it out, and later also said they could carve out a missile-defense exception from the contract, but you hopefully see the problem. You could easily come up with a dozen other scenarios, so do you just keep coming up with them and carving them out of the contract because ‘that seems fine’?
The other “red line”, about mass surveillance, is similar. What does that mean? You ask a dozen people, as Zvi did, and you get a dozen different responses. Going from a vague feeling to something specific is really difficult.
Now the DoW’s position seems to be: let’s just do it according to the law. The law is clear enough, or at least clearer than a goal that we might share. Laws are an operationalisation of principles we hold dear.
But what if the law has loopholes? What if we disagree with the law? You still have to find some way to make that clear, and honestly you either draft a contract airtight enough to solve for those, or you have to believe that your counterparty will obey the law. You can draft “permissions-based” (enumerated) vs “restrictions-based” (negative list) provisions, if you’re clear enough. And it makes sense to have explicit contractual red lines, even if unenforceable mid-operation, since they create legal exposure and political cost for the government if violated. But if the lines aren’t clear, then no contract can save you, and saying “I will decide” will not necessarily break in your favour.
Terms like “reasonably requested” or “as appropriate” or “reasonable doubt” are standard legal terminology precisely because you can’t nail down every eventuality in every contract; they capture some combination of norms and prior history to gesture at the types of things that will be OK and the types that won’t.
Because the only thing that matters is whether you have any visibility into their actions in the first place. The Anthropic deployment was of a separate version of Claude, under a different ToS, deployed by someone else. Which means they probably had limited visibility into what it was being used for. Which also means the only way to enforce any standards is to codify things quite a bit upfront - it’s like doing an on-premise installation vs SaaS.
OpenAI’s contract on the other hand seems to have been hand-in-hand with their own teams of FDEs and something they call a safety-stack (guessing cloud deployment of their own models and some checks therein, I don’t know). Which means they have much more operational visibility into the model usage, which also means they have the leeway to negotiate if the usage of it started to violate any of their ToS.
I have no real opinion here on which is better. Contracts are not inherently all-powerful, they’re only powerful insofar as they can have oversight. I do have an opinion that neither is inherently superior to the other, even if what we know about them is accurate, which might not be the case. One has more contractual protections and limited operational visibility, the other has lower contractual protections and higher operational visibility. The first one relies more on trust with the counterparty, the second one relies more on execution control. Both rely on the existing legal system.
This entire saga seems to me like it was a personality clash rather than a contractual dispute. A version might well be: someone asked a question about the Maduro raid. DoW got upset they were being asked. They posed a hypothetical. Anthropic’s response was bad, confirming DoW’s prior assumption that they were trying to control the deployment. Which is why, even though they were so close to being effectively done with the agreement, the Secretary of War decided to blow things up.
To reiterate, it’s really bad to call Anthropic a Supply Chain Risk. This is just not true. It is eroding yet another norm about what capricious governments could do at a time we should not be eroding it, we should be strengthening it. It is perfectly fine for Anthropic to have rules about how their AI ought to be used. It is perfectly reasonable for DoW to say nah that’s not going to cut it, I don’t want to ask for permission.
But what is true is that this should not be much of a surprise, considering the constant rhetoric over the past few years has been that AI is a power like no other. It’s like nukes, but times a thousand. We need regulation. And when an industry repeatedly calls out for oversight, asking for someone to make the rules on how it should be used, you cannot be surprised when the Defense Department takes that seriously. You cannot be surprised when they make up their own interpretations of what ought to be done, because you were insufficiently prescriptive. They will listen to your articulation of any red lines and wonder: what do you mean you want to tell me how to use the mega-nuke-crazy-power that you yourself are saying you don’t know how to control?
The US has nationalised or regulated whole industries for simpler reasons. Telephone lines, railroads, the attempted seizure of the steel mills, these aren’t small things. And that’s not to mention the times the government has threatened to do this, from FDR to JFK.
So if you think AI is important we’re going to see more of this. You simply cannot call your technology a major national security risk in dire need of regulation and then not think the DoD would want unfettered access to it. They will not allow you, rightfully so in a democracy, to be the arbiters of what is right and wrong. This isn’t the same as you or me buying an iOS app and accepting the T&Cs.
But it’s also true that a corporation acting as a bulwark for democracy against the government is fundamentally weird, even if true. Democracy is incredibly annoying but really, what other choice do we have! What we don’t have is a reckoning with the power that is now reality.
I am extremely uncomfortable with the fact that we can just purchase commercially available data on almost everyone. I am also somewhat uncomfortable that the future of war is going to be autonomous though there are days where having Claude or GPT decide where to bomb seems better than an average 22 year old. I’m uncomfortable that in the pursuit of absolute security we have effectively given up our privacy, and all that remains are small shreds that only sit with a couple of large technology giants. I’m uncomfortable that the few shreds of privacy that did exist can now be reverse engineered away using pretty normal AI tech.
I also am not sure there’s a way out where we would ever have digital guarantees of privacy. I think our children will think that a quaint old notion. “What do you mean, I can of course just ask my AI to analyse a bunch of information and figure out who ratmonster2024 is.” The work that only NSA used to be able to do a couple decades ago is probably within the grasp of the average startup, if they cared. Genies don’t tend to go back into bottles, and this one has powerful forces keeping it out.
The future will bring these questions to bear, much faster than anyone might expect. The current world survives because a lot of analysis is effort-bounded. If that’s gone, a lot of things we previously assumed secure will also go away. This is coming, whether you want it or not. The best part of last week is that the issue became higher profile again. But bringing attention to the issue is only the first part. Unless we know what we want to do with the attention, tribal politics is going to overwhelm it all.
I had a conversation with an august panel last week. It was really, really good, and you should check it out.
2026-02-01 22:00:16
A series of observations about Mexico from my travel over the holidays, now that I’ve had time to digest. I went to Mexico City, to touch the Aztec, Zapotec and Mayan civilisations, at least cursorily, which made me inordinately happy. It’s the first time I’ve gone, but I got a few days in each place to actually just be, which is the only way to travel in my opinion. I’d read a bunch of books before and during my trip, but what I came away with most strongly was the impression of a country that’s psychically much larger than it is physically, with the weight of a few layers of history, and with a peculiar mix of life.
Mexico is like if India was richer, things were cleaner, while being much (much!) more unsafe. This showed up for me almost everywhere I went, often in the background, often not. For instance, this means that while in India you will see a lot more spaces for the rich or large luxury malls, in Mexico it feels like those are hidden away inside secure compounds. In fact the only place I saw this easily accessible and displayed was in Cancun, which is as if the Mexicans built a tourist place just for the Americans and made it look like Dubai.
I was shocked that Mexico City still has a murder rate a third that of NYC in the 1990s. Turns out this ignoble list is also dominated by Mexico.
I continue to be just constantly amazed at how safe India is. It has no right to be so, it’s poor, ill organised and the justice system moves like molasses. I first had this thought in Nigeria, and have repeated this observation in too many countries to name. Central and South America look likely to only exacerbate this question.
This is particularly germane in Mexico because Mexico City reminds me a lot of Delhi, albeit with somewhat worse roads, less people, and far cleaner sidewalks. And entire squadrons of police cars with visible guns every block or two in all the tourist friendly areas.
An interesting aspect I had never considered is that Mexico used to be bigger than the US, back when it owned most of the US’ current southwest. The country still seems to remember this in its bones. It’s 130 million people but feels much larger. The weight of most of Mesoamerican history centers it. They have a Place in History, writ in capital letters in the national psyche.
The level, variety, and affordability of street food remains one of Mexico’s major success stories. Plentiful, tasty and cheap. I largely prefer it to restaurant food. Tlayudas ftw.
Going through the Zocalo in Mexico City is a full body immersive experience, and not one I care to repeat. On the other hand it is massive, disorganised in the best way, and sells anything and everything you can imagine. We got lost inside it and had to trek a dozen blocks in a randomly chosen direction to get out. We realised this after calling an Uber and waiting 20 mins before realising it’s never going to make it.
This is also a plus, because just like the lack-of-zoning-success-stories of almost every country except the US, it makes Mexico City undeniably attractive to every American, who of course love mixed-use easily walkable cities as long as they don’t have to live in them.
This exact reason also makes Cancun the worst place in Mexico I visited, because it’s built for tourism, has a hotel zone, and fails my “Civilisation Test” which is the number of cafes in walking distance. In case you were curious, the winner was Oaxaca. Excellent coffee, and even better hot chocolate.
Mexico City truly is a cultural capital. Incredible museums, great art, great food. The Museum of Anthropology in CDMX is the best museum I’ve seen (‘n’ is very high here).
The main Cathedral is absolutely gorgeous. And being built on the remnants of the lake you can see the effects of the soil moving about as the cathedral is a bit slanted. The styles are more eclectic than you’d find in a European city, and more ornate than I personally like, but worth seeing.
Walking among the Aztec ruins next to the Cathedral is a quasi-religious experience because they’re so well preserved. The feathered serpent, Quetzalcoatl, is everywhere, encircling the plazas, out of the walls, surrounded in parts with forms of corn and shells.
As usual, I found it interesting that until recently tearing down an ancient monument and building another gorgeous monument in its place was normal and not at all noteworthy. Something we can learn from.
The Aztecs took their iconography and religion seemingly from Teotihuacan, which is an hour away. It’s an older civilisation, 600 years before the Aztecs, whose traces they clearly discovered and were influenced by but knew little about. They didn’t know who its people were, what their society was like, what they called themselves, nothing. So they, rather whimsically, named it Teotihuacan, the place where gods came from, adopted many of their gods (or so it seemed to me), for instance naming the feathered serpent Quetzalcoatl, and generally lived a grand life of military conquest for a couple of centuries until Cortez arrived.
I can understand why. Teotihuacan is extraordinary, and the Pyramid of Quetzalcoatl in particular is magnificent. Considering they didn’t have metal or pack animals, this is all the more impressive. The ability of humans to accomplish incredible things at scale never stops amazing me.
I have not been able to make up my mind about the import of human sacrifice and how much it’s true/ false/ exaggerated compared to other historic cultures.
Driving in Mexico City is very hard. Half the roads are tiny and don’t even look like roads. The green signs that show the roads and destinations often had three names none of which matched what Google maps said, so it was entirely visual navigation. I am now ready to drive in India.
Mexico City also has cable cars as a core mode of public transport, which I hadn’t seen before, and looks wonderful especially when stuck in a traffic jam. I wish the US had these, or indeed any public transport. I tried to take one but it was night and gpt recommended the amount of changes I’d need to make to take a ride was not safe and I shouldn’t do it. So I had churros and cafe de olla instead.
As my 8yo observed, the infrastructure got better as we went from Mexico City to Oaxaca then to Cancun. Curious.
Oaxaca is a jewel of a place. Fits in your palm, highly walkable. High civilisation score. Great food. Great cathedral, though the Churrigueresque was not the best of its type; it didn’t come together cohesively.
The street food is plentiful and good. The speciality is mole, a particular type of sauce with mixed spices, and chapulines, fried grasshoppers. Apparently delicious when mixed into butter and eaten with bread.
Oaxaca also had the highest density, originality and quality of art I’ve seen in a city since
There’s plenty of prehispanic food and drink about. Tejate was meh to me, though a latte tejate I had at a market was extraordinary. Generally I remain a fan of modernity, we’ve perfected much of what history revered (and made them better).
Monte Alban, an hour from Oaxaca, is worth visiting. Zapotec-built, on top of a hill. Gorgeous views all around. The guide told us that when it was built, and during its heyday, the area used to get 9 months of rain a year, so the water would flow down the sides of the hill through channels that were cut, supplying water from the priests at the top down to the commoners. But the water dried up during a long drought lasting a couple of decades, people lost faith in the priests’ ability to bring rain by praying to Tlaloc, and folks left. So it goes.
The burial rituals were fascinating, they would put the body in a small enclosed space for 4 years, shut tight so no smells would escape, and then would remove the bones and put them in an urn. If more people died they had different spaces like this outside the house.
The various pedestals and spaces had holes below for priests to show “magic”, disappearing and reappearing, as the guide told us. I am personally suspicious of the “people in the olden days were easily fooled” argument, but am in favour of the “everyone likes and believes in rituals” argument.
The idea of worship starting with some seed of truth and then becoming a self fulfilling prophecy as those responsible for the worship taking matters into their own hands will never stop being funny.
Cancun was the least interesting part of the visit. It is also, at least the hotel area, not at all pedestrian friendly. It’s big tourist resorts or nothing.
Chichen Itza, a couple of hours from Cancun, was remarkable. Their architecture shows influence from Teotihuacan and the Toltecs, and there clearly seemed to be trade and information routes between the lands. The Mayan civilisation, at least per my reading, stood for 3600 years, which is an absurdly long length of time.
The cenotes are magnificent. Cenote Xkeken, was a particular favourite, it’s mostly underground with only a shaft of light coming down.
The fact that Mayans ruled for so long in such a dry place with the main water source underground feels quite bizarre. Though once you rationalise by the number of inhabitants maybe it’s fine. Chichen Itza had around 40k, 5x less than Teotihuacan, itself less than Tenochtitlan, and none of them had decent water supply. I do not understand living life in hard mode for that long.
One reason though for the longevity of these civilisations might be survivorship bias, because by the time a lot of monuments got built without mechanised power, a couple of centuries had already passed. There’s a funny comparison to be made with California HSR here, where we’ve horseshoe-theoried our way to construction, but I leave that to someone else.
The beaches near Cancun are very good, especially Cozumel, the island where Hernan Cortez first landed. Sting rays and nurse sharks played in the shallows next to our feet at El Cielo. But I’ll be honest I still prefer the beaches of Southeast Asia. Thailand cannot be beaten.
The number of civilisations that lived roughly side by side at different points in Central America is really impressive. I got GPT to make me multiple maps and websites to help understand this better.
This trip without LLMs would’ve been about 30% as good. Everything from planning to asking about cafes and restaurants to dealing with zocalos to hotels and snacks and history and geography and pretty much anything we wanted to know or learn was made better by GPT, and sometimes Gemini.
Again the sheer number of extremely heavily armed police present in nearly all parts, including highways, was quite striking. They stopped cars at night, frisked folks, and generally were a loud and constant presence. Is this signaling or actual deterrent? Unclear, but everyone states the importance of being sensible and safe.
A substantial proportion of tourists to Mexico City and Oaxaca were Mexican, I think. As a consequence it’s not English language friendly, though again with Google translate and ChatGPT it’s not hard to travel.
I was told by the tourist guides multiple times to not call it Gulf Of America as a form of protest. Everything is politics.
Overall I really liked it, though I understand better why people who don’t have easy access to Asia, like Americans, like it so much more than I did. For food and markets and the general feeling of being in a “free” city with limited top-down strictures on life, Mexico is the only real choice in North America that doesn’t require braving a really long flight. But for those things I feel you simply cannot beat India or Japan, which are also significantly safer and have great food and history. Similarly, for beaches I’m still a fan of Thailand, though by a thin margin. Beaches seem to be the primary motivation for most Americans I know who have gone to Mexico, which seems quite shortsighted to me, because when you combine them with its long history and culture, Mexico is pretty great.
2026-01-21 00:16:09
Written with Alex, who writes here, and you should read him! The repo here.
This has become part of a series of essays, evaluating the new “homo agenticus sapiens” that is AI Agents. Part I was seeing like an agent. Part II is why the agentic economy needs money. And this is Part III.
Whitney Wolfe Herd, Bumble’s founder, recently described a future where your AI chats with potential matches’ AIs to find compatibility. Say what you will about AI being involved in your love life, but this is one domain where AI agents can potentially have large returns: the dating/marriage “market” is the epitome of the type of high-dimensional matching problem that Herbert Simon identified as impossible for people to optimize. Rather than optimising, Simon argued people engage in “satisficing”, i.e., settling for good enough.
Why would AI agents be useful here? Let’s start with how most markets work. Hayek’s big insight–outlined in what he called the economic problem of society–was that prices do an incredible amount of work. They compress a ton of information such as preferences, costs, scarcity, expectations into a single number that acts as a sufficient statistic for value. When you’re buying oranges, the seller doesn’t care what you’ll do with them. The price coordinates the transaction and that’s enough.
But prices work best when the transactions involve commodities. Matching markets are conceptually different. You can’t just choose your spouse, your employer, or your college: you also have to be chosen by them. This is the domain that Al Roth, the 2012 Nobel winner for “the theory of stable allocations and the practice of market design,” spent most of his career studying. Roth showed that matching markets require careful institutional design; this design includes algorithms, timing, and the right rules to get the market to “clear.” His deferred-acceptance mechanisms now allocate medical residents to hospitals, students to schools, and kidneys to patients.
But the efficiency of matching markets hinges on the ability to elicit a person’s preferences, i.e., on people being able to express their rank orderings over potential options. But what if people’s preferences don’t fit in dropdown menus or are difficult to articulate on a standardised questionnaire? This is what Peng Shi studied in his excellent paper “Optimal Matchmaking Strategy in Two-Sided Markets.” He looks at online platforms that match customers to providers using a variety of matchmaking strategies, from searching one side of the market to centralised matching that allows for back-and-forth communication.
Shi found that centralised matching works beautifully when preferences are “easy to describe,” i.e., straightforward to elicit using standard questionnaires, but breaks down when they’re contextual, idiosyncratic, or otherwise difficult to express through standard techniques. This is why many platforms still make you search. You want a contractor who shows up on time and knows your budget–this is easy–but you also want someone who understands your tastes in postmodern living room design. Good luck expressing that on a dropdown web form.
Here is where Large Language Models come in. They are fantastic at turning any unstructured piece of information into better structured matching. They’re also eminently scalable, enabling Coasean bargaining. But scaling things brings with it more coordination problems, too many agents negotiating with too many other agents is noisy. So what type of an institutional setup would make most sense to install here, to make this work well?
That’s what we sought to test with our experiments. The question being, could we figure out how and whether LLMs can help in matching markets where preferences are “hard to describe”? Can LLMs actually elicit the dispersed, hard-to-articulate preferences better than standardised methods? And if they can, what happens when LLM-based agents are available to everyone in the market?
Now, there’s some recent work on the topic that suggests guarded optimism that this is possible. Very new work by Ben Manning, Gili Rusak, and John Horton shows that, when parsed through LLMs, short natural-language “taste descriptions” can be superior to standard questionnaires for eliciting preferences when the option set is large. They run an experiment where people write a few paragraphs about what they want in a job and then rank between 10 and 100 options (depending on the condition). Consistent with Simon’s conjecture, people’s ranking effort plateaus as the option set grows large; choice quality grows unstable as the consideration set increases. People get tired of ranking a ton of options and just start guessing. AI-parsed “taste descriptions” scale much better: once tastes are written down, the marginal cost of evaluating one more option is negligible for an AI agent. The advantages of AI-parsed matches are even higher in congested markets, where people are more likely to be pushed out.
But a theoretical paper by Annie Liang offers an important counterpoint in the case of a potentially complex two-sided matching market. She shows that when personality is sufficiently high-dimensional, meeting just two people in person beats searching over infinitely many AI representations. The noise in AI approximations compounds faster than the benefits of scale. This is a very cool result, and you should all read the paper in full–it’s that perfect type of economic theory that’s both conceptually rich and practically useful.
Ok, with that preamble…
We set up a simulated Hayekian marketplace with a whole bunch of digital shoppers and providers as AI agents.
Preference elicitation: Knowledge is dispersed in each digital shopper’s “head”: customers know what they want and providers know what they can offer. We want to know how eliciting the preference–either through the standard intake questionnaire or high-dimensional text parsed by an AI agent–can change the market structure for optimal results.
Mechanism interaction: When elicitation improves, can centralised matching beat search, and what are the conditions under which this happens?
Scale: We then check what happens when everyone uses AI agents
Institutional design: Finally, we figure out the right institutional mechanism to solve the resulting problems, and to maximise welfare
Preferences here are latent vectors in each agent’s head. Both the customer and provider agents have a true weight vector over some set of attributes (6 dimensions in this case). Elicitation therefore changes the platform’s inferred w, not the true w. A standard intake is a structured form and only exposes a few coarse priorities. The AI intake is free-text, back-and-forth chat, and can be parsed into the platform’s inferred weights either by a rule-based algorithm or by an AI agent (GPT parsing).
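To make this concrete, here’s a purely illustrative sketch of the setup. The helper names (`form_intake`, `ai_intake`) and the noise model are our assumptions for exposition, not the experiment’s exact parameters; the point is just the contrast between a coarse top-k form and a noisy full-dimensional estimate of the true weight vector.

```python
import random

random.seed(0)
DIM = 6  # provider attributes, matching the 6-dimensional weight vectors

def normalise(w):
    s = sum(w)
    return [x / s for x in w]

def random_weights():
    """A latent 'true' preference vector: non-negative weights summing to 1."""
    return normalise([random.random() for _ in range(DIM)])

def form_intake(w, k=2):
    """Standard intake: a structured form only exposes the top-k priorities."""
    cutoff = sorted(w)[-k]
    return normalise([x if x >= cutoff else 0.0 for x in w])

def ai_intake(w, noise=0.05):
    """AI intake: free text parsed into a noisy but full-dimensional estimate."""
    return normalise([max(x + random.gauss(0, noise), 0.0) for x in w])

def match_value(weights, provider):
    """Value of a provider to a customer: weighted sum of attributes."""
    return sum(a * b for a, b in zip(weights, provider))

w_true = random_weights()
provider = [random.random() for _ in range(DIM)]
err_form = abs(match_value(w_true, provider) - match_value(form_intake(w_true), provider))
err_ai = abs(match_value(w_true, provider) - match_value(ai_intake(w_true), provider))
```

Both intakes hand the platform an inferred w that sums to 1; the interesting question, which the experiments measure, is which one’s match values track the true w more closely.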
Figure 1 has an abridged illustration of the design and some results. There’s an appendix at the end of the essay in case you want to check out the details of the experimental design. But without further ado, here are some…
First, AI-assisted preference elicitation improves matches across the board.
Figure 1: Experimental design
Figure 2
Second, as shown in Figure 2, AI-based elicitation changes what type of market design works best, and the conditions under which centralised matching can beat search.
Specifically, “Search” and “centralised” are the two different matching protocols we tested. Search means customers iteratively message providers in some ranked order until the matches ‘stick’. Think about how you would find a plumber–message folks, talk to them, iteratively until one ‘fits’.
Centralised is where the platform computes the shortlist for you, and clears a match based on mutually acceptable terms.
Once dispersed knowledge can be elicited and compressed into usable signals, the platform can centrally clear the market rather than forcing users to search. When knowledge can’t be compressed, search dominates because it lets users do iterative, contextual refinement in the loop.
The core object is the ‘ROI boundary’. If the per action attention cost is high enough, centralisation dominates–it just requires fewer actions. If the cost is low, search can dominate because it can “handle” more actions. This is the very idea of Coasean bargaining helping remove the boundaries of firms.
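The ROI boundary can be sketched as a one-line comparison. The numbers below are illustrative assumptions, not the simulation’s: search finds a marginally better match but burns many more attention-costly actions, so there is a per-action cost at which the two protocols tie.

```python
def net_welfare(match_value, actions, cost_per_action):
    """Welfare net of the attention spent on actions (messages, reviews)."""
    return match_value - cost_per_action * actions

# Illustrative assumptions: search yields a slightly better match
# at the price of many more actions than centralised clearing.
SEARCH = {"value": 1.00, "actions": 12}
CENTRAL = {"value": 0.95, "actions": 2}

def roi_boundary():
    """Per-action cost above which centralised matching dominates search."""
    return (SEARCH["value"] - CENTRAL["value"]) / (SEARCH["actions"] - CENTRAL["actions"])

c_star = roi_boundary()  # with these numbers, 0.05 / 10 = 0.005
```

Above c_star, centralisation wins because it simply requires fewer actions; below it, the extra contextual refinement that search buys is worth the cheap attention.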
So where does the value of LLM-based elicitation actually come from? Is it from the back and forth conversation, or the ability to parse large text? As described above, we prompted all of the customers to write some free text about things they like and whatnot, and then used some rules-based parsers and some LLM-based parsers. There’s also the option for conversational elicitation via chat.
We thought the AI agents’ ability to ask follow-up questions would be the game-changer. Turns out though (see Figure 3), most of the value comes from the AI agent simply inferring more signal from messy text than a rule-based parser can. This is consistent with the work of Manning et al. that we discussed above. This may of course be specific to our prompts; perhaps one could obtain further gains by explicitly instructing the AI agents to engage in structured back-and-forth with the customers in contexts where this would be helpful, but we did not observe it here.
This highlights the utility of LLMs for extracting (potentially high dimensional) signals from unstructured data. Back in the day OKCupid used to make people fill out 90-100 questions to help match them with their potential partners. With LLMs, they might’ve been able to get away with writing a short essay and getting their Agentic Cupid to pull out the relevant information. Whitney is certainly on to something.
Figure 3
But what if people don’t really know what they want, does preference uncertainty matter? Whenever Rohit shops, he’s not sure of what he wants before he goes in the store. There’s a lot of noise in the process. Alex is a pure satisficer: the first item that meets a (very low) threshold gets put in the cart (usually virtual), and off to check out he goes.
We can test for that pretty easily here by introducing a bit of randomness into our shoppers’ heads. At least in our setting, injecting noise into preferences doesn’t matter much for the AI’s ROI. We can still do centralised matching and extract a lot of value from that mechanism, as long as the preference noise isn’t too cacophonous.
We had originally set up a pretty small marketplace. The centralized mechanism at this scale can be computed and cleared so we can run the experiment. But what happens when the scale explodes, both in the number of options and the number of customers potentially using AI agents? This is the problem matching platforms like Upwork are trying to solve: the option set is absolutely huge, but so is the potential customer base.
Every time a customer opens up a marketplace like Upwork, the number of choices just on the front page makes it hard to remember what they came for. Ideally AI-delegated agents can solve this problem: the user speaks or writes down what they want to do, the AI agent pings the platform, and the user is presented with the match. But what if every potential shopper had their own AI agent who wanted to message the providers on the platform? That’s a lot of agents doing individualised message sending to the provider inboxes!
So as you increase the number of customers with AI agents, the level of congestion rises significantly. Each customer agent sends a query to a provider agent’s inbox and it has to respond. Responding to all those agents takes a lot of compute. Here is what happens in our simulation (Figure 4): At full adoption, the providers’ inboxes flood with 5x the amount of requests, response rates collapse from 48% to 2%, and net welfare drops 88%.
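A minimal congestion sketch, with every number an assumption of ours rather than the simulation’s parameters: providers have a fixed reply budget, and agents message more providers per customer than humans do, so rising adoption multiplies inbox load.

```python
def response_rate(adoption, customers=500, providers=50,
                  human_fanout=2, agent_fanout=10, capacity=20):
    """Fraction of incoming queries a provider can answer, given a fixed
    reply budget (capacity). Agents send more messages per customer than
    humans, so rising adoption floods provider inboxes."""
    fanout = adoption * agent_fanout + (1 - adoption) * human_fanout
    queries_per_inbox = customers * fanout / providers
    return min(1.0, capacity / queries_per_inbox)
```

In this toy version, zero adoption means every query gets a reply; at full adoption inboxes carry 5x the load and the response rate collapses to 20%, the same qualitative cliff as in Figure 4.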
Figure 4
Without institutions in place to scaffold the marketplace, a tragedy of the commons emerges: If everyone has an AI agent, it’s almost like nobody does. The paradox of plenty is real, and AI agents create their own version of Jevons paradox.
What can fix this type of congestion? Prices!
As in a previous post–where we showed the importance of money in coordinating trade amongst AI agents–introducing a price mechanism recovers most of the lost welfare in matching. A vindication of Hayek’s deeper insight.
Specifically, we can introduce an exchange and money, so that the agents now have a pricing mechanism to signal their “strength of preference”. The idea is that complexity falls because not every provider and customer need to message each other. Prices capture a lot of high-dimensional information in a single statistic and streamline that information; as we saw in the barter_to_money simulation, complexity falls from O(n²) to O(n).
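The complexity claim can be sketched by counting interactions in each regime; the counts below are stylised, following the barter-versus-money logic rather than our exact simulation.

```python
def bilateral_messages(n):
    """Without prices, every pair of agents may need to negotiate: O(n^2)."""
    return n * (n - 1) // 2

def priced_messages(n):
    """With posted prices, each agent only consults the price board: O(n)."""
    return n

# How the two regimes scale as the market grows.
growth = [(n, bilateral_messages(n), priced_messages(n)) for n in (10, 100, 1000)]
```

At a thousand participants the bilateral regime needs roughly half a million negotiations while the priced regime needs a thousand price lookups, which is why congestion melts away once an exchange exists.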
Figure 5
Pricing works! As shown in Figure 5, most of the welfare gains are recovered and the congestion issues are resolved. LLMs may lower the cost of expressing dispersed knowledge, but they don’t remove the need for institutional design to manage externalities. At least in our experimental simulation, the price system remains essential to solve the issue of complexity and congestion.
If we think about an AI agent economy, we would want to know more about the mechanism that facilitates coordination. First, we have to ask, “If agents lower transaction costs, do markets just happen?”
In a previous post we looked at what would happen if there were a bunch of agents who had to interact with each other to trade, and it turned out that they don’t form markets spontaneously. In fact you have to do a fair amount of work before the agents are ready to interact.
Ok, if markets need scaffolding, what’s the minimal substrate that makes coordination scale? i.e., how will the agents coordinate amongst themselves? Will they be able to develop methods to do so themselves, e.g., through bilateral and multilateral negotiations, or will they need further help? It turns out that no matter how much you want to set things up just so, the agents will still need money and prices to trade efficiently. Even with lower transaction costs and larger levels of compute, the coincidence-of-wants problem still doesn’t disappear - Hayek remains vindicated.
In this current essay we explore whether LLM agents can make centralised matching more efficient. If so, we should expect marketplace consolidation in categories that were previously too heterogeneous for algorithmic matching, e.g., wedding vendors, specialised consulting, creative services. We showed that in "thin" markets AI agents facilitate better match quality through centralised mechanisms.
However, if everyone has an AI agent, we still need a pricing mechanism to solve the resulting congestion and complexity problems. Congestion is a serious threat at scale!
So what is the broader takeaway from this essay, and from the whole series? For us it's that AI agents work remarkably well when institutional design facilitates the interactions and transactions. Since direct instruction for every eventuality is impossible, the only way to make AI agents behave at scale is to design the right scaffolding to facilitate coordination and exchange. This involves the creation of markets, and yes, money! If we can learn to design the "institutions" within which the agents operate, then we can have them take on far more complex tasks for us. Autonomy, that's the true prize!
Warning: wonky.
We constructed a simulated marketplace where customers seek service providers (contractors) across task categories that vary in how difficult preferences are to articulate. Each customer is seeded with true preferences represented as a 6-dimensional vector of weights (summing to 1) over provider attributes. A match is formed when both sides’ true values clear a threshold.
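For concreteness, here's a minimal sketch of the customer-side valuation step in Python. The threshold, the search cost, and the uniform attribute scores are illustrative assumptions of ours; the real simulation also checks the provider's side before declaring a match.

```python
import random

random.seed(0)

# True customer preferences: 6 weights over provider attributes, summing to 1.
raw = [random.random() for _ in range(6)]
weights = [w / sum(raw) for w in raw]

# One candidate provider's attributes, each scored on [0, 1] (illustrative).
provider = [random.random() for _ in range(6)]

SEARCH_COST = 0.05   # illustrative per-search cost
THRESHOLD = 0.50     # illustrative match threshold

# Customer-side value: preference-weighted attribute score, net of search cost.
match_quality = sum(w * a for w, a in zip(weights, provider)) - SEARCH_COST
is_match = match_quality >= THRESHOLD
```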
“Easy” categories include things like TV mounting or furniture assembly; preferences in these categories can be mapped cleanly onto standard form fields such as price, availability, and distance. “Hard” categories, such as ability to repair a historic staircase or a complicated asbestos remediation with specific guidelines, involve preferences that are more difficult to elicit using standardised questionnaires. We then see whether the ROI threshold changes based on how well the models can “elicit the true preferences” of the underlying actors.
The experimental intervention targets the preference-inference pipeline: how customer preferences get translated into data the platform can act on. The experiment varies the intake method (standard structured forms versus free-text descriptions parsed by an LLM) crossed with the matching mechanism (decentralised search where customers browse and choose, versus centralised assignment where the platform matches algorithmically). Match quality is computed as the dot product of the customer's true preference weights and the matched provider's attributes, minus any search costs incurred. All of this is summarised in Figure A1 below.
Figure A1: Experimental Design
2025-12-19 22:03:27
Written with Alex Imas, subscribe to his blog here!
This has become part of a series of essays evaluating the new "homo agenticus sapiens" that is AI agents. Part I was seeing like an agent. This is Part II. And Part III is on what happens when we all have AI agents.
Sometimes I forget, but we live in a future transformed by information technology across pretty much every aspect of life. One thing has remained largely the same, though: we still live in a world where the vast majority of economic transactions are done by people. If you want to buy a car, the process is largely the same as it was 50 years ago. You go down to the dealership and negotiate the best price that you can. Sure, you may have some extra information from doing research on the web beforehand - it's certainly much easier to do comparison shopping with a supercomputer in your pocket - but the basic process of transacting with another human being has largely stayed the same.
One change that's likely to come, though, is that there will soon be 10x, 100x, maybe more AI agents working in the world than there are people. And as we have lots of AI agents working on our behalf, doing all forms of work, there is a thesis that many of the frictions and information asymmetries that people face in markets may disappear if economic transactions are delegated to aligned agents, leading to a so-called Coasean singularity.
We’re not there yet though. Today’s agents are simply not good enough yet to act sensibly or without strict instructions. Many of the features of human-mediated markets still seem to be reproduced in AI agentic interactions. But as online spaces adapt to the promise of AI technology, it seems natural to think of how agentic markets will be organized. In a future world where we do have billions of AI agents, how would they coordinate with each other? What kind of coordination mechanisms would be needed? What institutions are likely to emerge?
And one possibility is particularly intriguing: will coordination still require money? Not in the sense of US dollars, but a shared medium of exchange and a hub/clearing protocol.
“Why money” has occupied economists going back to Adam Smith, who framed cash as solving what has since been termed the coincidence of wants. To see what we mean, consider a pure barter economy. Let’s say Alex is an apple farmer and Rohit raises chickens. If Alex wants chickens and Rohit wants apples, then Alex can just walk over to Rohit’s house with a bushel of apples and get some chickens in return. Simple. But what if Alex wants chickens but Rohit wants an electric toothbrush - he has no need for apples right now. Then to get the chickens, Alex would need to find a person who is willing to trade an electric toothbrush for his apples, and then come back to Rohit for a trade.
This would still all be fine if there was just one other person to visit and trade with, but what happens in a large market, with many (many) people who potentially have both an electric toothbrush to trade and want Alex’s apples? In order to trade, Alex needs to happen to find a person that both 1) has what Alex wants and 2) wants what Alex has. As very nicely shown in a paper by Rafael Guthmann and Brian Albrecht, the need to satisfy this coincidence of wants through finding matches creates complexity that quickly blows up as the size of the market increases. If the market is even moderately large, this complexity makes even basic transactions essentially impossible.
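The blow-up is easy to see in a toy simulation (our own illustration, not the Guthmann and Albrecht model): assign each trader a random endowment and a random want over n goods, and count how often a random pair of traders is a "double coincidence".

```python
import random

random.seed(42)

def double_coincidence_rate(n_goods: int, trials: int = 50_000) -> float:
    """Estimate the chance that two random traders each hold exactly
    what the other wants, with goods assigned uniformly at random."""
    hits = 0
    for _ in range(trials):
        a_has, a_wants = random.randrange(n_goods), random.randrange(n_goods)
        b_has, b_wants = random.randrange(n_goods), random.randrange(n_goods)
        if a_has == b_wants and b_has == a_wants:
            hits += 1
    return hits / trials

# Analytically the rate is 1 / n_goods**2, so it collapses fast:
for n in (2, 10, 50):
    print(n, double_coincidence_rate(n))
```

With 50 goods, fewer than one pair in a thousand is a double coincidence, so Alex would have to search a long time before he can unload his apples.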
Ergo money. While the origin of money is a hot topic of debate (e.g., see David Graeber’s excellent book Debt: The First 5000 Years), the role of money in a competitive market is to solve the coincidence of wants. Money acts as a special type of good called the numeraire, where its only role is that it can be exchanged for other goods at pre-determined quantities. These quantities are reflected in the prices that each good is worth.
Going back to Alex and Rohit: one way to solve the coincidence of wants would be for Alex to sell his apples at a special place called a market and then to use the money to purchase Rohit's chickens. Rohit can then use that money to buy an electric toothbrush, or indeed any other thing his heart desires. Money eliminates the need for people to coordinate their transactions based on their current endowment (what they have) and preferences (what they want).
Okay, so money is necessary to coordinate transactions in an economy with people. This is largely because each individual can’t hope to have enough information on what everyone else has and wants to reliably engage in market transactions. Alex and Rohit are as yet, sadly, mortals.
But will this be the case for AI agents?
Agents do not have the same computational constraints as human beings. In theory, it may be possible to solve the search problem where the coincidence of wants becomes a non-issue. In that case, the agentic economy could eliminate the need for a key institution of the human economy. We decided to run an experiment to find out.
First, the repo here. We can have N agents, with N goods, and each starts with its own good and wants another. There’s multiple rounds, one action per agent per round. Agents decide their course of action via structured JSON, and success simply means you get what you want.
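A minimal sketch of that setup (the JSON field names here are our guess at a plausible schema, not necessarily the repo's actual one):

```python
import json
import random

random.seed(7)

N = 6  # number of agents, and of goods

# Each agent starts with its own good and wants a different one.
has = list(range(N))
wants = [(i + random.randrange(1, N)) % N for i in range(N)]  # never its own good

def propose(agent: int) -> str:
    """One agent's action for the round, expressed as structured JSON
    (hypothetical schema for illustration)."""
    return json.dumps({
        "agent": agent,
        "offer": has[agent],
        "ask": wants[agent],
        "action": "propose_trade",
    })

def satisfied(agent: int) -> bool:
    # success simply means you end up holding what you want
    return has[agent] == wants[agent]
```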
The first question is about a pure barter economy. We explore whether LLM agents can achieve efficient allocations through barter at any scale, i.e., to engage in multiple bilateral negotiations to achieve gains from trade. The agents in the experiment have no real shortage of time. If this works then Coasean bargaining should be straightforward; goodbye money!
The table below has the results. What do we see? When the scale is small - when Alex just has to worry about coordinating with Rohit - all of this works. But as the number of agents grows, things start to get really difficult. By the time we get to even 8-12 agents the number of successful transactions drops to below 50%. And this is the absolute simplest setting.
Perhaps this should be expected. The problem is still O(n²) in complexity, which grows exceptionally fast as the number of agents grows. And if this isn't just bilateral, but starts to include multiparty negotiations, it might become O(n!), which is vastly bigger for any n above 3.
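The gap is worth seeing in raw numbers:

```python
import math

# Pairwise (bilateral) channels grow as n^2; possible multiparty
# negotiation orderings grow as n!, which dwarfs n^2 almost immediately.
for n in (3, 8, 12):
    print(n, n * n, math.factorial(n))
# At n=12: 144 pairwise channels vs 479,001,600 orderings.
```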
Ok, let's make it a bit easier for the agents. If they can't coordinate by talking to each other, then, since they are agents anyway, we should be able to give them omniscience. Enter Central Planning. There has been plenty of work before on the limits of bilateral negotiations, but we can test how well a "hub" structure helps. Does having a central planner set the stage for better performance?
As the results table shows, central planning makes things slightly better, but we are still very much in a world of the Hayekian troubles. A hierarchy without a numeraire just isn’t enough.
Ok, we can continue looking at our human history to see what else we can do. In Debt, David Graeber argues that money emerged at least in part through state power, to enforce the paying of taxes in order to fund foreign wars. Before this, he argues, IOUs and bartering seemed to have worked just fine to manage the economy; the IOUs themselves became a sort of numeraire that could be traded in order to solve the coincidence of wants.
So let's introduce Credits and IOUs. We can give the agents the ability to give each other an IOU and see whether providing the basics of credit allows them to come up with better ways to interact with each other.
This still didn't help as much as we thought. There were a few segments where transactions started happening, but they didn't really start to work. Or scale.
Most interestingly, the concept of money didn't emerge from this organically. IOUs didn't become money. Even though in conversation LLMs all know that this is the smart thing to do, it did not emerge in practice.
This was a bummer: as with the prior research, it shows that AI agents do not yet come with the natural human instinct to turn IOUs into a numeraire that acts as a stand-in for money. They don't even come with the same set of ideas as this sea otter.
Ok, let's take the final step and actually introduce Money. We do this by creating an exchange where the agents can post bids and offers, and look at market outcomes. The results are stark: markets resolve at a success rate of 100%, and much faster than through other mechanisms, with complexity of O(n).
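To see why a posted-price exchange is O(n), here is a toy clearing loop. It is our own simplification, not the repo's implementation: every good trades at a flat posted price of 1 credit, and each good is wanted by exactly one agent.

```python
# Toy exchange: each agent sells its endowment to the hub for 1 credit,
# then spends that credit on the good it wants — 2n interactions total.

def clear_market(has: list[int], wants: list[int]) -> tuple[list[int], int]:
    n = len(has)
    credits = [0] * n
    interactions = 0
    # Sell phase: each agent deposits its good for 1 credit.
    for i in range(n):
        credits[i] += 1
        interactions += 1
    # Buy phase: each agent spends its credit on the good it wants.
    holdings = []
    for i in range(n):
        credits[i] -= 1
        holdings.append(wants[i])
        interactions += 1
    return holdings, interactions

has = [0, 1, 2, 3]
wants = [2, 3, 0, 1]          # a permutation: each good wanted exactly once
holdings, interactions = clear_market(has, wants)
assert holdings == wants          # 100% success
assert interactions == 2 * len(has)  # O(n), not O(n^2)
```

No agent ever needs to know what any other agent has or wants; the hub and the numeraire absorb all of that information.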
One note is that this result presumes the exchange works without a hitch. In reality there will be friction coming from liquidity constraints, differential compute resources, etc. For example, in the N=8 run, the hub handled 23 inbound + 23 outbound messages and prices stayed fixed. And if regulations require that AI agents use different types of country-specific currencies, then exchange rates will complicate things further.
To sum up: an agentic economy doesn't emerge automatically even with SOTA agents (who really should know better). Barter and central planning remain inefficient and infeasible, and money does not emerge organically even when credit and IOUs are introduced. At least in our setting, an agentic economy needs more top-down engineering to become efficient.
Previous work on agent-based modeling has explored what kind of emergent economic realities we are likely to see with rule-based agents interacting. The world of AI agents is fundamentally different. These agents act based on a huge corpus of human knowledge, with the underlying LLM models able to solve incredibly difficult problems on their own. These agents can plan, they can negotiate, they can code. And even with all this knowhow at their disposal, it’s interesting to see that they still appear to require top-down institutions to create an effective and efficient market.
As we transition to a more agentic economy, a key part of 'getting ready' for that world is setting up institutions for the agents. These include:
Identity and roles
Settlement and payment
Pricing and quote formats
Reputation
Marketplaces and clearinghouses
This is by no means exhaustive, but we wager that mechanism design for multi-agent work is going to be a rather fertile area of research for a while. Humanity went through millennia of evolution to figure out the right societal setup that lets us progress, that lets us build a thriving civilisation.
It is both necessary and inevitable that the world of AI agents will also need the equivalents, though the emergence of such institutions will likely be much faster given the millennia of human knowledge that we’ve already amassed.